Core Server / SERVER-62402

Ignore timeouts when running `ServiceEntryPointImpl::shutdown` under sanitizers

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Internal Code
    • Service Arch 2022-1-24, Service Arch 2022-2-07, Service Arch 2022-2-21

      We currently run ServiceEntryPointImpl::shutdownAndWait only if the address or thread sanitizer is enabled:

      bool ServiceEntryPointImpl::shutdown(Milliseconds timeout) {
      #if __has_feature(address_sanitizer) || __has_feature(thread_sanitizer)
          // When running under address sanitizer, we get false positive leaks due to disorder around
          // the lifecycle of a connection and request. When we are running under ASAN, we try a lot
          // harder to dry up the server from active connections before going on to really shut down.
          return shutdownAndWait(timeout);
      #else
          return true;
      #endif
      }
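
      To make the timeout semantics concrete, here is a minimal, self-contained model of a "wait until all sessions drain, or give up" call. This is only a sketch: SessionTracker and the body of its shutdownAndWait are hypothetical stand-ins and are not taken from the server's implementation, which this ticket does not show.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the entry point's session bookkeeping.
class SessionTracker {
public:
    void sessionStarted() {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_active;
    }

    void sessionEnded() {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(_active > 0);  // guard against ending a session that never started
        --_active;
        if (_active == 0) {
            _drained.notify_all();
        }
    }

    // Returns true if every session ended before the timeout elapsed,
    // false if sessions were still active when the wait gave up.
    bool shutdownAndWait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _drained.wait_for(lk, timeout, [this] { return _active == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _drained;
    std::size_t _active = 0;
};
```

      In this model, a false return means the caller proceeds with shutdown while sessions are still alive, which is the situation that leads a sanitizer to report their allocations as leaks.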
      

      The invocation is provided with 10 seconds as the timeout:

      // Shutdown the Service Entry Point and its sessions and give it a grace period to complete.
      if (auto sep = serviceContext->getServiceEntryPoint()) {
          LOGV2_OPTIONS(4784923, {LogComponent::kCommand}, "Shutting down the ServiceEntryPoint");
          if (!sep->shutdown(Seconds(10))) {
              LOGV2_OPTIONS(20563, {LogComponent::kNetwork}, "Service entry point did not shutdown within the time limit");
          }
      }
      

      When running under the sanitizers, hitting this timeout causes the process to terminate prematurely and report leaks that are not real (i.e., false positives). The recommendation is to use a very large timeout (e.g., Seconds::max()) and to make sure the hang-analyzer runs if a thread takes a very long time to join.

      Since we only run this code when sanitizers are enabled, this change will not impact production behavior.

      AC: Change the timeout to 30 seconds, and invariant or LOGV2_FATAL if shutdown isn't achieved within that timeout.
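
      The acceptance criterion could be sketched roughly as follows. This is an illustration under stated assumptions, not the server change itself: shutdownOrFatal, kSanitizerShutdownTimeout, and the onFatal callback are hypothetical names, and the callback stands in for invariant or LOGV2_FATAL so that the sketch compiles without the server's headers.

```cpp
#include <chrono>
#include <functional>

// Hypothetical constant: the 30-second grace period named in the AC.
constexpr std::chrono::seconds kSanitizerShutdownTimeout{30};

// Calls `shutdown` with the grace period. If it reports that shutdown did not
// complete in time, invokes `onFatal` (a stand-in for invariant/LOGV2_FATAL)
// instead of silently continuing. Returns the result of `shutdown`.
inline bool shutdownOrFatal(
    const std::function<bool(std::chrono::milliseconds)>& shutdown,
    const std::function<void()>& onFatal,
    std::chrono::milliseconds timeout = kSanitizerShutdownTimeout) {
    if (shutdown(timeout)) {
        return true;
    }
    onFatal();
    return false;
}
```

      Passing the failure handler as a callback keeps the sketch testable; in the real code path the handler would terminate the process, so the false return would never be observed.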

            Assignee:
            daniel.morilha@mongodb.com Daniel Morilha (Inactive)
            Reporter:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Votes:
            0
            Watchers:
            4