Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-50775

resumable index build rollback tests hang in RollbackTest.kSyncSourceOpsBeforeRollback

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 4.8.0
    • Affects Version/s: None
    • Component/s: Storage
    • None
    • Fully Compatible
    • ALL
    • Execution Team 2020-09-21

      The rollback tests for resumable index builds sometimes hang during the RollbackTest.transitionToSyncSourceOperationsBeforeRollback phase. The logs indicate that the replSetStepDown is failing to acquire the RSTL to complete the stepdown:

      [js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:16:14.276+0000 d21280| 2020-09-06T11:16:14.276+00:00 I  COMMAND  51803   [conn103] "Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"MongoDB Shell","command":{"replSetStepDown":1,"force":true,"lsid":{"id":{"$uuid":"b9fb9ad3-9c2d-4a97-90ae-7861250da89c"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1599390943,"i":4}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"},"numYields":0,"ok":0,"errMsg":"Unable to acquire X lock on '{4611686018427387905: ReplicationStateTransition, 1}' within 1000ms. opId: 2497, op: conn103, connId: 103.","errName":"LockTimeout","errCode":24,"reslen":347,"locks":{"ReplicationStateTransition":{"acquireCount":{"W":1},"acquireWaitCount":{"W":1},"timeAcquiringMicros":{"W":1001442}}},"protocol":"op_msg","durationMillis":1002}
      

      The resumable index build rollback test fixture is supposed to detect an interrupted index build and disable the fail point that is blocking the stepdown in this parallel shell script.

      [js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:15:44.360+0000 d21280| 2020-09-06T11:15:44.359+00:00 W  CONTROL  23829   [conn152] "Set failpoint","attr":{"failPointName":"hangIndexBuildDuringBulkLoadPhase","failPoint":{"mode":0,"data":{},"timesEntered":2}}
      

      However, the lock manager diagnostic output shows the index build still waiting in the (disabled) fail point:

      [js_test:rollback_resumable_index_build_drain_writes_phase] 2020-09-06T11:28:15.455+0000 d21280| 2020-09-06T11:28:15.454+00:00 I  -        20521   [initandlisten] "lock manager dump","attr":{"addr":"0x7f157b3805d0","locks":[{"lockAddr":"0x7f1587ac7f00","resourceId":"{6917529027641081857: Global, 1}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f1587ac7e80","resourceId":"{4611686018427387905: ReplicationStateTransition, 1}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f15889d0100","resourceId":"{10123292395995783581: Database, 899920359141007773, test}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]},{"lockAddr":"0x7f15889d0400","resourceId":"{11723802635811401853: Collection, 194587589742932093, test.rollback_resumable_index_build_drain_writes_phase}","granted":[{"lockRequest":"0x877","lockRequestAddr":"0x7f1588764380","thread":"139729275131648","mode":"IX","convertMode":"NONE","enqueueAtFront":false,"compatibleFirst":false,"debugInfo":"index build: e7e6b33d-5d1e-475b-93ff-556d37705231","clientInfo":{"desc":"IndexBuildsCoordinatorMongod-0","opid":2167}}],"pending":[]}]}
      

            Assignee:
            gregory.noma@mongodb.com Gregory Noma
            Reporter:
            benety.goh@mongodb.com Benety Goh
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: