  Core Server / SERVER-20057

Concurrent, sharded mapReduces can fail when temporary namespaces collide across mongos processes

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 4.4.0
    • Affects Version/s: 3.1.6
    • Component/s: MapReduce, Sharding
    • Sharding
    • Fully Compatible
    • ALL

      It's possible for concurrent, sharded mapReduces to fail with DEAD plan executors when there's a collision in temporary namespaces across multiple mongos processes.

      This bug is intermittently triggered by the concurrency suite.

      This seems to be the sequence of events:

      1 - After finishing the mapReduce.shardedfinish command, a mongos process issues a drop command on all shards for a tmp.mrs namespace (in cluster_map_reduce_cmd.cpp).

      2 - At the same time, another mongos process tries to initialize a ParallelSortClusteredCursor on the very same tmp.mrs namespace as part of another mapReduce.shardedfinish command.

      3 - The drop invalidates cursors on the tmp.mrs namespace, which leads to a DEAD plan executor and a failed mapReduce command.
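
The collision behind the steps above can be sketched in a few lines of Python. This is a toy model, not server code: it assumes the temporary namespace is built from the database, the collection name, a Unix timestamp in seconds, and a per-process job counter, which is what the logged name `db1.tmp.mrs.coll1_1440026655_43` suggests. The `Mongos` class, its methods, and the per-process suffix are hypothetical and are not the fix that was actually shipped.

```python
import itertools
import uuid


class Mongos:
    """Toy stand-in for one mongos process (hypothetical, not server code)."""

    def __init__(self, start=0):
        # Each process keeps its own job counter and knows nothing
        # about the counters of other mongos processes.
        self._jobs = itertools.count(start)
        self.process_id = uuid.uuid4().hex

    def temp_ns(self, db, coll, now):
        # Assumed scheme: <db>.tmp.mrs.<coll>_<unix seconds>_<job counter>
        return f"{db}.tmp.mrs.{coll}_{int(now)}_{next(self._jobs)}"

    def temp_ns_unique(self, db, coll, now):
        # Hypothetical mitigation: append a per-process identifier so two
        # processes cannot produce the same name.
        return f"{self.temp_ns(db, coll, now)}_{self.process_id}"


now = 1440026655  # timestamp taken from the logged namespace

# Two independent processes whose counters happen to be at the same value
# within the same second.
a, b = Mongos(start=43), Mongos(start=43)
ns_a = a.temp_ns("db1", "coll1", now)
ns_b = b.temp_ns("db1", "coll1", now)
collided = ns_a == ns_b  # both processes now share one temporary namespace

# With a per-process suffix the names differ even when time and counter match.
unique_a = a.temp_ns_unique("db1", "coll1", now)
unique_b = b.temp_ns_unique("db1", "coll1", now)
```

Once both processes share a name, a drop issued by one of them removes the collection that the other is still reading, which matches steps 1 to 3.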


      Relevant log lines:

      I COMMAND  [conn26] CMD: drop db1.tmp.mrs.coll1_1440026655_43
      E QUERY    [conn30] Plan executor error during find: DEAD, stats: { stage: "FETCH", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, docsExamined: 0, alreadyHasObj: 0, inputStage: { stage: "IXSCAN", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, keyPattern: { _id: 1 }, indexName: "_id_", isMultiKey: false, isUnique: true, isSparse: false, isPartial: false, indexVersion: 1, direction: "forward", indexBounds: { _id: [ "[MinKey, MaxKey]" ] }, keysExamined: 0, dupsTested: 0, dupsDropped: 0, seenInvalidated: 0 } }
      I QUERY    [conn30] assertion 17144 Executor error: OperationFailed Operation aborted because: all indexes on collection dropped ns:db1.tmp.mrs.coll1_1440026655_43 query:{ query: {}, orderby: { _id: 1 } }
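
When triaging a similar failure, the drop and the aborted query can be matched by namespace. The sketch below is a rough illustration only: the regular expressions are written against the two log lines quoted above (the assertion line is shortened to its relevant part) and are an assumption about the log format, not a general mongod log parser. The function name `find_drop_races` is hypothetical.

```python
import re

# Shortened copies of the log lines quoted above.
LOG = """I COMMAND  [conn26] CMD: drop db1.tmp.mrs.coll1_1440026655_43
I QUERY    [conn30] assertion 17144 Executor error: OperationFailed Operation aborted because: all indexes on collection dropped ns:db1.tmp.mrs.coll1_1440026655_43 query:{ query: {}, orderby: { _id: 1 } }"""

DROP_RE = re.compile(r"\[(conn\d+)\] CMD: drop (\S+)")
ABORT_RE = re.compile(r"\[(conn\d+)\] assertion \d+ .* ns:(\S+)")


def find_drop_races(log_text):
    """Return (namespace, dropping_conn, failing_conn) for each aborted
    query whose namespace was dropped earlier by a different connection."""
    dropped = {}  # namespace -> connection that issued the drop
    races = []
    for line in log_text.splitlines():
        match = DROP_RE.search(line)
        if match:
            dropped[match.group(2)] = match.group(1)
            continue
        match = ABORT_RE.search(line)
        if match:
            conn, ns = match.group(1), match.group(2)
            if ns in dropped and dropped[ns] != conn:
                races.append((ns, dropped[ns], conn))
    return races


races = find_drop_races(LOG)
```

For the lines above, `races` holds one entry pairing the drop on conn26 with the failed query on conn30 for the same tmp.mrs namespace.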
      

      Test output:

      Error: map reduce failed:{
        "ok" : 0,
        "errmsg" : "MR post processing failed: { ok: 0.0, errmsg: \"could not initialize cursor across all shards because : Executor error: OperationFailed Operation aborted because: all indexes on collection dropped @...\", code: 14827 }"
      }
      

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            kamran.khan Kamran K.