Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 4.4.0
Affects Version/s: 3.1.6
Component/s: MapReduce, Sharding
Labels: 32qa
Assigned Teams: Sharding
Backwards Compatibility: Fully Compatible
Operating System: ALL
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Concurrent sharded mapReduce operations can fail with a DEAD plan executor when two mongos processes pick the same temporary namespace.

This bug is intermittently triggered by the concurrency suite.

This seems to be the sequence of events:

1 - After finishing the mapReduce.shardedfinish command, a mongos process issues a drop command for a tmp.mrs namespace on all shards (in cluster_map_reduce_cmd.cpp).

2 - At the same time, a second mongos process tries to initialize a ParallelSortClusteredCursor on the same tmp.mrs namespace as part of its own mapReduce.shardedfinish command.

3 - The drop invalidates the cursors open on that tmp.mrs namespace, so the second mongos process's cursor ends up with a DEAD plan executor and its mapReduce command fails.
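A toy model of steps 1 to 3, assuming only what the log lines below show: a drop on a namespace invalidates every cursor open on it, and a cursor that yielded before the drop cannot continue afterwards. `Shard`, `Cursor`, and `DeadExecutorError` are hypothetical names for this sketch; it is not MongoDB's implementation.

```python
class DeadExecutorError(Exception):
    """Hypothetical: raised when a cursor's namespace was dropped under it."""


class Cursor:
    def __init__(self, shard, ns):
        self.shard = shard
        self.ns = ns
        self.dead = False

    def next_batch(self):
        # A cursor invalidated by a drop can no longer return results.
        if self.dead:
            raise DeadExecutorError(f"DEAD executor on {self.ns}")
        return list(self.shard.collections.get(self.ns, []))


class Shard:
    """Toy shard: namespaces hold documents; a drop kills open cursors."""

    def __init__(self):
        self.collections = {}   # namespace -> list of documents
        self.open_cursors = {}  # namespace -> list of Cursor objects

    def insert(self, ns, docs):
        self.collections.setdefault(ns, []).extend(docs)

    def open_cursor(self, ns):
        cursor = Cursor(self, ns)
        self.open_cursors.setdefault(ns, []).append(cursor)
        return cursor

    def drop(self, ns):
        # Remove the data and invalidate every cursor on this namespace,
        # regardless of which mongos opened it.
        self.collections.pop(ns, None)
        for cursor in self.open_cursors.pop(ns, []):
            cursor.dead = True


shard = Shard()
ns = "db1.tmp.mrs.coll1_1440026655_43"
shard.insert(ns, [{"_id": 1}])

# Step 2: the second mongos opens a cursor for its shardedfinish phase.
cursor_b = shard.open_cursor(ns)
# Step 1: the first mongos, done with the same-named namespace, drops it.
shard.drop(ns)
# Step 3: the second mongos's read now fails.
try:
    cursor_b.next_batch()
    outcome = "ok"
except DeadExecutorError:
    outcome = "failed"
print(outcome)  # failed
```

The model has no notion of which mongos owns a namespace; that is the point: the drop cannot distinguish the two mapReduce jobs once their temporary names coincide.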

Relevant log lines:

I COMMAND  [conn26] CMD: drop db1.tmp.mrs.coll1_1440026655_43
E QUERY    [conn30] Plan executor error during find: DEAD, stats: { stage: "FETCH", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, docsExamined: 0, alreadyHasObj: 0, inputStage: { stage: "IXSCAN", nReturned: 0, executionTimeMillisEstimate: 0, works: 0, advanced: 0, needTime: 0, needYield: 0, saveState: 1, restoreState: 0, isEOF: 0, invalidates: 0, keyPattern: { _id: 1 }, indexName: "_id_", isMultiKey: false, isUnique: true, isSparse: false, isPartial: false, indexVersion: 1, direction: "forward", indexBounds: { _id: [ "[MinKey, MaxKey]" ] }, keysExamined: 0, dupsTested: 0, dupsDropped: 0, seenInvalidated: 0 } }
I QUERY    [conn30] assertion 17144 Executor error: OperationFailed Operation aborted because: all indexes on collection dropped ns:db1.tmp.mrs.coll1_1440026655_43 query:{ query: {}, orderby: { _id: 1 } }
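The dropped namespace in these log lines, db1.tmp.mrs.coll1_1440026655_43, looks like the source collection name, a Unix timestamp in seconds, and a counter. Assuming each mongos builds the name from those three parts with a counter local to its own process, two mongos processes that start a mapReduce on the same collection within the same second, with their counters at the same value, would pick the same namespace. The sketch below models that assumption; `TmpNameGenerator` and its `unique_suffix` parameter are hypothetical, not MongoDB code.

```python
import itertools
import uuid


class TmpNameGenerator:
    """Hypothetical per-mongos temp-namespace generator.

    Assumption: names look like "tmp.mrs.<coll>_<seconds>_<counter>",
    with the counter local to one process, as the log line suggests.
    """

    def __init__(self, unique_suffix=None):
        self._counter = itertools.count()  # per-process counter, starts at 0
        # Optional per-process identifier; None reproduces the collision.
        self._unique_suffix = unique_suffix

    def tmp_name(self, coll, now_seconds):
        name = f"tmp.mrs.{coll}_{now_seconds}_{next(self._counter)}"
        if self._unique_suffix is not None:
            name += f"_{self._unique_suffix}"
        return name


# Two mongos processes, each with its own counter, starting in the same second.
now = 1440026655
mongos_a = TmpNameGenerator()
mongos_b = TmpNameGenerator()
print(mongos_a.tmp_name("coll1", now) == mongos_b.tmp_name("coll1", now))  # True

# A per-process unique identifier (one possible mitigation) removes the clash.
mongos_c = TmpNameGenerator(unique_suffix=uuid.uuid4().hex)
mongos_d = TmpNameGenerator(unique_suffix=uuid.uuid4().hex)
print(mongos_c.tmp_name("coll1", now) == mongos_d.tmp_name("coll1", now))  # False
```

A per-process suffix is only one way to avoid the clash under this assumption; the ticket does not say how the issue was resolved.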

Test output:

Error: map reduce failed:{
  "ok" : 0,
  "errmsg" : "MR post processing failed: { ok: 0.0, errmsg: \"could not initialize cursor across all shards because : Executor error: OperationFailed Operation aborted because: all indexes on collection dropped @...\", code: 14827 }"
}

Related to: SERVER-34539 Re-enable sharded mapReduce concurrency testing and only use a single mongos (Closed)

Assignee: [DO NOT USE] Backlog - Sharding Team
Reporter: Kamran K. (Inactive)
Participants: [DO NOT USE] Backlog - Sharding Team, Charlie Swanson, Githook User, Kamran K.
Votes: 1
Watchers: 10

Created: Aug 20 2015 01:52:44 AM UTC
Updated: Dec 06 2022 04:45:47 AM UTC
Resolved: Mar 09 2020 08:26:44 PM UTC
