Type: Bug
Resolution: Done
Priority: Critical - P2
Affects Version/s: 3.0.1
Component/s: Replication
Minor Change
ALL
ISSUE SUMMARY
On a MongoDB replica set, when a secondary node is running multiple background index builds on a given collection, metadata changes to that same collection may lead to a fatal error on the secondary node.
Metadata changes that may trigger this behavior include renaming and dropping the collection, and dropping the database that contains the collection.
USER IMPACT
If enough secondary nodes hit the error and shut down that a majority of the voting members is no longer running, the replica set can no longer keep or elect a primary, which leads to loss of write availability.
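To make the availability arithmetic concrete: a replica set can keep or elect a primary only while a strict majority of its voting members is up. The sketch below uses hypothetical helpers (`majority`, `can_accept_writes`) and assumes that every member counted is a voting member and that the surviving members can all reach each other.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return voting_members // 2 + 1


def can_accept_writes(voting_members: int, failed_members: int) -> bool:
    """Hypothetical helper: True if the members still running can keep or elect a primary.

    Assumes every counted member votes and the survivors can reach each other.
    """
    if not 0 <= failed_members <= voting_members:
        raise ValueError("failed_members must be between 0 and voting_members")
    return voting_members - failed_members >= majority(voting_members)


# A 3-member set (1 primary + 2 secondaries): losing both secondaries leaves 1 of 3.
print(can_accept_writes(3, 1))  # True
print(can_accept_writes(3, 2))  # False
# A 5-member set tolerates two failed secondaries but not three.
print(can_accept_writes(5, 2))  # True
print(can_accept_writes(5, 3))  # False
```

In other words, one crashed secondary in a three-member set is survivable, but the bug can take down writes as soon as it hits a majority of the members.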
WORKAROUNDS
Avoid creating, dropping, or renaming a collection (and dropping the database that contains it) while background index builds are running on that same collection.
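One way to apply this workaround in tooling is to check for in-progress index builds before issuing a rename or drop. The sketch below is a hypothetical guard, not a MongoDB API. It assumes that the input is the document returned by `db.currentOp()` in the mongo shell, that this document contains an `inprog` array whose entries carry an `ns` field, and that index builds carry a `msg` string beginning with "Index Build". Verify that shape against your own server's output before relying on it. The check needs to run against each secondary as well as the primary: in 3.0 a build is replicated only after it finishes on the primary, so it may still be running on the secondaries.

```python
from typing import Any, Dict, List


def index_builds_on(current_op: Dict[str, Any], namespace: str) -> List[Dict[str, Any]]:
    """Return the in-progress operations that look like index builds on `namespace`.

    ASSUMPTION: `current_op` has an `inprog` list; each entry has `ns`, and index
    builds have a `msg` beginning with "Index Build". Check against real output.
    """
    return [
        op
        for op in current_op.get("inprog", [])
        if op.get("ns") == namespace
        and str(op.get("msg", "")).startswith("Index Build")
    ]


def safe_to_change_metadata(current_op: Dict[str, Any], namespace: str) -> bool:
    """Hypothetical guard: allow rename/drop of `namespace` only if no index build is running on it."""
    return not index_builds_on(current_op, namespace)


# Made-up sample input, shaped according to the assumption above.
sample = {"inprog": [{"ns": "test.ts", "msg": "Index Build (background)"}]}
print(safe_to_change_metadata(sample, "test.ts"))     # False
print(safe_to_change_metadata(sample, "test.other"))  # True
```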
AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.3.
FIX VERSION
The fix is included in the 3.0.4 production release.
Original description
Creating and dropping indexes with varying options on the same collection from multiple clients concurrently can cause secondaries to fassert while applying the oplog. So far, no problem has been observed on the primary.
Tested with MongoDB Enterprise 3.0.1. Known to occur on Ubuntu 12.01 and Windows 8.
Attached is the script used in each shell session. The "test.ts" collection had 250K small documents structured as {_id:ObjectId,server:int,cpu:int}; however, neither the structure nor the quantity of documents seems to matter, as other variations also trigger the fault. Background indexing appears to be a crucial requirement. The fault was originally observed on a sharded cluster with operations performed via a mongos, but a basic replica set is all that is needed.
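The attached script is not reproduced here. As a rough, hypothetical approximation of the workload described above, the sketch below generates an endless stream of `createIndexes`/`dropIndexes` command documents with `background: True` for the `ts` collection. The index variants and the `index_churn` name are illustrative assumptions, not taken from the original script. The command name is kept as the first key because MongoDB commands require that.

```python
import itertools
import random
from typing import Any, Dict, Iterator, List

# Hypothetical index variations on the {_id, server, cpu} documents described above.
INDEX_VARIANTS: List[Dict[str, Any]] = [
    {"key": {"server": 1}, "name": "server_1"},
    {"key": {"cpu": 1}, "name": "cpu_1"},
    {"key": {"server": 1, "cpu": -1}, "name": "server_1_cpu_-1"},
]


def index_churn(collection: str, seed: int) -> Iterator[Dict[str, Any]]:
    """Yield alternating createIndexes/dropIndexes command documents forever.

    Every create uses background: True, which the report identifies as a crucial
    ingredient. The command name is the first key, so insertion order matters.
    """
    rng = random.Random(seed)
    while True:
        # Copy the chosen variant and add the background option.
        spec = dict(rng.choice(INDEX_VARIANTS), background=True)
        yield {"createIndexes": collection, "indexes": [spec]}
        yield {"dropIndexes": collection, "index": spec["name"]}


# Usage: the first four commands one client would send.
for cmd in itertools.islice(index_churn("ts", seed=1), 4):
    print(cmd)
```

Each concurrent client would use a different seed and send every document through its driver's generic run-command call (for example, PyMongo's `Database.command`). It would also need to ignore "index not found" errors when another client has already dropped the same index.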
Sometimes the secondaries can be restarted, recover, and rejoin normally. Other times they fassert again on every restart until they are resynced. Both outcomes were observed in consecutive runs, with no known difference other than timing to explain them.
Also attached is the log output of an example restart (on Windows) in which the secondary could not recover.
is duplicated by
- SERVER-18762 Mongo 3.0 crashes while replicating map reduce collections (Closed)
- SERVER-19065 dropIndexes() produces "Assertion: 17348:cannot dropAllIndexes when index builds in progress" (Closed)

related to
- SERVER-20010 Segfault while dropping an index that failed to build (Closed)