-
Type: Bug
-
Resolution: Done
-
Priority: Critical - P2
-
Affects Version/s: 2.0.6, 2.2.0-rc0
-
Component/s: Replication
-
Environment:Any replicated environment
-
ALL
Given a replica set which has collection 'c' with a unique index on 'a', consider the following sequence of events:
db.c.update({ _id:0 }, { $set:{ a:1 } }); db.c.update({ _id:0 }, { $set:{ a:2 } }); db.c.update({ _id:1 }, { $set:{ a:1 } });
If a secondary node encounters an error during replication (e.g. by killing the getMore op or network disconnect), it may try to reapply the same operations. In the above case, attempting to reapply the operations will fail on the secondary nodes due to {_id:1} already having a unique index value on
{a:1}.
On version 2.0.2, the issue is manifested by secondarys halting replication due to syncTail encountering E11000 on the secondary node(s). On version master (~2.2.0-rc0, git hash 73b97070259a0e30d5ed35880b11b32078620e42), the following fassert() is hit:
Tue Jul 31 14:18:05 [repl writer worker 1] build index test.testUniqueReplication { _id: 1 } Tue Jul 31 14:18:05 [repl writer worker 1] build index done. scanned 0 total records. 0 secs Tue Jul 31 14:18:05 [repl writer worker 1] info: creating collection test.testUniqueReplication on add index Tue Jul 31 14:18:05 [repl writer worker 1] build index test.testUniqueReplication { a: 1.0 } Tue Jul 31 14:18:05 [repl writer worker 1] build index done. scanned 0 total records. 0 secs Tue Jul 31 14:18:23 [conn2292] end connection 127.0.0.1:56268 (11 connections now open) Tue Jul 31 14:18:23 [initandlisten] connection accepted from 127.0.0.1:56281 #2294 (12 connections now open) Tue Jul 31 14:18:27 [conn2293] end connection 127.0.0.1:56269 (11 connections now open) Tue Jul 31 14:18:27 [initandlisten] connection accepted from 127.0.0.1:56282 #2295 (12 connections now open) Tue Jul 31 14:18:53 [conn2294] end connection 127.0.0.1:56281 (11 connections now open) Tue Jul 31 14:18:53 [initandlisten] connection accepted from 127.0.0.1:56293 #2296 (12 connections now open) Tue Jul 31 14:18:57 [conn2295] end connection 127.0.0.1:56282 (11 connections now open) Tue Jul 31 14:18:57 [initandlisten] connection accepted from 127.0.0.1:56294 #2297 (12 connections now open) Tue Jul 31 14:18:58 [repl writer worker 1] CMD: drop test.testUniqueReplication Tue Jul 31 14:18:58 [repl writer worker 1] build index test.testUniqueReplication { _id: 1 } Tue Jul 31 14:18:58 [repl writer worker 1] build index done. scanned 0 total records. 0 secs Tue Jul 31 14:18:58 [repl writer worker 1] info: creating collection test.testUniqueReplication on add index Tue Jul 31 14:18:58 [repl writer worker 1] build index test.testUniqueReplication { a: 1.0 } Tue Jul 31 14:18:58 [repl writer worker 1] build index done. scanned 0 total records. 0 secs Tue Jul 31 14:19:12 [repl writer worker 1] ERROR: writer worker caught exception: E11000 duplicate key error index: test.testUniqueReplication.$a_1 dup key: { : 1.0 } on: { ts: Timestamp 1343769552000|191, h: 7269005229324720756, op: "u" , ns: "test.testUniqueReplication", o2: { _id: 0.0 }, o: { $set: { a: 1.0 } } } Tue Jul 31 14:19:12 [repl writer worker 1] Fatal Assertion 16360 0x9ccdd3 0x98211d 0x83b398 0x71e326 0x5bda6b 0x7f8c40d60e9a 0x7f8c4027e4bd ./mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x9ccdd3] ./mongod(_ZN5mongo13fassertFailedEi+0x4d) [0x98211d] ./mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0xf8) [0x83b398] ./mongod(_ZN5mongo10threadpool6Worker4loopEv+0x4d6) [0x71e326] ./mongod() [0x5bda6b] /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f8c40d60e9a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8c4027e4bd] Tue Jul 31 14:19:12 [repl writer worker 1] ***aborting after fassert() failure
- depends on
-
SERVER-7229 test failure in replset8.js
- Closed
- is duplicated by
-
SERVER-9059 Fatal Assertion on replication
- Closed
- is related to
-
SERVER-7186 Batched oplog application increases frequency of idempotency violations
- Closed