Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.5.3
Component/s: Concurrency, Replication, Sharding
Operating System: ALL
[ed: original title "Interrupting applyOps can lead to out-of-sync replicas"]
The applyOps command acquires a global lock, applies the provided operations one by one, and (if run on a primary member) then logs the command invocation to the oplog. However, the command can be interrupted after only some of the operations have been applied, in which case the changes already made are persisted to disk but never recorded in the oplog, and therefore never replicated to the other members.
The command can be interrupted, for example, while processing a delete operation: applyOperation_inlock() calls deleteObjects() with god=false, which, even on a simple _id query, can force a yield if the associated index entry is not resident in physical memory; the yield path in turn calls checkForInterrupt(), which throws if a shutdown is in progress (see the stack trace below).
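For illustration only, here is a minimal, self-contained C++ sketch of the control flow described above; it is not the real server code, and names such as applyDelete and shutdownRequested are hypothetical stand-ins. Operations are applied one at a time, any individual apply may throw when a shutdown is detected during a yield, and the applyOps oplog entry is written only after the whole loop completes, so an interrupt thrown partway through leaves earlier writes durable but unlogged.

#include <algorithm>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Thrown to stand in for uassert(11600, "interrupted at shutdown", ...).
struct InterruptedAtShutdown : std::runtime_error {
    InterruptedAtShutdown() : std::runtime_error("interrupted at shutdown") {}
};

int main() {
    std::vector<int> collection = {1, 2};   // stands in for the documents in test.foo
    std::vector<std::string> oplog;         // stands in for local.oplog.rs
    bool shutdownRequested = false;         // set asynchronously by SIGINT in the real server

    // Stand-in for applyOperation_inlock() -> deleteObjects(): a yield inside the
    // delete may observe a pending shutdown and throw, as in frames #0-#6 of the
    // stack trace below.
    auto applyDelete = [&](int id) {
        if (shutdownRequested)
            throw InterruptedAtShutdown();
        collection.erase(std::remove(collection.begin(), collection.end(), id),
                         collection.end());
    };

    try {
        // "applyOps": apply each operation in turn; log the command only afterwards.
        applyDelete(1);
        shutdownRequested = true;           // shutdown arrives between the two operations
        applyDelete(2);
        oplog.push_back("applyOps");        // reached only if every operation succeeded
    } catch (const InterruptedAtShutdown& e) {
        std::cout << "errmsg: " << e.what() << "\n";
    }

    // The first delete has already been applied, but nothing was logged for
    // replication: prints "documents left: 1, oplog entries: 0".
    std::cout << "documents left: " << collection.size()
              << ", oplog entries: " << oplog.size() << "\n";
}

The final line mirrors the shell session below: one document ({_id:2}) survives the partially applied applyOps, and no applyOps entry appears in the oplog.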
See the following shell session. After processing the first operation in applyOps, a normal shutdown was initiated with SIGINT, followed by a restart. Note the change in the collection contents, but the absence of an oplog entry.
> db.foo.insert([{_id:1},{_id:2}])
> var remove1 = { "ts" : Timestamp(1382671580, 1), "h" : NumberLong("8276060108251241091"), "v" : 2, "op" : "d", "ns" : "test.foo", "b" : true, "o" : { "_id" : 1 } }
> var remove2 = { "ts" : Timestamp(1382673262, 1), "h" : NumberLong("-5171031159600005302"), "v" : 2, "op" : "d", "ns" : "test.foo", "b" : true, "o" : { "_id" : 2 } }
> db.runCommand({applyOps:[remove1,remove2]})
{ "errmsg" : "exception: interrupted at shutdown", "code" : 11600, "ok" : 0 }
> db.foo.find()
2013-10-24T23:56:09.545-0400 DBClientCursor::init call() failed
Error: error doing query: failed
> db.foo.find()
2013-10-24T23:56:10.419-0400 trying reconnect to 127.0.0.1:27017
2013-10-24T23:56:10.419-0400 reconnect 127.0.0.1:27017 ok
{ "_id" : 2 }
> db.getSiblingDB("local").oplog.rs.count({op:"c","o.applyOps":{$exists:true}})
0
Stack trace for above, at interruption:
(gdb) bt
#0  mongo::uasserted (msgid=11600, msg=0x100bc9565 "interrupted at shutdown") at assert_util.cpp:159
#1  0x00000001002bf14a in mongo::KillCurrentOp::checkForInterrupt (this=0x100ec1180, heedMutex=false) at kill_current_op.cpp:131
#2  0x000000010011c924 in std::auto_ptr<mongo::LockMongoFilesShared>::auto_ptr () at /usr/include/c++/4.2.1/memory:379
#3  0x000000010011c924 in mongo::ClientCursor::staticYield (micros=100, ns=@0x13be1afa0, rec=0x137b940b0) at clientcursor.cpp:381
#4  0x000000010011ccf6 in mongo::ClientCursor::yield (this=0x10423a570, micros=100, recordToLoad=0x137b940b0) at clientcursor.cpp:767
#5  0x000000010011ce28 in mongo::ClientCursor::yieldSometimes (this=0x10423a540, need=mongo::ClientCursor::WillNeed, yielded=0x1) at clientcursor.cpp:781
#6  0x00000001002e1a93 in mongo::deleteObjects (ns=@0x13be1b470, pattern=@0x13be1b470, god=false, logop=false, justOne=true) at delete.cpp:106
#7  0x00000001004447da in mongo::applyOperation_inlock (op=@0x13be1c050, convertUpdateToUpsert=true, fromRepl=true) at oplog.cpp:603
#8  0x000000010012dace in mongo::ApplyOpsCmd::run (this=0x13be1c6a0, dbname=@0x13be1c6a0, cmdObj=@0x13be1c6a0, unnamed_arg=0, errmsg=@0x13be1c6a0, result=@0x13be1c6a0, fromRepl=false) at apply_ops.cpp:116
#9  0x00000001001aa635 in mongo::_execCommand (c=0x100ebfc60, dbname=@0x13be1cac8, cmdObj=@0xffffffff00000001, queryOptions=0, errmsg=@0x13be1caa8, result=@0x13be1d010, fromRepl=false) at dbcommands.cpp:1850
#10 0x00000001001ab622 in mongo::Command::execCommand (c=0x100ebfc60, client=@0x13be1cc40, queryOptions=1004653632, cmdns=0x13be1cc40 "���;\001", cmdObj=@0x13be1cc40, result=@0x13be1d010, fromRepl=false) at dbcommands.cpp:2017
#11 0x00000001001ac02c in mongo::_runCommands (ns=0x13be1ce90 " ��;\001", _cmdobj=@0x13be1ce90, b=@0x13be1ce90, anObjBuilder=@0x13be1ce90, queryOptions=0, fromRepl=false) at dbcommands.cpp:2081
#12 0x000000010030612e in mongo::runCommands (ns=0x100ebfc60 "0��", jsobj=@0x13be1d0c8, curop=@0x104398b00, b=@0x13be1d010, anObjBuilder=@0x13be1ccb8, queryOptions=0) at query.cpp:69
#13 0x0000000100306c1d in mongo::runQuery (m=@0x13be1d0c8, q=@0x13be1d5e0, curop=@0x13be1d5e0, result=@0x13be1d5e0) at query.cpp:1064
#14 0x0000000100294746 in receivedQuery [inlined] () at /Users/rassi/work/mongo/src/mongo/db/instance.cpp:280
#15 0x0000000100294746 in mongo::assembleResponse (m=@0x13be1d958, dbresponse=@0x13be1db60, remote=@0x4) at instance.cpp:443
#16 0x00000001000075d4 in mongo::MyMessageHandler::process (this=0x13be1d958, m=@0x13be1dc70, port=0x13be1dc70, le=0x13be1dc70) at db.cpp:199
#17 0x00000001005de601 in boost::shared_ptr<mongo::Socket>::operator-> () at /Users/rassi/work/mongo/src/third_party/boost/boost/smart_ptr/shared_ptr.hpp:209
#18 0x00000001005de601 in mongo::PortMessageServer::handleIncomingMsg (arg=0x13be1d958) at message_server_port.cpp:210
#19 0x0000000100645fd5 in thread_proxy (param=0x10435b9b0) at thread.cpp:121
#20 0x00007fff82126782 in _pthread_start ()
#21 0x00007fff821131c1 in thread_start ()
- is related to: SERVER-11038 Interrupting renameCollection via shutdown can cause silent data loss (Closed)
- related to: SERVER-11432 Audit usage of checkForInterrupt(false) (Closed)