-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Fully Compatible
-
ALL
444bb62c24 / Windows 64-bit 2008R2+ / jsCore_small_oplog
https://mci.10gen.com/ui/task_log_raw/mongodb_mongo_v2.6_windows_64_2k8_444bb62c24786a65334f4be214d1154dde00c71b_14_03_24_14_29_20_jsCore_small_oplog_windows_64_test?type=T
http://buildlogs.mongodb.org/mci_0.9_windows_64_2k8/builds/42921/test/jsCore_small_oplog_0/dropdb_race.js
In the small_oplog suite, this test issues the closeAllDatabases command on the master while the slave's getmore into oplog.rs is yielding. When the getmore returns from the yield, the dbtemprelease() destructor acquires a Context on the local database, which attempts to re-open the local database (which had been closed during the yield); this trips an massert, since opening a database requires a write lock. The slave gets back an error from the getmore call, and halts replication on the slave entirely (the "autoresync" server option is not set, so replication is never re-attempted). This prevents any further tests in the suite from running, as small_oplog waits for the slave to catch up to the master inbetween tests.
Assuming the above analysis is correct, possible remedies include any of:
- make closeAllDatabases not close "local"
- make dbtemprelease detect this failure
- turn on autoresync
- remove this test from the small_oplog suite
Test log excerpt:
2014-03-24T16:43:11.561+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" } 2014-03-24T16:43:11.561+0000 [conn987] building index using bulk method 2014-03-24T16:43:11.561+0000 [conn987] build index done. scanned 3 total records. 0 secs 2014-03-24T16:43:11.562+0000 [conn987] DatabaseHolder::closeAll path:/data/db/sconsTests 2014-03-24T16:43:11.793+0000 [conn987] command admin.$cmd command: closeAllDatabases { closeAllDatabases: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:231463 reslen:436 231ms 2014-03-24T16:43:11.793+0000 [conn4] opening db: local 2014-03-24T16:43:11.793+0000 [conn4] Assertion: 15927:can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\stacktrace.cpp(169) mongo::printStackTrace+0x43 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\log.cpp(158) mongo::logContext+0x9c 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\assert_util.cpp(183) mongo::msgasserted+0xfb 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\catalog\database_holder.cpp(65) mongo::DatabaseHolder::getOrCreate+0x34e 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\client.cpp(357) mongo::Client::Context::_finishInit+0x85 2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\db.h(70) mongo::dbtemprelease::~dbtemprelease+0x49 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\clientcursor.cpp(225) mongo::ClientCursor::staticYield+0x29b 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\plan_executor.cpp(83) mongo::PlanExecutor::getNext+0xf7 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\new_find.cpp(204) mongo::newGetMore+0x413 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(730) mongo::receivedGetMore+0x67b 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(436) mongo::assembleResponse+0x32c 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\db.cpp(204) mongo::MyMessageHandler::process+0x111 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\util\net\message_server_port.cpp(210) mongo::PortMessageServer::handleIncomingMsg+0x677 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185) boost::`anonymous namespace'::thread_start_function+0x21 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17 2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f 2014-03-24T16:43:12.691+0000 [conn4] kernel32.dll BaseThreadInitThunk+0xd 2014-03-24T16:43:12.691+0000 [conn4] local.oplog.$main 2014-03-24T16:43:12.692+0000 [conn987] note: not profiling because db went away - probably a close on: testing 2014-03-24T16:43:12.692+0000 [conn4] getmore local.oplog.$main cursorid:18746693299 ntoreturn:0 keyUpdates:0 exception: can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error code:15927 numYields:5 locks(micros) r:899043 nreturned:1 reslen:176 10025ms 2014-03-24T16:43:12.698+0000 [conn987] dropDatabase jstests_dur_droprace starting 2014-03-24T16:43:12.745+0000 [conn987] removeJournalFiles 2014-03-24T16:43:12.764+0000 [conn987] dropDatabase jstests_dur_droprace finished 2014-03-24T16:43:12.766+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes... 2014-03-24T16:43:13.880+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 1.114 secs 2014-03-24T16:43:13.881+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.0, filling with zeroes... 2014-03-24T16:43:19.602+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.0, size: 64MB, took 5.719 secs 2014-03-24T16:43:19.603+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "jstests_dur_droprace.foo" } 2014-03-24T16:43:19.603+0000 [conn987] added index to empty collection 2014-03-24T16:43:19.604+0000 [conn987] insert jstests_dur_droprace.foo query: { _id: ObjectId('533060a05e3a4439beec0f21') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:6837265 6837ms 2014-03-24T16:43:19.604+0000 [conn987] command jstests_dur_droprace.$cmd command: insert { insert: "foo", documents: [ { _id: ObjectId('533060a05e3a4439beec0f21') } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:55 reslen:56 6837ms 2014-03-24T16:43:19.611+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" } 2014-03-24T16:43:19.611+0000 [conn987] building index using bulk method 2014-03-24T16:43:19.611+0000 [conn987] build index done. scanned 3 total records. 0 secs 2014-03-24T16:43:19.614+0000 [conn987] dropDatabase jstests_dur_droprace starting 2014-03-24T16:43:19.666+0000 [conn987] removeJournalFiles 2014-03-24T16:43:19.682+0000 [conn987] dropDatabase jstests_dur_droprace finished 2014-03-24T16:43:19.684+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes... 2014-03-24T16:43:12.692+0000 [replslave] op: { $err: "can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error", code: 15927 } 2014-03-24T16:43:12.692+0000 [replslave] halting replication 2014-03-24T16:43:12.692+0000 [replslave] caught SyncException 2014-03-24T16:43:12.692+0000 [replslave] repl: sleep 10 sec before next pass 2014-03-24T16:43:22.319+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 2.633 secs 2014-03-24T16:43:22.693+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds 2014-03-24T16:43:27.694+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
Suite log excerpt:
[2014/03/24 16:43:02.400] ******************************************* [2014/03/24 16:43:02.404] Test : drop_index.js ... [2014/03/24 16:43:02.404] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\drop_index.js [2014/03/24 16:43:02.404] Date : Mon Mar 24 16:43:02 2014 [2014/03/24 16:43:02.502] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f732a0002da/) [2014/03/24 16:43:02.620] 219.9998 ms [2014/03/24 16:43:02.704] [2014/03/24 16:43:02.704] waiting for slave (28000) to catch up to master (27999) [2014/03/24 16:43:02.704] caught up! [2014/03/24 16:43:02.704] replication ok for 9750 collections [2014/03/24 16:43:02.704] skipping D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb.js [2014/03/24 16:43:02.708] waiting for slave (28000) to catch up to master (27999) [2014/03/24 16:43:02.708] caught up! [2014/03/24 16:43:02.708] replication ok for 9878 collections [2014/03/24 16:43:02.708] ******************************************* [2014/03/24 16:43:02.708] Test : dropdb_race.js ... [2014/03/24 16:43:02.708] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb_race.js [2014/03/24 16:43:02.708] Date : Mon Mar 24 16:43:02 2014 [2014/03/24 16:43:02.817] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f737900008e/) [2014/03/24 16:44:05.459] 1.0459 minutes [2014/03/24 16:49:10.494] timed out [2014/03/24 16:49:10.494] timed out [2014/03/24 16:49:10.494] [2014/03/24 16:49:16.531] 154 tests succeeded [2014/03/24 16:49:16.531] 591 tests didn't get run [2014/03/24 16:49:16.534] replication ok for 9878 collections [2014/03/24 16:49:16.624] timed out [2014/03/24 16:49:16.624] Traceback (most recent call last): [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1267, in <module> [2014/03/24 16:49:16.624] main() [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1251, in main [2014/03/24 16:49:16.624] run_tests(tests) [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 696, in run_tests [2014/03/24 16:49:16.624] master.wait_for_repl() [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 306, in wait_for_repl [2014/03/24 16:49:16.627] Connection(port=self.port).testing.smokeWait.insert({}, w=2, wtimeout=5*60*1000) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\collection.py", line 362, in insert [2014/03/24 16:49:16.627] self.database.connection) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\message.py", line 248, in _do_batched_insert [2014/03/24 16:49:16.627] client._send_message(_insert_message(EMPTY.join(data), safe), safe) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 969, in _send_message [2014/03/24 16:49:16.627] rv = self.__check_response_to_last_error(response) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 911, in __check_response_to_last_error [2014/03/24 16:49:16.627] raise OperationFailure(details["err"], details["code"]) [2014/03/24 16:49:16.627] pymongo.errors.OperationFailure: timeout [2014/03/24 16:49:16.656] Running script task for command ${killall_mci|pkill mongo; pkill -9 mongod; pkill -9 mongos; pkill -f buildlogger.py; pkill -f smoke.py} in directory /data/mci/git@github.commongodb/mongo.git/mongodb-mongo-v2.6 [2014/03/24 16:49:16.656] Full command: taskkill /IM mongod.exe /IM mongos.exe /IM mongo.exe /IM test.exe /IM buildlogger.py /IM smoke.py /IM python.exe /IM cl.exe /f [2014/03/24 16:49:16.658] Command successfully started and appended to running commands
- is related to
-
SERVER-13351 Slaves halt replication if oplog getmore returns $err
- Closed