- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 2.8.0-rc3, 2.8.0-rc4
- Component/s: Concurrency
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
mongod crashed during mixed read/write traffic testing. The thread that raised the exception is sharding-related: it was running moveChunk.
This happens after about 3 days of execution. The same issue was reproduced with rc3 and rc4.
Here is the log from the crash (from rc4):
2015-01-05T22:54:49.372+0000 F - [conn60] terminate() called. An exception is active; attempting to gather more information
2015-01-05T22:54:49.441+0000 F - [conn60] std::exception::what(): std::exception
Actual exception type: mongo::DBTryLockTimeoutException
0xf133b9 0xf12eb0 0x7fb3ac8bb6c6 0x7fb3ac8ba789 0x7fb3ac8bb33a 0x7fb3ac358913 0x7fb3ac358e47 0x9954a4 0xdac52c 0xdae3f0 0x9ad054 0x9adf93 0x9aea4b 0xb7ca1a 0xa8fcd5 0x7e41f0 0xed1381 0x7fb3acf74f18 0x7fb3ac086b9d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B133B9"},{"b":"400000","o":"B12EB0"},{"b":"7FB3AC85D000","o":"5E6C6"},{"b":"7FB3AC85D000","o":"5D789"},{"b":"7FB3AC85D000","o":"5E33A"},{"b":"7FB3AC349000","o":"F913"},{"b":"7FB3AC349000","o":"FE47"},{"b":"400000","o":"5954A4"},{"b":"400000","o":"9AC52C"},{"b":"400000","o":"9AE3F0"},{"b":"400000","o":"5AD054"},{"b":"400000","o":"5ADF93"},{"b":"400000","o":"5AEA4B"},{"b":"400000","o":"77CA1A"},{"b":"400000","o":"68FCD5"},{"b":"400000","o":"3E41F0"},{"b":"400000","o":"AD1381"},{"b":"7FB3ACF6D000","o":"7F18"},{"b":"7FB3ABFA4000","o":"E2B9D"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4", "gitVersion" : "3ad571742911f04b307f0071979425511c4f2570", "uname" : { "sysname" : "Linux", "release" : "3.14.19-17.43.amzn1.x86_64", "version" : "#1 SMP Wed Sep 17 22:14:52 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFA4AFE000", "elfType" : 3 }, { "b" : "7FB3ACF6D000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7FB3ACD65000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7FB3ACB61000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7FB3AC85D000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7FB3AC55F000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7FB3AC349000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7FB3ABFA4000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7FB3AD189000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf133b9]
mongod(+0xB12EB0) [0xf12eb0]
libstdc++.so.6(+0x5E6C6) [0x7fb3ac8bb6c6]
libstdc++.so.6(+0x5D789) [0x7fb3ac8ba789]
libstdc++.so.6(__gxx_personality_v0+0x52A) [0x7fb3ac8bb33a]
libgcc_s.so.1(+0xF913) [0x7fb3ac358913]
libgcc_s.so.1(_Unwind_Resume+0x57) [0x7fb3ac358e47]
mongod(_ZN5mongo4Lock10GlobalReadC2EPNS_6LockerEj+0x84) [0x9954a4]
mongod(_ZN5mongo17MigrateFromStatus4doneEPNS_16OperationContextE+0x8C) [0xdac52c]
mongod(_ZN5mongo16MoveChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1CE0) [0xdae3f0]
mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9ad054]
mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9adf93]
mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9aea4b]
mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x76A) [0xb7ca1a]
mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB25) [0xa8fcd5]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e41f0]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x411) [0xed1381]
libpthread.so.0(+0x7F18) [0x7fb3acf74f18]
libc.so.6(clone+0x6D) [0x7fb3ac086b9d]
----- END BACKTRACE -----
A few more events related to conn60 before the crash:
2015-01-05T17:47:22.452+0000 I SHARDING [conn60] moveChunk data transfer progress: { active: true, ns: "sbtest.sbtest1", from: "rs2/172.31.32.214:27017,ip-172-31-35-229:27017", min: { _id: -7816322693657637576 }, max: { _id: -7672769179660119751 }, shardKeyPattern: { _id: "hashed" }, state: "clone", counts: { cloned: 1480, clonedBytes: 321160, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 6
2015-01-05T17:47:22.602+0000 I SHARDING [conn60] About to check if it is safe to enter critical section
2015-01-05T17:47:22.602+0000 E SHARDING [conn60] moveChunk cannot enter critical section before all data is cloned, 81584 locs were not transferred but to-shard reported { active: true, ns: "sbtest.sbtest1", from: "rs2/172.31.32.214:27017,ip-172-31-35-229:27017", min: { _id: -7816322693657637576 }, max: { _id: -7672769179660119751 }, shardKeyPattern: { _id: "hashed" }, state: "clone", counts: { cloned: 1480, clonedBytes: 321160, catchup: 0, steady: 0 }, ok: 1.0 }
2015-01-05T17:47:22.602+0000 I SHARDING [conn60] MigrateFromStatus::done About to acquire global lock to exit critical section
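For anyone cross-checking the crash log, the `backtrace` array in its JSON pairs a module load base (`b`) with an offset (`o`), both in hex, and the `somap` array maps each base to a library path. The sketch below is only an illustration and not a MongoDB tool: it resolves two frames copied from the log to absolute addresses and module names. The helper name `resolve_frames` and the "(main executable)" label for the path-less `400000` entry are assumptions made for this example.

```python
import json

# Two frames and two somap entries copied from the crash log above.
backtrace_json = '[{"b":"400000","o":"B133B9"},{"b":"7FB3AC85D000","o":"5E6C6"}]'
somap_json = (
    '[{"elfType":2,"b":"400000"},'
    '{"b":"7FB3AC85D000","path":"/usr/lib64/libstdc++.so.6","elfType":3}]'
)

def resolve_frames(frames, somap):
    """Return (absolute address, module) pairs for each backtrace frame.

    Hypothetical helper: address = load base + offset; the module is looked
    up by base in somap. Entries without a "path" are labeled as the main
    executable, which is an assumption of this sketch.
    """
    modules = {m["b"]: m.get("path", "(main executable)") for m in somap}
    resolved = []
    for frame in frames:
        address = int(frame["b"], 16) + int(frame["o"], 16)
        resolved.append((hex(address), modules.get(frame["b"], "(unknown)")))
    return resolved

resolved = resolve_frames(json.loads(backtrace_json), json.loads(somap_json))
for address, module in resolved:
    print(address, module)
```

The two computed addresses, 0xf133b9 and 0x7fb3ac8bb6c6, match the first and third entries of the raw address list printed just before `----- BEGIN BACKTRACE -----`.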
The setup is:
- 3 config servers
- 1 mongos
- 3 shards, each a two-member replica set
- wiredTiger storage engine
- all other options at their defaults
- is related to SERVER-4740 Use monotonic clock sources for Timer (Closed)