-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.0.0-rc8
-
Component/s: Internal Code
-
None
-
ALL
-
This is the big IBM X6 box:
buzz@IDF-IBM-Test-1 ~]$ uname -a Linux IDF-IBM-Test-1.10gen.cc 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [buzz@IDF-IBM-Test-1 ~]$ cat /etc/re readahead.conf redhat-lsb/ redhat-release request-key.conf request-key.d/ resolv.conf [buzz@IDF-IBM-Test-1 ~]$ cat /etc/redhat-release CentOS release 6.6 (Final) [buzz@IDF-IBM-Test-1 ~]$ df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_idfibmtest1-lv_root 51606140 11926496 37058204 25% / tmpfs 396812172 0 396812172 0% /dev/shm /dev/sda1 495844 96335 373909 21% /boot /dev/mapper/vg_idfibmtest1-lv_home 231167100 832892 218591592 1% /home /dev/md1 768924576 20997468 708868020 3% /data/1 /dev/md2 768924576 10687204 719178284 2% /data/2 /dev/md3 768924576 201440 729664048 1% /data/3 /dev/md4 768924576 3242900 726622588 1% /data/4 top - 06:18:58 up 5 days, 20:39, 3 users, load average: 0.00, 0.00, 0.07 Tasks: 1814 total, 1 running, 1813 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 793624344k total, 10494188k used, 783130156k free, 270752k buffers Swap: 4194296k total, 0k used, 4194296k free, 4892660k cached
The rc8 server is vanilla WiredTiger with no special options:
numactl --interleave=all /home/buzz/3.0.0-rc8/bin/mongod --storageEngine=wiredTiger --port 37017 --dbpath /data/4/data0/db0 --logpath /tmp/mongo0.log --fork
Program is performance test. It generates load for a state-event model like this:
1. Find an item in the currentPos collection using findAndModify to "logically reserve" it
2. Copy it to historicPos collection
3. Insert an event doc into the events collection
4 findAndModify the same item in currentPos to write updated data
There are 32 threads. Each thread has a loop that randomly picks a position key and goes thru the steps outlined above.
Failure occurs about 90% of the time between 3000 and 7000 turns of the crank (1-4 above).
This seems to be the start of trouble:
2015-02-17T05:54:22.306-0500 F REPL [conn27] Rollback of logOp not currently allowed (SERVER-15192) 2015-02-17T05:54:22.306-0500 I - [conn27] Fatal Assertion 18805 2015-02-17T05:54:22.318-0500 I COMMAND [conn22] command test5.$cmd command: findAndModify { findandmodify: "currentPos", query: { k: "P5507", r: 1 }, update: { $inc: { n: 1 }, $set: { val: 121, ts: new Date(1424170462154), r: 0 } }, upsert: true } update: { $inc: { n: 1 }, $set: { val: 121, ts: new Date(1424170462154), r: 0 } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:8361 163ms 2015-02-17T05:54:22.320-0500 I COMMAND [conn30] command test5.$cmd command: findAndModify { findandmodify: "currentPos", query: { k: "P48449", r: 0 }, update: { $set: { r: 1, ts: new Date(1424170462156), x: BinData(0, 02654E522D2D326778F31B1BA8B6A8EF889742FAFEC2AA305660EADF51F63400EC4832AABD4D264BAFBCF357C98DD4BE590A18CB892BB02F009BCD4C7D1BF08FF6023526BA33...) } }, new: true, upsert: true } update: { $set: { r: 1, ts: new Date(1424170462156), x: BinData(0, 02654E522D2D326778F31B1BA8B6A8EF889742FAFEC2AA305660EADF51F63400EC4832AABD4D264BAFBCF357C98DD4BE590A18CB892BB02F009BCD4C7D1BF08FF6023526BA33...) } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:8362 163ms 2015-02-17T05:54:22.320-0500 I CONTROL [conn27] 0xf3f059 0xee9191 0xecd5b1 0xc247ac 0xd60996 0x7e42f6 0x94218e 0x9429de 0x9b9434 0x9ba373 0x9baf6b 0xb8abe5 0xa9c968 0x7e6cc0 0xefd11b 0x30ba2079d1 0x30b9ee88fd ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"400000","o":"B3F059"},{"b":"400000","o":"AE9191"},{"b":"400000","o":"ACD5B1"},{"b":"400000","o":"8247AC"},{"b":"400000","o":"960996"},{"b":"400000","o":"3E42F6"},{"b":"400000","o":"54218E"},{"b":"400000","o":"5429DE"},{"b":"400000","o":"5B9434"},{"b":"400000","o":"5BA373"},{"b":"400000","o":"5BAF6B"},{"b":"400000","o":"78ABE5"},{"b":"400000","o":"69C968"},{"b":"400000","o":"3E6CC0"},{"b":"400000","o":"AFD11B"},{"b":"30BA200000","o":"79D1"},{"b":"30B9E00000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc8", "gitVersion" : "9d0714cdcffadb203ff68730acedbc40875403da", "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.el6.x86_64", "version" : "#1 SMP Fri Nov 22 03:15:09 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFEEEFF000", "elfType" : 3 }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3f059] mongod(_ZN5mongo10logContextEPKc+0xE1) [0xee9191] mongod(_ZN5mongo13fassertFailedEi+0x61) [0xecd5b1] mongod(+0x8247AC) [0xc247ac] mongod(_ZN5mongo22WiredTigerRecoveryUnit6_abortEv+0x36) [0xd60996] mongod(_ZN5mongo15WriteUnitOfWorkD1Ev+0x16) [0x7e42f6] mongod(_ZN5mongo16CmdFindAndModify7runImplEPNS_16OperationContextERKSsS4_RKNS_7BSONObjES7_S7_S7_bbbRNS_14BSONObjBuilderERSs+0x18AE) [0x94218e] mongod(_ZN5mongo16CmdFindAndModify3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2AE) [0x9429de] mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9b9434] mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9ba373] mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9baf6b] mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_b+0x755) [0xb8abe5] mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xAC8) [0xa9c968] mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e6cc0] mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x32B) [0xefd11b] libpthread.so.0(+0x79D1) [0x30ba2079d1] libc.so.6(clone+0x6D) [0x30b9ee88fd] ----- END BACKTRACE ----- 2015-02-17T05:54:22.320-0500 I - [conn27] ***aborting after fassert() failure
Interesting note: Same test program (but 16 threads instead of 32) ran OK all day yesterday (start, stop, reset, over and over) on a r3.4xlarge AWS instance running ami-d13845e1.
- duplicates
-
SERVER-17198 logOp rollback in findAndModify when query for modified document fails (fatal assertion)
- Closed