-
Type: Bug
-
Resolution: Incomplete
-
Priority: Critical - P2
-
None
-
Affects Version/s: 2.8.0-rc5
-
Component/s: Concurrency, Networking, Storage
-
ALL
-
Seen with sysbench and YCSB workloads during standalone, single-node longevity testing, the mongod process can enter a persistent error condition where it appears to stop reading sockets from client connections. sysbench and YCSB use the 2.12.4 java driver. Client and server are on the same machine.
Nothing useful logged on server log. Tried db.runCommand(
{ setParameter: 1, logLevel: 5 }) from mongo client but still nothing.
mongo client can run "show collections" but, after displaying the collection anems, hangs.
netstat shows evidence of waiting data in a socket for mongod:
[qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ netstat -n|grep ESTAB tcp 0 0 172.30.200.12:60219 184.106.28.85:443 ESTABLISHED tcp 0 0 172.30.200.12:46284 54.209.155.229:3232 ESTABLISHED tcp 221981 0 127.0.0.1:27017 127.0.0.1:40262 ESTABLISHED tcp 221981 0 127.0.0.1:27017 127.0.0.1:40256 ESTABLISHED tcp 0 0 172.30.200.12:22 172.30.200.8:61008 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40261 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40253 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40257 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40259 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40258 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40255 ESTABLISHED tcp 0 0 172.30.200.12:56842 68.203.14.106:4506 ESTABLISHED tcp 0 0 172.30.200.12:45664 68.203.14.106:4505 ESTABLISHED tcp 0 0 172.30.200.12:57205 54.209.155.229:3233 ESTABLISHED tcp 0 0 127.0.0.1:27017 127.0.0.1:40254 ESTABLISHED tcp 0 0 172.30.200.12:58762 68.203.14.106:4506 ESTABLISHED
strace of mongod shows only calls to poll: or select():
[qconner@slave-4.perf.ny.cbi ~/apps/mongo]$ strace -p 26297 Process 26297 attached - interrupt to quit select(8, [6 7], NULL, NULL, {0, 5891}) = 0 (Timeout) select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout) select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout) select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout) select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout) select(8, [6 7], NULL, NULL, {0, 10000}) = 0 (Timeout)
Bottom of mongod log file is unremarkable:
2015-01-21T22:50:13.239+0000 I COMMAND [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 n umYields:0 reslen:40 1294ms 2015-01-21T22:50:17.213+0000 I WRITE [conn5] insert sbtest.sbtest6 query: { _id: 17816202, k: 47394, c: "38386020571-30959444150-14559542433-02413298585-47740660800-2 7829143017-40155805950-73693273374-88438959896-98232277051", pad: "25264851037-22995234305-81217374636-99284513390-23152612121" } ninserted:1 keyUpdates:0 writeConflicts :0 numYields:0 143ms 2015-01-21T22:50:17.242+0000 I COMMAND [conn5] command sbtest.$cmd command: insert { insert: "sbtest6", ordered: true, documents: 1000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 179ms 2015-01-22T16:32:57.907+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42321 #10 (10 connections now open) 2015-01-22T16:33:07.237+0000 I NETWORK [conn10] end connection 127.0.0.1:42321 (9 connections now open) 2015-01-22T16:33:08.497+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42322 #11 (10 connections now open) 2015-01-22T16:34:13.257+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42324 #12 (11 connections now open) 2015-01-22T16:34:13.258+0000 I NETWORK [conn12] end connection 127.0.0.1:42324 (10 connections now open) 2015-01-22T16:34:49.896+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42326 #13 (11 connections now open) 2015-01-22T16:50:15.766+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42345 #14 (12 connections now open) 2015-01-22T16:50:15.766+0000 I NETWORK [conn14] end connection 127.0.0.1:42345 (11 connections now open) 2015-01-22T16:52:55.395+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42375 #15 (12 connections now open) 2015-01-22T16:53:10.491+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42376 #16 (13 connections now open) 2015-01-22T16:53:10.492+0000 I NETWORK [conn16] end connection 127.0.0.1:42376 (12 connections now open) 2015-01-22T16:55:48.084+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:42380 #17 (13 connections now open)
Sorry I don't have more to go on. Running it again with symbols so gdb attach can be leveraged.