-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 3.4.5, 3.4.14, 3.6.3
-
Component/s: Querying, WiredTiger
-
None
-
ALL
-
One of our mongodb cluster has very weird issue.
I tried to upgrade our cluster from 3.4.1 to 3.6.3, after upgrade I noticed that one slave server gets stuck after a while, I can connect to that slave, some commands are working, but not all, e.g rs.printSlaveReplicationInfo() is not responding, also that slave starts lagging, there is no cpu usage or disk IO, server is idling.
And later even primary server got stuck (I suspect it is same issue, but not sure).
After primary got stuck I decided to rollback cluster to 3.4.13, same thing happened.
Rolled back whole cluster to 3.4.1, this version works.
I started playing with one slave server, tried different versions, turned out that 3.4.4 is OK, but 3.4.5 is not. Using git bisect I found that following commit is causing issues for us: https://github.com/mongodb/mongo/commit/c3fda2d13c2f60eb3a68564403985cb1c7e8aa43
Slave that gets stuck is also used for read-only queries, another slave that is not getting queries works fine.
Every time slave gets stuck there is are few huge queries like following in log:
2018-03-21T11:16:02.888+0000 I COMMAND [conn124] warning: log line attempted (644kB) over max size (10kB), printing beginning and end ... command Baz.foocollection command: getMore { getMore: 120605126995, collection: "foocollection" } originatingCommand: { find: "foocollection", filter: { $or: [ { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494514165.463685 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422652.740239 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422716.137933 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422723.481655 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113378.631109 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1494422718.830563 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113381.569376 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113388.42224 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1495113426.608498 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802053.254245 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802049.093342 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802054.233313 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494329972.558202 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1493802055.457206 }, { stamp: 1487772816.953898, fooObjectId: ObjectId('5805bf6e74712800bb012f05') }, { fooObjectId: ObjectId('5805bf6e74712800bb012f05'), stamp: 1487772831.640239 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404387.08405 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404383.584682 }, { fooObjectId: ObjectId('589c35934e89350783564909'), stamp: 1496404357.338354 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1494513722.518675 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423044.121071 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423041.918467 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423039.961413 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1494423034.819859 }, { stamp: 1501573001.144575, fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f') }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573004.145434 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501572997.895948 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573008.895141 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501573005.394891 }, { fooObjectId: ObjectId('5812fc35b2a83a00b8c9061f'), stamp: 1501090931.443829 }, { fooObjectId: ObjectId('58931cae526107078bbf71f7'), stamp: 1497892396.056092 }, { stamp: 1493890957.166266, fooObjectId: ObjectId('58b412d65e9206070a1dd5f6') }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1493890819.475622 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1493890822.902208 }, { fooObjectId: ObjectId('57beea4dd93194b60029ee9a'), stamp: 1490703301.703321 }, { stamp: 1487949706.666049, fooObjectId: ObjectId('5805c00f74712800bb01320a') }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1497550576.221023 }, { fooObjectId: ObjectId('58b412d65e9206070a1dd5f6'), stamp: 1497550576.95499 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 1511271224.294596 }, { fooObjectId: ObjectId('58b412645e9206070a1dd06a'), stamp: 15112712 .......... on.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 }, IXSCAN { stamp: 1, position.x: 1, position.y: 1 } cursorid:120605126995 keysExamined:34140 docsExamined:34140 cursorExhausted:1 numYields:307 nreturned:34140 reslen:11101775 locks:{ Global: { acquireCount: { r: 616 }, acquireWaitCount: { r: 146 }, timeAcquiringMicros: { r: 469372 } }, Database: { acquireCount: { r: 308 } }, Collection: { acquireCount: { r: 308 } } } protocol:op_query 1110ms
It doesn't happen each time, most of the time these queries are successful.
I can provide logs, but not sure how to do that privately as our logs can contain sensitive information.
- duplicates
-
WT-3972 Allow more than 64K cursors to be open on a data source simultaneously
- Closed
- related to
-
SERVER-34012 Planner's logic for taking union of index bounds intervals is slow for large $or queries
- Closed