-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.4.4
-
Component/s: Replication
-
Environment:Sun Jun 23 06:49:45.238 [initandlisten] db version v2.4.4
Sun Jun 23 06:49:45.238 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Sun Jun 23 06:49:45.238 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
Sun Jun 23 06:49:45.238 [initandlisten] db version v2.4.4 Sun Jun 23 06:49:45.238 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2 Sun Jun 23 06:49:45.238 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
-
Windows
-
We have a replica set with two nodes and one arbiter.
{
"_id" : "rsstuff",
"version" : 6,
"members" : [
,
,
{ "_id" : 2, "host" : "LOG-MNG21:27018", "arbiterOnly" : true } ]
}
Last night, the replica have started to fail. I have reviewing the log (I'm going to attach it later), I could see the following:
1) Communication error (tiemout) from secondary to primary
Thu Jun 27 02:03:41.524 [initandlisten] connection accepted from 192.168.3.23:61211 #1879438 (200 connections now open)
Thu Jun 27 02:03:41.883 [rsSyncNotifier] Socket recv() timeout 172.29.106.76:27017
Thu Jun 27 02:03:41.883 [rsSyncNotifier] SocketException: remote: 172.29.106.76:27017 error: 9001 socket exception [3] server [172.29.106.76:27017]
Thu Jun 27 02:03:41.883 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: log-mng12:27017
2) Try to reconnect
Thu Jun 27 02:04:00.680 [rsSyncNotifier] replset setting oplog notifier to log-mng12:27017
...
Thu Jun 27 02:04:04.939 [rsBackgroundSync] replSet syncing to: log-mng12:27017
3) Then there was an error about size ("Size must be between 0 and 16793600(16MB)"). The aggregate.tmp.mr.xxx are temporary collection that they are created during mapreduce operations.
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc
{ _id: 1 }Thu Jun 27 02:04:05.126 [repl writer worker 2] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.126 [repl writer worker 2] info: creating collection aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc on add index
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339_inc
Thu Jun 27 02:04:05.126 [repl writer worker 2] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.126 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
Thu Jun 27 02:04:05.126 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsMeansEntryAvailability_52339
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc
Thu Jun 27 02:04:05.142 [rsBackgroundSync] Assertion: 10334:BSONObj size: 1845624932 (0x6400026E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=115
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0.001 secs
Thu Jun 27 02:04:05.142 [repl writer worker 1] info: creating collection aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc on add index
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338_inc
Thu Jun 27 02:04:05.142 [repl writer worker 1] build index done. scanned 0 total records. 0 secs
Thu Jun 27 02:04:05.158 [repl writer worker 7] build index aggregate.tmp.mr.Statistics_HotelsBookingEngine_52338
4) Then the log shows a dump
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\util\stacktrace.cpp(189) mongo::printStackTrace+0x3e
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\util\assert_util.cpp(159) mongo::msgasserted+0xc1
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\bson\bson-inl.h(219) mongo::BSONObj::_assertInvalid+0x3e6
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\client\dbclientcursor.cpp(238) mongo::DBClientCursor::next+0x153
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\client\dbclientcursor.h(78) mongo::DBClientCursor::nextSafe+0x43
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\oplogreader.h(127) mongo::OplogReader::nextSafe+0x44
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\repl\bgsync.cpp(351) mongo::replset::BackgroundSync::produce+0x457
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\mongo\db\repl\bgsync.cpp(241) mongo::replset::BackgroundSync::producerThread+0x98
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180) boost::`anonymous namespace'::thread_start_function+0x21
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
Thu Jun 27 02:04:05.672 [rsBackgroundSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
Thu Jun 27 02:04:05.672 [rsBackgroundSync] KERNEL32.DLL BaseThreadInitThunk+0x1a
Thu Jun 27 02:04:05.672 [rsBackgroundSync] replSet db exception in producer: 10334 BSONObj size: 1845624932 (0x6400026E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=115
5) Then, I could see "BSONElement: bad type -53"
Thu Jun 27 02:04:09.931 [initandlisten] connection accepted from 172.29.106.46:51733 #1879639 (201 connections now open)
Thu Jun 27 02:04:10.150 [rsSync] Assertion: 10320:BSONElement: bad type -53
Thu Jun 27 02:04:10.259 [conn1879163] end connection 172.29.106.52:16844 (200 connections now open)
Thu Jun 27 02:04:10.259 [initandlisten] connection accepted from 172.29.106.52:19619 #1879640 (201 connections now open)
Thu Jun 27 02:04:10.337 [initandlisten] connection accepted from 172.29.106.51:5754 #1879641 (202 connections now open)
Thu Jun 27 02:04:10.337 [conn1879641] end connection 172.29.106.51:5754 (201 connections now open)
Thu Jun 27 02:04:10.462 [initandlisten] connection accepted from 172.29.106.51:5756 #1879642 (202 connections now open)
Thu Jun 27 02:04:10.462 [conn1879642] end connection 172.29.106.51:5756 (201 connections now open)
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\util\stacktrace.cpp(189) mongo::printStackTrace+0x3e
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\util\assert_util.cpp(159) mongo::msgasserted+0xc1
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\bson\bson-inl.h(673) mongo::BSONElement::size+0x1b5
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(484) mongo::replset::SyncTail::tryPopAndWaitForMore+0x161
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(380) mongo::replset::SyncTail::oplogApplication+0x449
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(723) mongo::ReplSetImpl::_syncThread+0x9c
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(743) mongo::ReplSetImpl::syncThread+0xdc
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\mongo\db\repl\rs_sync.cpp(769) mongo::startSyncThread+0x8a
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180) boost::`anonymous namespace'::thread_start_function+0x21
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
Thu Jun 27 02:04:10.680 [rsSync] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
Thu Jun 27 02:04:10.680 [rsSync] KERNEL32.DLL BaseThreadInitThunk+0x1a
Thu Jun 27 02:04:10.680 [rsSync] replSet syncThread: 10320 BSONElement: bad type -53