-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
ALL
-
(copied to CRM)
Hi,
Since i migrate an repliset in mongodb 3.2.21 i can't add an member because the initial sync not works.
It's seems to be an bug corrected in 3.2.21. But it seems not.
In logs in the member i try to add i see this messages :
2018-10-03T16:28:43.553+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017] 2018-10-03T16:28:43.553+0200 I NETWORK [rsSync] DBClientCursor::init call() failed 2018-10-03T16:28:43.555+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576857000|17 } } 2018-10-03T16:28:43.555+0200 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining 2018-10-03T16:28:48.555+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:28:48.556+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017 2018-10-03T16:28:48.559+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576928000|18, t: -1 } 2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017 2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017] 2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] DBClientCursor::init call() failed 2018-10-03T16:29:18.560+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576928000|18 } } 2018-10-03T16:29:18.560+0200 E REPL [rsSync] initial sync attempt failed, 8 attempts remaining 2018-10-03T16:29:23.561+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:29:23.561+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017 2018-10-03T16:29:23.566+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576932000|7, t: -1 } 2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017 2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017] 2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] DBClientCursor::init call() failed 2018-10-03T16:29:53.568+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576932000|7 } } 2018-10-03T16:29:53.568+0200 E REPL [rsSync] initial sync attempt failed, 7 attempts remaining 2018-10-03T16:29:58.568+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:29:58.568+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017 2018-10-03T16:29:58.572+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576998000|18, t: -1 } 2018-10-03T16:30:00.586+0200 E REPL [rsSync] 11602 operation was interrupted 2018-10-03T16:30:00.587+0200 E REPL [rsSync] initial sync attempt failed, 6 attempts remaining 2018-10-03T16:30:02.420+0200 I NETWORK [conn14] end connection 192.168.50.22:57160 (19 connections now open) 2018-10-03T16:30:02.601+0200 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to sf06-b.essos.lan:27017 2018-10-03T16:30:02.601+0200 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections 2018-10-03T16:30:02.602+0200 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to sf06-b.essos.lan:27017 2018-10-03T16:30:02.603+0200 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to sf06-b.essos.lan:27017, took 2ms (1 connections now open to sf06-b.essos.lan:27017) 2018-10-03T16:30:02.604+0200 I REPL [ReplicationExecutor] Member sf06-b.essos.lan:27017 is now in state SECONDARY 2018-10-03T16:30:04.424+0200 I REPL [ReplicationExecutor] replSetElect voting yea for sf06-a.essos.lan:27017 (4) 2018-10-03T16:30:04.603+0200 I REPL [ReplicationExecutor] Member sf06-a.essos.lan:27017 is now in state PRIMARY 2018-10-03T16:30:05.587+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:30:05.587+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017 2018-10-03T16:30:05.592+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576999000|19, t: -1 } 2018-10-03T16:30:08.720+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.22:57178 #21 (20 connections now open) 2018-10-03T16:30:15.617+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.23:47300 #22 (21 connections now open) ^C [root@sf06-c ~]# tail /var/log/mongodb/mongod.log -F 2018-10-03T16:30:40.745+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.185:45334 #24 (23 connections now open) 2018-10-03T16:31:06.007+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:43886 #25 (24 connections now open) 2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017 2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017] 2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] DBClientCursor::init call() failed 2018-10-03T16:31:10.600+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576999000|19 } } 2018-10-03T16:31:10.600+0200 E REPL [rsSync] initial sync attempt failed, 4 attempts remaining 2018-10-03T16:31:15.600+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:31:15.600+0200 I REPL [ReplicationExecutor] syncing from: sf06-a.essos.lan:27017 by request 2018-10-03T16:31:15.604+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538577047000|79, t: -1 } 2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.23:27017 2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.23:27017] 2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] DBClientCursor::init call() failed 2018-10-03T16:31:45.606+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-a.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538577047000|79 } } 2018-10-03T16:31:45.606+0200 E REPL [rsSync] initial sync attempt failed, 3 attempts remaining 2018-10-03T16:31:50.606+0200 I REPL [rsSync] initial sync pending 2018-10-03T16:31:50.607+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017 2018-10-03T16:31:50.712+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538577110000|31, t: -1 } 2018-10-03T16:32:20.712+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017 2018-10-03T16:32:20.712+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
- duplicates
-
SERVER-38425 Oplog Visibility Query is a collection scan in 3.2.21
- Closed
- is related to
-
SERVER-30724 Initial sync might miss ops that were in flight when it started
- Closed