- Type: Question
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 2.4.3
- Component/s: Replication
- Environment: FreeBSD 9.0 amd64
We have a large dataset with a stale member, and wanted to resync it automatically from the primary (initial sync).
After we removed its data directory and started it again, it went into the STARTUP2 state and started cloning data.
The data cloning (and indexing) stage took 18 hours, but after that the member's state did not change at all.
The relevant portion of the log file is:
Tue May 21 13:39:54.466 [rsSync] oplog sync 1 of 3
...
Wed May 22 02:55:33.679 [rsSync] build index done. scanned 52202932 total records. 5286.34 secs
Wed May 22 02:55:35.019 [rsSync] oplog sync 3 of 3
Wed May 22 02:55:35.757 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
Wed May 22 02:59:13.691 [rsSync] replSet initialSyncOplogApplication applied 1001 operations, synced to May 21 14:22:18:22
Wed May 22 03:06:09.440 [rsSync] replSet initialSyncOplogApplication applied 2002 operations, synced to May 21 14:22:37:d
Wed May 22 03:10:59.526 [rsSync] replSet initialSyncOplogApplication applied 3003 operations, synced to May 21 14:23:25:20
Wed May 22 03:18:37.975 [rsSync] replSet initialSyncOplogApplication applied 4004 operations, synced to May 21 14:23:49:33
...
Wed May 22 09:56:35.674 [rsSync] replSet initialSyncOplogApplication applied 116116 operations, synced to May 21 15:27:59:10
I don't know whether the initial sync was successful or not, but we were seeing `initialSyncOplogApplication` log lines every ~5 minutes, and the "synced to" time was advancing very slowly (it moved forward by 1 hour after 5 hours!).
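To put a number on how slowly the oplog was being applied, the progress lines can be compared directly. The following is only a rough sketch, not anything provided by MongoDB: the helper names `parse_progress` and `catch_up_ratio` are made up for illustration, the regular expression is fitted to the log lines quoted above, and the year (2013) is an assumption because the log does not include one.

```python
import re
from datetime import datetime

# Fitted to lines like:
# "Wed May 22 02:59:13.691 [rsSync] replSet initialSyncOplogApplication
#  applied 1001 operations, synced to May 21 14:22:18:22"
# The field after the last colon of the "synced to" value is ignored.
LINE_RE = re.compile(
    r"^\w{3} (\w{3} +\d+ \d\d:\d\d:\d\d)\.\d+ \[rsSync\] replSet "
    r"initialSyncOplogApplication applied (\d+) operations, "
    r"synced to (\w{3} +\d+ \d\d:\d\d:\d\d):\w+"
)


def parse_progress(line, year=2013):
    """Return (wall_clock, ops_applied, synced_to), or None if the line does not match.

    The year is an assumption; the log lines carry no year.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    fmt = "%Y %b %d %H:%M:%S"
    wall = datetime.strptime(f"{year} {m.group(1)}", fmt)
    synced = datetime.strptime(f"{year} {m.group(3)}", fmt)
    return wall, int(m.group(2)), synced


def catch_up_ratio(first_line, last_line):
    """Seconds of oplog time applied per second of wall-clock time."""
    w0, _, s0 = parse_progress(first_line)
    w1, _, s1 = parse_progress(last_line)
    return (s1 - s0).total_seconds() / (w1 - w0).total_seconds()


# First and last progress lines from the excerpt above.
FIRST = ("Wed May 22 02:59:13.691 [rsSync] replSet initialSyncOplogApplication "
         "applied 1001 operations, synced to May 21 14:22:18:22")
LAST = ("Wed May 22 09:56:35.674 [rsSync] replSet initialSyncOplogApplication "
        "applied 116116 operations, synced to May 21 15:27:59:10")

print(round(catch_up_ratio(FIRST, LAST), 2))
```

For the two lines shown, the ratio comes out at roughly 0.16, i.e. about 66 minutes of oplog applied in just under 7 hours of wall-clock time. If the primary keeps taking writes, a ratio below 1 would mean the member is falling further behind rather than catching up.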
We restarted the mongodb service, but unfortunately it started syncing from scratch, with a log like this:
Wed May 22 10:43:28.548 [rsStart] replSet I am 172.20.43.11:27118
Wed May 22 10:43:28.638 [rsStart] replSet STARTUP2
Wed May 22 10:43:28.645 [rsSync] replSet initial sync pending
Wed May 22 10:43:58.704 [rsSync] replSet initial sync drop all databases
Wed May 22 10:43:58.704 [rsSync] dropAllDatabasesExceptLocal 2
Wed May 22 10:43:58.708 [rsSync] removeJournalFiles
....
I think the state of the server should be RECOVERING, not STARTUP2. Is this correct?
If so, why was the server stuck in STARTUP2, and why did it drop all of the copied data after the restart?
- duplicates: SERVER-4766 Make initial sync restartable per collection (Closed)