- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- Affects Version/s: 2.0.1
- Component/s: Replication
- Environment: ubuntu 10.04 x86_64; Linux; 4 x replica-set shards, each replica set with two participating members and one arbiter
We are in the process of resyncing to address fragmentation and to upgrade all indexes to the newer v1 format. This process worked fine for 3 of our 4 shards. The 4th shard is the primary for the namespace and, because we have not yet fully tested MR output to a sharded collection, it also stores the output of all MR tasks. None of the indexes on this shard have been upgraded to v1. In 4 resync attempts (removing all data on the secondary), we have noticed that the { _id: 1 } index on some collections is not built during the resync process. We hypothesize that there is some corruption or incompatibility in the index that leaves it functional for queries but somehow prevents it from being rebuilt during resync attempts. We believe this impacts replay of the oplog, causing it to seemingly hang: optimeDate does not advance (perhaps it is working but extremely slowly). Shutting down the node shows the in-progress operation being interrupted:
Fri Oct 28 16:26:33 [rsSync] replSet syncTail: 11600 interrupted at shutdown, syncing: { ts: Timestamp 1319826732000|30, h: 1028901388370155163, op: "u", ns: "mydb.mycoll", o2: { _id: { ... } }, o: { _id: { ... }, value: { ...data... } } }
The timestamp matches the optimeDate. We have seen this affect both inserts and updates. It appears to only affect collections which are the target for MR output. Additionally, it may only affect collections wherein MR results are re-reduced/re-finalized, but we are less confident about this assessment.
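To cross-check a claim like "the timestamp matches the optimeDate" by hand, the ts value in the log line above can be decoded into wall-clock time. A minimal sketch, assuming the 2.0-era mongod log prints the BSON Timestamp as milliseconds|increment (`rs.status()` reports optimeDate as a wall-clock datetime):

```python
from datetime import datetime, timezone

def decode_log_ts(raw):
    """Split a 'millis|increment' timestamp, as printed in 2.0-era
    mongod logs, into (seconds, increment, UTC datetime)."""
    millis, inc = raw.split("|")
    secs = int(millis) // 1000
    return secs, int(inc), datetime.fromtimestamp(secs, tz=timezone.utc)

# The ts value from the interrupted operation in the log excerpt.
secs, inc, when = decode_log_ts("1319826732000|30")
print(secs, inc, when.isoformat())
```

If the decoded datetime equals the node's optimeDate (after adjusting for the server's local time zone), the hung operation is indeed the one at the replication head.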
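Separately, before another resync attempt it may help to enumerate which collections still carry old-format indexes. In 2.0, each document returned by getIndexes() carries a v field (v: 1 for the new format; v: 0 or no v field for the pre-2.0 format). A minimal sketch over index documents already fetched from the server; the sample data here is hypothetical:

```python
def find_v0_indexes(index_docs):
    """Return (ns, index name) pairs for index documents whose
    'v' field is 0 or absent (absent also means pre-2.0 format)."""
    return [(d.get("ns", "?"), d["name"])
            for d in index_docs
            if d.get("v", 0) == 0]

# Hypothetical output of db.mycoll.getIndexes(), for illustration only.
sample = [
    {"v": 0, "name": "_id_", "ns": "mydb.mycoll", "key": {"_id": 1}},
    {"v": 1, "name": "value_1", "ns": "mydb.mycoll", "key": {"value": 1}},
]
print(find_v0_indexes(sample))  # only the old-format indexes
```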
- related to: SERVER-5040 Cloner can fail to create unique indexes on initial sync (Closed)