Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.2.3
Component/s: Internal Code, Replication, Stability
Environment: AWS EC2 m2.2xlarge instances. Instance has four IOPS volumes striped together.
Operating System: Linux
I'm upgrading our production cluster from MongoDB 2.0.5 to 2.2.3. I set up a new slave (2.2.3) in the existing MongoDB 2.0.5 replica set and let it bootstrap itself over the network. After this I snapshotted the MongoDB storage volumes and created a new slave instance from them (to test recovery from backup).
After the new instance booted, it started to recover itself from the journal. Immediately after recovery completed, the slave started to get assertions about Invalid BSONObj size, which eventually killed the slave.
I've done the entire job twice, only to get exactly the same results. The slave's mongod.log is attached.
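For reference, here is a minimal PyMongo sketch of how one could scan a suspect node for this kind of BSON corruption with the validate command (the same idea as SERVER-8867); the hostname is a hypothetical placeholder and this is not the tooling I used:

```python
# Minimal sketch: run the validate command against every collection on a
# possibly corrupted node. "Invalid BSONObj size" corruption should show
# up as valid=false. Hostname is a hypothetical placeholder.
from pymongo import MongoClient

client = MongoClient('suspect-slave:27017')  # hypothetical host

for db_name in client.list_database_names():
    db = client[db_name]
    for coll_name in db.list_collection_names():
        # full=True forces a scan of every document, not just the metadata
        result = db.command('validate', coll_name, full=True)
        if not result.get('valid'):
            print('CORRUPT: %s.%s -> %s'
                  % (db_name, coll_name, result.get('errors')))
```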
The snapshots were done with RightScale block_device cookbook scripts. The actual steps are:
1) Lock the underlying XFS filesystem
2) Create LVM snapshot
3) Unlock the underlying XFS filesystem
4) After this, each EBS volume in the LVM stripe is ordered to take an EBS snapshot.
This procedure is well tested by RightScale and should ensure that the snapshot is atomic and physically intact after the stripes are rejoined. The LVM snapshot is used to restore the volume.
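For illustration, a rough Python sketch of the equivalent manual freeze/snapshot/thaw cycle; the mount point, logical volume, snapshot size, and EBS volume IDs are hypothetical placeholders, and the actual work is done by the RightScale block_device scripts:

```python
# Sketch of the snapshot procedure: freeze XFS so the on-disk state is
# consistent, take an LVM snapshot, thaw, then snapshot each EBS stripe.
# All paths, names, sizes, and volume IDs below are hypothetical.
import subprocess

MOUNT = '/mnt/mongodb'               # hypothetical XFS mount point
LV = '/dev/vg-data/mongodb'          # hypothetical logical volume
EBS_VOLUMES = ['vol-aaaa', 'vol-bbbb', 'vol-cccc', 'vol-dddd']  # four stripes

def run(*cmd):
    subprocess.run(cmd, check=True)

# 1) Lock (freeze) the filesystem so no writes land mid-snapshot
run('xfs_freeze', '-f', MOUNT)
try:
    # 2) Create the LVM snapshot while the filesystem is quiesced
    run('lvcreate', '--snapshot', '--name', 'mongodb-snap', '--size', '10G', LV)
finally:
    # 3) Unlock (thaw) the filesystem even if the snapshot failed
    run('xfs_freeze', '-u', MOUNT)

# 4) Order an EBS snapshot of every volume in the stripe set
for vol in EBS_VOLUMES:
    run('aws', 'ec2', 'create-snapshot', '--volume-id', vol,
        '--description', 'mongodb backup stripe')
```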
My plan is to do a rolling upgrade:
1) First add a second slave, running 2.2.3
2) Replace the old slave with 2.2.3 by bootstrapping it from a snapshot of the already-created slave and letting it catch up after recovering from the journal
3) Step the old primary down and replace it the same way (sketched below).
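For reference, a rough PyMongo sketch of the replica-set operations behind steps 1 and 3 (adding a member, then stepping the primary down); the hostnames are hypothetical and this is not the exact procedure I will run:

```python
# Sketch: add a new 2.2.3 member to the replica set, then step the
# primary down so an upgraded member can take over. Hostnames are
# hypothetical placeholders.
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient('old-primary:27017')  # hypothetical current primary

# Add the new slave: read the current config, bump its version,
# append a member, and push it back with replSetReconfig.
config = client.local.system.replset.find_one()
config['version'] += 1
new_id = max(m['_id'] for m in config['members']) + 1
config['members'].append({'_id': new_id, 'host': 'new-slave:27017'})
client.admin.command('replSetReconfig', config)

# Step the primary down for 60 seconds. The server closes all
# connections when it steps down, so the command typically "fails"
# with a network error even on success.
try:
    client.admin.command('replSetStepDown', 60)
except AutoReconnect:
    pass  # expected: the old primary dropped our connection
```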
Is related to:
- SERVER-8867 Command to validate all data within a server (Closed)
- SERVER-8908 Better error log explanation if server notices data file corruption (Closed)