Core Server / SERVER-8518

Recovering slave with journal causes Invalid BSONObj size assertions

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.2.3
    • Environment:
      AWS EC2 m2.2xlarge instances. Each instance has four provisioned-IOPS EBS volumes striped together.
    • Operating System: Linux

      1) Create a 2.2.3 slave and bootstrap it from an existing 2.0.5 replica set
      2) Snapshot the slave's live disks
      3) Create a new slave from these snapshots
      4) Let the new slave recover with the journal (see the command sketch below)

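      For step 4, journal recovery happens automatically when mongod starts and finds journal files in the dbpath. A minimal sketch of the startup, assuming a standard 2.2.3 install (the dbpath, port, replica set name, and log path here are illustrative, not the ones from our deployment):

        # Journal recovery runs automatically at startup when journal files
        # are present in the dbpath; all paths and names below are hypothetical.
        mongod --dbpath /data/db --port 27017 --journal \
               --replSet production --logpath /var/log/mongodb/mongod.log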

      I'm upgrading our production cluster from mongodb 2.0.5 to 2.2.3. I set up a new slave (2.2.3) in the existing mongodb 2.0.5 replica set and let it bootstrap itself over the network. After this I snapshotted the mongodb storage volumes and created a new slave instance from them (to test recovery from backup).

      After the new instance booted, it started to recover itself from the journal. Immediately after recovery completed, the slave started to get assertions about Invalid BSONObj size, which eventually killed the slave.

      I've done the entire procedure twice, with exactly the same results both times. The slave's mongod.log is attached.

      The snapshots were done with the RightScale block_device cookbook scripts. The actual steps are:
      1) Lock (freeze) the underlying XFS filesystem
      2) Create an LVM snapshot
      3) Unlock the underlying XFS filesystem
      4) Snapshot each EBS stripe under the LVM volume to EBS
      This procedure is well tested by RightScale and should ensure that the snapshot is atomic and physically intact after the stripes are rejoined. The LVM snapshot is used to restore the volume.
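      For reference, a rough shell equivalent of steps 1-4 (device names, mount point, and volume IDs are hypothetical; the real work is done by the RightScale cookbook):

        # Hypothetical device names and mount point, for illustration only.
        xfs_freeze -f /mnt/mongodb                   # 1) lock the XFS filesystem
        lvcreate --snapshot --size 5G \
                 --name mongo-snap /dev/vg-data/mongodb   # 2) create the LVM snapshot
        xfs_freeze -u /mnt/mongodb                   # 3) unlock the XFS filesystem
        ec2-create-snapshot vol-xxxxxxxx             # 4) EBS snapshot of each stripe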

      My plan is to do a rolling upgrade:
      1) First add a second slave running 2.2.3
      2) Replace the old slave with a 2.2.3 one by bootstrapping it from a snapshot of the already-created 2.2.3 slave and letting it catch up after recovering from the journal
      3) Step the old primary down and upgrade it the same way (a command sketch follows)
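      Steps 1 and 3 would roughly correspond to the following mongo shell helpers, run against the current primary (hostnames are illustrative):

        # Hypothetical hostnames, for illustration only.
        mongo primary.example.com/admin --eval 'rs.add("slave2.example.com:27017")'
        # After the 2.2.3 slaves have caught up, demote the old primary:
        mongo primary.example.com/admin --eval 'rs.stepDown()'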

            Assignee:
            James Wahlin (james.wahlin@mongodb.com)
            Reporter:
            Juho Mäkinen (garo)
            Votes:
            1
            Watchers:
            5

              Created:
              Updated:
              Resolved: