Core Server / SERVER-4888

mongod crash with signal 7 under high write load

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.0.2
    • Component/s: None
    • Environment:
      uname -a
      Linux test-mongo2-us.internal.net 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
    • Linux

      I have a simple 3-shard cluster (1 config server, 1 mongos) set up for testing, and I am throwing a high load of reads and writes at a collection sharded on _id (a random integer between 1 and 10M). After about an hour of testing, test-mongo2-us (the master) crashed with the error below, possibly while splitting a chunk or rebalancing, judging by the timestamps on the log entries.
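      The report does not include the test harness. As a rough, hypothetical reconstruction of the workload described (random integer _id values between 1 and 10M, mixed reads and writes), an operation generator might look like this. The `Op` type, the `generate_ops` name, and the default 50/50 read/write ratio are assumptions, not details from the report, and no database driver is involved.

```python
import random
from dataclasses import dataclass


# Hypothetical operation record; the report does not describe its harness.
@dataclass(frozen=True)
class Op:
    kind: str  # "read" or "write"
    key: int   # value used as the shard key (_id)


def generate_ops(n, write_ratio=0.5, max_id=10_000_000, seed=None):
    """Yield n operations with _id drawn uniformly from [1, max_id].

    write_ratio is an assumed parameter; the report only says
    "a high load of reads and writes".
    """
    rng = random.Random(seed)
    for _ in range(n):
        kind = "write" if rng.random() < write_ratio else "read"
        yield Op(kind=kind, key=rng.randint(1, max_id))


if __name__ == "__main__":
    for op in generate_ops(10, seed=42):
        print(op)
```

      Because the keys are uniform over the whole range, writes would tend to land on every chunk at a similar rate, so many chunks could approach the split threshold around the same time.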

      Tue Feb 7 00:01:21 [conn71] received splitChunk request: { splitChunk: "testdb.user", keyPattern: { _id: 1.0 }, min: { _id: 6252355 }, max: { _id: 6965549 }, from: "test-mongo2-us:27117", splitKeys: [ { _id: 6585511 } ], shardId: "testdb.user-_id_6252355", configdb: "test-mongo1-us:27019" }
      Tue Feb 7 00:01:21 [conn71] created new distributed lock for testdb.user on test-mongo1-us:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Tue Feb 7 00:01:21 [conn73] command admin.$cmd command: { splitChunk: "testdb.user", keyPattern: { _id: 1.0 }, min: { _id: 6252355 }, max: { _id: 6965549 }, from: "test-mongo2-us:27117", splitKeys: [ { _id: 6585515 } ], shardId: "testdb.user-_id_6252355", configdb: "test-mongo1-us:27019" } ntoreturn:1 reslen:351 555ms
      Tue Feb 7 00:01:21 [conn68] received splitChunk request: { splitChunk: "testdb.user", keyPattern: { _id: 1.0 }, min: { _id: 6252355 }, max: { _id: 6965549 }, from: "test-mongo2-us:27117", splitKeys: [ { _id: 6585511 } ], shardId: "testdb.user-_id_6252355", configdb: "test-mongo1-us:27019" }
      Tue Feb 7 00:01:21 [conn68] created new distributed lock for testdb.user on test-mongo1-us:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Tue Feb 7 00:01:21 [conn23] could not acquire lock 'testdb.user/test-mongo2-us.web.blizzard.net:27117:1328559662:781691710' (another update won)
      Tue Feb 7 00:01:21 [conn23] distributed lock 'testdb.user/test-mongo2-us.web.blizzard.net:27117:1328559662:781691710' was not acquired.
      Tue Feb 7 00:01:21 [conn23] command admin.$cmd command: { splitChunk: "testdb.user", keyPattern: { _id: 1.0 }, min: { _id: 6252355 }, max: { _id: 6965549 }, from: "test-mongo2-us:27117", splitKeys: [ { _id: 6585531 } ], shardId: "testdb.user-_id_6252355", configdb: "test-mongo1-us:27019" } ntoreturn:1 reslen:351 653ms
      Tue Feb 7 00:01:21 Invalid access at address: 0xb9bd3c
      Tue Feb 7 00:01:21 Got signal: 7 (Bus error).
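      In the log above, several connections (conn71, conn73, conn68, conn23) issue splitChunk for the same chunk, { _id: 6252355 } to { _id: 6965549 }, at slightly different split keys (6585511, 6585515, 6585531), and conn23 reports that it could not acquire the distributed lock because "another update won". The toy model below illustrates that race in-process: a compare-and-set on a shared lock record lets exactly one requester win. It is a simplification under stated assumptions: a single in-memory record stands in for the lock document on the config server, and `LockRecord`, `try_split`, and `run_race` are hypothetical names. It does not reproduce mongod's implementation and says nothing about the cause of the bus error.

```python
import threading


class LockRecord:
    """Hypothetical in-memory stand-in for a distributed lock document."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.state = 0  # assumed encoding: 0 = free, 1 = held
        self.holder = None

    def compare_and_set(self, expected, new, who):
        # Atomic check-then-update: succeeds only if the record is still
        # in the expected state, like a conditional update on a document.
        with self._mutex:
            if self.state != expected:
                return False
            self.state = new
            self.holder = who
            return True


def try_split(lock, conn, chunk_min, chunk_max, split_key, results):
    # Assumption: a usable split key must fall strictly inside [min, max).
    if not chunk_min < split_key < chunk_max:
        results[conn] = "invalid split key"
        return
    if lock.compare_and_set(0, 1, conn):
        # Winner; this toy never releases the lock afterwards.
        results[conn] = "split at %d" % split_key
    else:
        results[conn] = "another update won"


def run_race(requests, chunk_min, chunk_max):
    """Start all (conn, split_key) requests at once; return (results, holder)."""
    lock = LockRecord()
    results = {}
    barrier = threading.Barrier(len(requests))

    def worker(conn, split_key):
        barrier.wait()  # release every requester at the same moment
        try_split(lock, conn, chunk_min, chunk_max, split_key, results)

    threads = [threading.Thread(target=worker, args=req) for req in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, lock.holder


if __name__ == "__main__":
    # Connection ids and split keys taken from the log above.
    requests = [("conn71", 6585511), ("conn73", 6585515),
                ("conn68", 6585511), ("conn23", 6585531)]
    results, holder = run_race(requests, 6252355, 6965549)
    print("lock holder:", holder)
    for conn, outcome in sorted(results.items()):
        print(conn, outcome)
```

      Which connection wins varies from run to run; the point is only that the losing requests give up instead of splitting, which mirrors the "was not acquired" message for conn23.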

            Assignee: Unassigned
            Reporter: P Eger (peger)
            Votes: 0
            Watchers: 4

              Created:
              Updated:
              Resolved: