Core Server / SERVER-12528

SIGTERM can cause an fassert if we're actively replicating

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Environment:
      CentOS release 6.4 64bit; Openstack virtual server/KVM guest
      MongoDB 2.4.8
    • Replication
    • Fully Compatible

      Our init scripts currently send a SIGTERM when stopping mongod.

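      If we are actively replicating when the SIGTERM arrives, a repl writer worker thread that is in the middle of applying an oplog entry catches the resulting "interrupted at shutdown" exception and hits a fatal assertion (fassert). On 2.4 this produces a stack trace like the following: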

      Thu Jan 23 03:45:08.862 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
      Thu Jan 23 03:45:08.863 [repl writer worker 1] ERROR: writer worker caught exception: interrupted at shutdown on: { ts: Timestamp 1390448708000|2, h: 107290241850708099, v: 2, op: "i", ns: "XXX.YYY", o: { _id: ObjectId('...'), urn: "ZZZ", dateUpdated: new Date(1390448708000) } }
      Thu Jan 23 03:45:08.863 [repl writer worker 1]   Fatal Assertion 16360
      0xde05e1 0xda03d3 0xc28f3c 0xdadf21 0xe28e69 0x3f79007851 0x3f78ce890d 
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
       /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xda03d3]
       /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc28f3c]
       /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdadf21]
       /usr/bin/mongod() [0xe28e69]
       /lib64/libpthread.so.0() [0x3f79007851]
       /lib64/libc.so.6(clone+0x6d) [0x3f78ce890d]
      Thu Jan 23 03:45:08.870 [repl writer worker 1] 
      
      ***aborting after fassert() failure
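The failure sequence in the log above can be modeled with a small Python sketch. This is not mongod's code: the names `InterruptedAtShutdown`, `FatalAssertion`, `apply_batch`, `writer_worker` and `simulate` are hypothetical, a `threading.Event` stands in for mongod's shutdown flag, and 16360 is simply the assertion id taken from the log. The `graceful=True` branch is an assumed alternative policy for illustration, not a description of how this issue was resolved.

```python
import threading


class InterruptedAtShutdown(Exception):
    """Stand-in for the "interrupted at shutdown" error in the log."""


class FatalAssertion(Exception):
    """Stand-in for fassert(); in mongod this aborts the whole process."""


def apply_batch(ops, shutdown, applied):
    """Apply ops in order, checking the shutdown flag before each one."""
    for op in ops:
        if shutdown.is_set():  # interrupt point
            raise InterruptedAtShutdown(f"interrupted at shutdown on: {op}")
        applied.append(op)


def writer_worker(ops, shutdown, applied, graceful):
    """Hypothetical writer worker.

    graceful=False mirrors the log: the shutdown interrupt escalates to a
    fatal assertion. graceful=True (assumed alternative) treats it as a
    clean stop so that shutdown can proceed normally.
    """
    try:
        apply_batch(ops, shutdown, applied)
        return "done"
    except InterruptedAtShutdown as exc:
        if graceful:
            return "stopped cleanly"
        raise FatalAssertion(16360) from exc


def simulate(graceful):
    shutdown = threading.Event()
    applied = []
    # Simulated signalProcessingThread: SIGTERM arrives and shutdown is requested.
    signal_thread = threading.Thread(target=shutdown.set)
    signal_thread.start()
    # Joining first keeps the demo deterministic: the request lands before the batch.
    signal_thread.join()
    try:
        return writer_worker(["op1", "op2"], shutdown, applied, graceful)
    except FatalAssertion as exc:
        return f"Fatal Assertion {exc.args[0]}"


print(simulate(graceful=False))  # Fatal Assertion 16360
print(simulate(graceful=True))   # stopped cleanly
```

The point of the contrast is that the interrupt itself is expected during shutdown; it is the escalation to a fatal assertion that turns a normal stop into an abort.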
      

      Since fassert is not a graceful way to shut down mongod (for example, it requires journal recovery on restart and may leave the lock file in place, which would interfere with subsequent startup), and since "service restart" should be graceful and the init script we ship uses SIGTERM to implement it, this seems like a bug on our side.
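What "graceful" means for a SIGTERM-based stop can be sketched in Python. This is a hypothetical stand-in, not our init script and not mongod: a child Python process plays the server and exits with status 0 when it handles SIGTERM, and `stop_gracefully` and `start_child` are hypothetical names for the stop step (send SIGTERM, wait with a timeout, fall back to SIGKILL only if the process does not exit). On POSIX, `Popen` reports a process killed by a signal as a negative return code, so a process that aborted (as after an fassert) would not return 0.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the server: installs a SIGTERM handler and exits
# with status 0 when the signal arrives, i.e. a graceful shutdown.
CHILD_SOURCE = r"""
import signal, sys, time

def on_sigterm(signum, frame):
    sys.exit(0)  # raises SystemExit in the main thread, ending the loop below

signal.signal(signal.SIGTERM, on_sigterm)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_gracefully(proc, timeout=10.0):
    """Send SIGTERM, as the init script's stop step does, then wait for exit.

    Returns the exit status (0 for a clean exit, negative if the process
    died from a signal), or None if it had to be SIGKILLed after the timeout.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort: an unclean stop, just like the fassert
        proc.wait()
        return None


def start_child():
    """Start the stand-in process and block until its handler is installed."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # "ready" is printed only after signal.signal()
    return proc


if __name__ == "__main__":
    child = start_child()
    print("exit status:", stop_gracefully(child))  # exit status: 0
    child.stdout.close()
```

Waiting for "ready" before signalling avoids a race in which SIGTERM arrives before the handler is installed and the default action terminates the child instead.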

            Assignee:
            [DO NOT USE] Backlog - Replication Team (backlog-server-repl)
            Reporter:
            Joanna Cheng (joanna.cheng@mongodb.com)
            Votes:
            0
            Watchers:
            9
