Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-3544

WT rollback_to_stable hits "memory stomping bug"

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 3.5.13, WT3.0.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage 2017-09-11

      Reproduce

      Apply the patch on top of 324bc4e858959bbd77d111bddb4b166048723312
      Start mongod as a single member replica set node:

      rm -rf /data/db/* && ./mongod --replSet rs0 --oplogSize 100
      

      When mongod is running, run mongo restart.js. This reproduces every run for me.

      Observations

      On the first call to `restartStorageEngine`, the server crashes with the following error message and backtrace:

      memory stomping bug: a word after object at 0x7ffff33e99a0 has been corrupted
      (gdb) bt
      #0  0x00007ffff6cf2428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
      #1  0x00007ffff6cf402a in __GI_abort () at abort.c:89
      #2  0x00005555597c497a in LogPrintf (severity=-4, pat=0x555559979bca "memory stomping bug: a word after object at %p has been corrupted", 
          ap=0x7ffff7ecce00) at src/third_party/gperftools-2.5/src/base/logging.h:209
      #3  0x00005555597cf1e8 in RAW_LOG (lvl=-4, pat=0x555559979bca "memory stomping bug: a word after object at %p has been corrupted")
          at src/third_party/gperftools-2.5/src/base/logging.h:228
      #4  0x00005555597eb6ba in MallocBlock::CheckLocked (this=0x7ffff33e9980, type=-271733872) at src/third_party/gperftools-2.5/src/debugallocation.cc:455
      #5  0x00005555597ecb56 in MallocBlock::CheckAndClear (this=0x7ffff33e9980, type=-271733872, given_size=0)
          at src/third_party/gperftools-2.5/src/debugallocation.cc:410
      #6  0x00005555597ecaef in MallocBlock::Deallocate (this=0x7ffff33e9980, type=-271733872, given_size=0)
          at src/third_party/gperftools-2.5/src/debugallocation.cc:564
      #7  0x00005555597e238a in DebugDeallocate (ptr=0x7ffff33e99a0, type=-271733872, given_size=0)
          at src/third_party/gperftools-2.5/src/debugallocation.cc:1039
      #8  0x000055555988ab5e in tc_free (ptr=0x7ffff33e99a0) at src/third_party/gperftools-2.5/src/debugallocation.cc:1229
      #9  0x00005555572cfcca in __wt_free_int (session=0x7ffff392d020, p_arg=0x7ffff5e662c0) at src/third_party/wiredtiger/src/os_common/os_alloc.c:311
      #10 0x0000555557341de7 in __wt_txn_rollback_to_stable (session=0x7ffff392d020, cfg=0x7ffff7ecd0d0)
          at src/third_party/wiredtiger/src/txn/txn_rollback_to_stable.c:431
      #11 0x000055555729b8c1 in __conn_rollback_to_stable (wt_conn=0x7ffff5e5f020, config=0x0) at src/third_party/wiredtiger/src/conn/conn_api.c:1192
      #12 0x0000555557249c9f in mongo::WiredTigerKVEngine::recoverToStableTimestamp (this=0x7ffff5ff3da0)
          at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp:1117
      #13 0x00005555574492d3 in mongo::KVStorageEngine::recoverToStableTimestamp (this=0x7ffff5d0ffe0, opCtx=0x7ffff32e88e0)
          at src/mongo/db/storage/kv/kv_storage_engine.cpp:445
      #14 0x000055555762bd0c in mongo::recoverToStableTimestamp (opCtx=0x7ffff32e88e0) at src/mongo/db/recover_to_timestamp.cpp:58
      #15 0x00005555575c4cd5 in mongo::RestartStorageEngineCmd::run (this=0x55555ade94f8 <mongo::restartCmd>, opCtx=0x7ffff32e88e0, db=..., 
          cmdObj=owned BSONObj 50 bytes @ 0x7ffff33e7ea8 = {...}, result=...) at src/mongo/db/commands/restart_store_engine.cpp:75
      #16 0x0000555558f7492b in mongo::BasicCommand::enhancedRun (this=0x55555ade94f8 <mongo::restartCmd>, opCtx=0x7ffff32e88e0, request=..., result=...)
          at src/mongo/db/commands.cpp:390
      #17 0x0000555558f72e2d in mongo::Command::publicRun (this=0x55555ade94f8 <mongo::restartCmd>, opCtx=0x7ffff32e88e0, request=..., result=...)
          at src/mongo/db/commands.cpp:328
      #18 0x000055555751c692 in mongo::(anonymous namespace)::runCommandImpl (opCtx=0x7ffff32e88e0, command=0x55555ade94f8 <mongo::restartCmd>, 
          request=..., replyBuilder=0x7ffff30cfb60, startOperationTime=...) at src/mongo/db/service_entry_point_mongod.cpp:458
      #19 0x000055555751a73a in mongo::(anonymous namespace)::execCommandDatabase (opCtx=0x7ffff32e88e0, command=0x55555ade94f8 <mongo::restartCmd>, 
          request=..., replyBuilder=0x7ffff30cfb60) at src/mongo/db/service_entry_point_mongod.cpp:705
      #20 0x00005555575181a3 in mongo::(anonymous namespace)::runCommands(mongo::OperationContext*, mongo::Message const&)::$_4::operator()() const (
          this=0x7ffff7ecf950) at src/mongo/db/service_entry_point_mongod.cpp:822
      #21 0x0000555557515307 in mongo::(anonymous namespace)::runCommands (opCtx=0x7ffff32e88e0, message=...)
          at src/mongo/db/service_entry_point_mongod.cpp:778
      #22 0x0000555557513f54 in mongo::ServiceEntryPointMongod::handleRequest (this=0x7ffff5ceca60, opCtx=0x7ffff32e88e0, m=...)
          at src/mongo/db/service_entry_point_mongod.cpp:1091
      #23 0x00005555575309fe in mongo::ServiceStateMachine::_processMessage (this=0x7ffff6020b30, guard=...)
          at src/mongo/transport/service_state_machine.cpp:317
      #24 0x000055555752cfff in mongo::ServiceStateMachine::_runNextInGuard (this=0x7ffff6020b30, guard=...)
          at src/mongo/transport/service_state_machine.cpp:407
      #25 0x000055555752d91f in mongo::ServiceStateMachine::runNext (this=0x7ffff6020b30) at src/mongo/transport/service_state_machine.cpp:373
      #26 0x000055555752a262 in mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::$_1::operator()() (
          this=0x7ffff6191420) at src/mongo/transport/service_entry_point_impl.cpp:89
      #27 0x0000555557529ffd in std::_Function_handler<void (), mongo::ServiceEntryPointImpl::startSession(std::shared_ptr<mongo::transport::Session>)::$_1>::_M_invoke(std::_Any_data const&) (__functor=...)
          at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
      #28 0x0000555557441f1e in std::function<void ()>::operator()() const (this=0x7ffff6190d90)
          at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
      #29 0x00005555594b38ec in mongo::(anonymous namespace)::runFunc (ctx=0x7ffff6190d90) at src/mongo/transport/service_entry_point_utils.cpp:55
      #30 0x00007ffff708e6ba in start_thread (arg=0x7ffff7ed1700) at pthread_create.c:333
      #31 0x00007ffff6dc43dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

      After 100 calls to setStableTimestamp, the server stops advancing the stable timestamp. Thus there is real work for rollback_to_stable to do. I tried to reduce restart.js to the minimum set of calls to reproduce the error, but I unfortunately did not get far. The problem started when I had added some writes to a local collection, i.e: where WT table logging is engaged, thus the table should not be rolled back. Changing let testLocal = true; to false should cause the script to succeed. Note that in addition to this user collection in local, the oplog also has journaling enabled and otherwise exhibits correct behavior on rollback.

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: