WiredTiger / WT-3559

Detect when a checkpoint races with metadata changes

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.0-rc0, WT3.0.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-10-02, Storage 2017-10-23

      As part of WT-3558 we disabled a diagnostic assertion in WiredTiger because of a MongoDB test failure. The assertion is useful because it catches cases where checkpoints are broken. It was disabled because, now that checkpoints can be created at the stable timestamp, a schema operation (e.g., a table create) with a timestamp in the future can be present in the metadata but not visible to the checkpoint.
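
To make the race concrete, here is a minimal toy model of the visibility problem described above. It is not WiredTiger's implementation: the names `MetadataEntry`, `visible_to_checkpoint`, and `find_metadata_races` are hypothetical, and timestamps are assumed to be plain integers. It only illustrates how a handle can exist in the metadata while its creating schema operation is newer than the stable timestamp the checkpoint reads at.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetadataEntry:
    """Toy metadata record (hypothetical): a URI plus the timestamp of
    the schema operation that created it."""
    uri: str
    commit_ts: int


def visible_to_checkpoint(entry: MetadataEntry, stable_ts: int) -> bool:
    """Assumption: a checkpoint taken at stable_ts sees only entries
    committed at or before stable_ts."""
    return entry.commit_ts <= stable_ts


def find_metadata_races(handles: List[str],
                        metadata: Dict[str, MetadataEntry],
                        stable_ts: int) -> List[str]:
    """Return the URIs of handles whose metadata the checkpoint cannot see."""
    raced = []
    for uri in handles:
        entry = metadata.get(uri)
        if entry is None or not visible_to_checkpoint(entry, stable_ts):
            raced.append(uri)
    return raced


# Example: "file:a.b.wt" was created at timestamp 20, but the checkpoint
# runs at stable timestamp 10, so its metadata is not yet visible.
metadata = {
    "file:a.b.wt": MetadataEntry("file:a.b.wt", commit_ts=20),
    "file:old.wt": MetadataEntry("file:old.wt", commit_ts=5),
}
raced = find_metadata_races(["file:a.b.wt", "file:old.wt"], metadata, stable_ts=10)
```

In this model, a non-empty `raced` list corresponds to the situation the `!metadata_race` assertion was meant to flag. Reporting the affected handles rather than aborting is a choice made for the sketch only.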

      There is a MongoDB test case called:
      storage_wiredtiger_prefixed_record_store_test.exe

      It fails with an assertion:

      file:a.b.wt, WT_SESSION.checkpoint: __wt_checkpoint_get_handles, 317: !metadata_race
      

      The stack trace is:

      ...\src\mongo\util\stacktrace_windows.cpp(239) mongo::printStackTrace+0x43
      ...\src\mongo\util\signal_handlers_synchronous.cpp(182) mongo::`anonymous namespace'::printSignalAndBacktrace+0x73
      ...\src\mongo\util\signal_handlers_synchronous.cpp(238) mongo::`anonymous namespace'::abruptQuit+0x83
      d:\th\minkernel\crts\ucrt\src\appcrt\misc\signal.cpp(522) raise+0x468
      d:\th\minkernel\crts\ucrt\src\appcrt\startup\abort.cpp(71) abort+0x39
      ...\src\third_party\wiredtiger\src\os_common\os_abort.c(31) __wt_abort+0x15
      ...\src\third_party\wiredtiger\src\support\err.c(504) __wt_assert+0x37
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(325) __wt_checkpoint_get_handles+0x1fe
      ...\src\third_party\wiredtiger\src\conn\conn_dhandle.c(517) __conn_btree_apply_internal+0x97
      ...\src\third_party\wiredtiger\src\conn\conn_dhandle.c(574) __wt_conn_btree_apply+0x2e6
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(190) __checkpoint_apply_all+0x270
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(681) __checkpoint_prepare+0x30d
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(791) __txn_checkpoint+0x243
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(985) __txn_checkpoint_wrapper+0x62
      ...\src\third_party\wiredtiger\src\txn\txn_ckpt.c(1038) __wt_txn_checkpoint+0xac
      ...\src\third_party\wiredtiger\src\session\session_api.c(1680) __session_checkpoint+0x149
      ...\src\mongo\db\storage\wiredtiger\wiredtiger_session_cache.cpp(265) mongo::WiredTigerSessionCache::waitUntilDurable+0x3de
      ...\src\mongo\db\storage\wiredtiger\wiredtiger_kv_engine.cpp(174) mongo::WiredTigerKVEngine::WiredTigerCheckpointThread::run+0x470
      

      The failure has only been seen on Windows, but I have no reason to believe the failure mode is Windows-specific.

        1. 3564.diff
          4 kB
          Michael Cahill
        2. wt3559_repro.patch
          4 kB
          Vamsi Boyapati
        3. wt3559.patch
          3 kB
          Vamsi Boyapati

            Assignee:
            vamsi.krishna@mongodb.com Vamsi Boyapati
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Votes:
            0
            Watchers:
            7