WiredTiger / WT-6384

rollback to stable test leads to corruption

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      I am running a version of test/format that has been modified to perform rollback_to_stable and check the results. During a rollback_to_stable call, I see a read checksum error.

      This can be reproduced with a small set of changes to test/format: WT-6384-format.diff

      $ ./t -t -c Z ops.rebalance=0 runs=10 quiet=0 ops.truncate=0 runs.threads=16 runs.ops=0 runs.timer=2 transaction.rollback_to_stable=1
      t: process 28764 running
         1: rollback_to_stable: 0 ops repeated
         1: rollback_to_stable: 0 ops repeated
         1: rollback_to_stable: 0 ops repeated
         1: table, row-store (116 seconds)
         2: rollback_to_stable: 2949 ops repeated
         2: rollback_to_stable: 0 ops repeated
         2: rollback_to_stable: 2979 ops repeated
      [1591381651:785087][28764:0x7fd301460d00], file:wt.wt, txn rollback_to_stable: __wt_block_read_off, 283: wt.wt: read checksum error for 512B block at offset 46013952: block header checksum of 0x6d202121 doesn't match expected checksum of 0x79f9a659
      [1591381651:785154][28764:0x7fd301460d00], file:wt.wt, txn rollback_to_stable: __wt_bm_corrupt_dump, 135: {46013952, 512, 0x79f9a659}: (chunk 1 of 1): 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65
       64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 75 70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20
      [1591381651:785168][28764:0x7fd301460d00], file:wt.wt, txn rollback_to_stable: __wt_block_read_off, 292: wt.wt: fatal read error: WT_ERROR: non-specific WiredTiger error
      [1591381651:785172][28764:0x7fd301460d00], file:wt.wt, txn rollback_to_stable: __wt_block_read_off, 292: the process must exit and restart: WT_PANIC: WiredTiger library panic
      [1591381651:785184][28764:0x7fd301460d00], txn-recover: __wt_txn_recover, 842: Recovery failed: WT_PANIC: WiredTiger library panic
      [1591381651:785841][28764:0x7fd301460d00], connection: __wt_cache_destroy, 350: cache server: exiting with 216 pages in memory and 0 pages evicted
      [1591381651:785852][28764:0x7fd301460d00], connection: __wt_cache_destroy, 355: cache server: exiting with 79272 image bytes in memory
      [1591381651:785858][28764:0x7fd301460d00], connection: __wt_cache_destroy, 358: cache server: exiting with 252603 bytes in memory
      t: run FAILED
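
The bytes printed by __wt_bm_corrupt_dump in the log above can be decoded to see what the failing block actually contains. The following is a minimal sketch in Python that uses only the first bytes of the dump, copied from the log. The helper name decode_dump is hypothetical, and the little-endian byte order used for the header checksum is an assumption, not something the log states.

```python
# First bytes of the 512B block dumped by __wt_bm_corrupt_dump (copied from the log).
DUMP_PREFIX = (
    "70 74 65 64 20 62 79 20 66 6f 72 6d 61 74 20 74 6f 20 74 65 73 74 20 "
    "73 61 6c 76 61 67 65 20 21 21 21 20 6d 65 6d 6f 72 79 20 63 6f 72 72 "
    "75 70 74 65 64"
)


def decode_dump(hex_bytes: str) -> str:
    """Decode a space-separated hex dump as ASCII, replacing undecodable bytes."""
    return bytes.fromhex(hex_bytes).decode("ascii", errors="replace")


text = decode_dump(DUMP_PREFIX)
print(text)

# Checksum value that was read from the block header, as reported in the log.
header_checksum = 0x6D202121
# Assumption: the checksum field is stored little-endian on disk.
checksum_bytes = header_checksum.to_bytes(4, "little")
print(checksum_bytes)
```

If this decoding is right, the block holds repeated ASCII text rather than a page image, and the "checksum" read from the header is itself four bytes of that text. Whether this is related to ops.salvage=1 in the run configuration is not established by the log alone.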
      

      Here's the config file:

      ############################################
      #  RUN PARAMETERS: V2
      ############################################
      assert.commit_timestamp=0
      assert.read_timestamp=1
      backup=0
      backup.incremental=off
      backup.incr_granularity=9923
      btree.bitcnt=4
      btree.compression=none
      btree.dictionary=1
      btree.huffman_key=1
      btree.huffman_value=1
      btree.internal_key_truncation=1
      btree.internal_page_max=9
      btree.key_gap=1
      btree.key_max=43
      btree.key_min=30
      btree.leaf_page_max=12
      btree.memory_page_max=2
      btree.prefix_compression=1
      btree.prefix_compression_min=4
      btree.repeat_data_pct=22
      btree.reverse=0
      btree.split_pct=74
      btree.value_max=3020
      btree.value_min=19
      cache=91
      cache.evict_max=4
      cache.minimum=0
      checkpoint=on
      checkpoint.log_size=200
      checkpoint.wait=36
      disk.checksum=uncompressed
      disk.data_extend=0
      disk.direct_io=0
      disk.encryption=none
      disk.firstfit=0
      disk.mmap=0
      disk.mmap_all=1
      format.abort=0
      format.independent_thread_rng=1
      format.major_timeout=0
      logging=0
      logging.archive=0
      logging.compression=none
      logging.file_max=216654
      logging.prealloc=1
      lsm.auto_throttle=1
      lsm.bloom=1
      lsm.bloom_bit_count=10
      lsm.bloom_hash_count=20
      lsm.bloom_oldest=0
      lsm.chunk_size=7
      lsm.merge_max=16
      lsm.worker_threads=3
      ops.alter=0
      ops.compaction=1
      ops.hs_cursor=0
      ops.pct.delete=5
      ops.pct.insert=3
      ops.pct.modify=90
      ops.pct.read=0
      ops.pct.write=2
      ops.prepare=0
      ops.random_cursor=0
      ops.rebalance=0
      ops.salvage=1
      ops.truncate=0
      ops.verify=1
      quiet=1
      runs=10
      runs.in_memory=0
      runs.ops=0
      runs.rows=10000
      runs.source=table
      runs.threads=16
      runs.timer=2
      runs.type=row-store
      runs.verify_failure_dump=0   
      statistics=0
      statistics.server=1
      stress.aggressive_sweep=0
      stress.checkpoint=0
      stress.hs_checkpoint_delay=0
      stress.hs_sweep=0
      stress.split_1=0
      stress.split_2=0
      stress.split_3=0
      stress.split_4=0
      stress.split_5=0
      stress.split_6=0
      stress.split_7=0
      stress.split_8=0
      transaction.frequency=100
      transaction.isolation=snapshot
      transaction.rollback_to_stable=1
      transaction.timestamps=1
      wiredtiger.config=
      wiredtiger.rwlock=1
      wiredtiger.leak_memory=0
      ############################################
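
To see at a glance which operations were active in this run, the key=value dump above can be parsed with a few lines of Python. This is only a sketch: it embeds a handful of lines copied from the configuration, and parse_config is a hypothetical helper, not part of test/format.

```python
# A few lines copied from the configuration dump above.
CONFIG = """\
ops.compaction=1
ops.rebalance=0
ops.salvage=1
ops.truncate=0
ops.verify=1
runs.threads=16
runs.timer=2
transaction.rollback_to_stable=1
transaction.timestamps=1
"""


def parse_config(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


settings = parse_config(CONFIG)
# Operations that are switched on (value "1") in this run.
enabled_ops = sorted(
    key for key, value in settings.items() if key.startswith("ops.") and value == "1"
)
print(enabled_ops)
```

With the lines shown, compaction, salvage and verify are enabled alongside transaction.rollback_to_stable=1.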
      

      I have a core dump. In addition, a checkpoint is taken and a copy of the RUNDIR is saved just before the rollback_to_stable call, so that copy is available as well.

            Assignee: Donald Anderson (donald.anderson@mongodb.com)
            Reporter: Donald Anderson (donald.anderson@mongodb.com)
            Votes: 0
            Watchers: 7
