The new checkpoint test uncovered a hang that can occur in the logging code. The job is on tinderbox. The only interesting stack is thread 1:
Thread 1 (Thread 0x7f57728fa700 (LWP 11342)): #0 0x0000003e6cae6e07 in sched_yield () from /lib64/libc.so.6 #1 0x00000000004359e8 in __wt_yield () at ../src/os_posix/os_yield.c:27 #2 0x000000000042771a in __wt_log_slot_join (session=session@entry=0x224ebb8, mysize=mysize@entry=4864, flags=flags@entry=20, myslot=myslot@entry=0x7ffc03db0c20) at ../src/log/log_slot.c:537 #3 0x00000000004246ad in __log_write_internal (flags=20, lsnp=0x0, record=0x2248060, session=0x224ebb8) at ../src/log/log.c:2135 #4 __wt_log_write (session=session@entry=0x224ebb8, record=<optimized out>, lsnp=lsnp@entry=0x0, flags=20) at ../src/log/log.c:2062 #5 0x000000000047b8c9 in __wt_txn_log_commit ( session=session@entry=0x224ebb8, cfg=cfg@entry=0x7ffc03db0cf0) at ../src/txn/txn_log.c:222 #6 0x00000000004767f3 in __wt_txn_commit (session=session@entry=0x224ebb8, cfg=cfg@entry=0x7ffc03db0cf0) at ../src/txn/txn.c:583 #7 0x000000000045f901 in __session_commit_transaction (wt_session=0x224ebb8, config=<optimized out>) at ../src/session/session_api.c:1427 #8 0x0000000000405260 in subtest_populate (close_test=true, opts=0x7ffc03db0e00) at ../../../test/csuite/wt2909_checkpoint_integrity/main.c:567 #9 subtest_main (argc=argc@entry=9, argv=argv@entry=0x7ffc03db1850, close_test=close_test@entry=true) at ../../../test/csuite/wt2909_checkpoint_integrity/main.c:499 #10 0x0000000000404acf in subtest_main (close_test=true, argv=0x7ffc03db1850, argc=9) at ../../../test/csuite/wt2909_checkpoint_integrity/main.c:457 #11 main (argc=9, argv=0x7ffc03db1850) at ../../../test/csuite/wt2909_checkpoint_integrity/main.c:641
Although the local slot variable had been optimized out, I managed to figure out which slot. Here is the slot's contents. Note that it is in a closed state and that the slot has an error set on it.
(gdb) p/x log->slot_pool[12] $8 = {{{slot_state = 0x4000008000000080, slot_unbuffered = 0x0, slot_error = 0x5, slot_start_offset = 0x28bd00, slot_last_offset = 0x28bd00, slot_release_lsn = {l = {offset = 0x28bd00, file = 0x1}, file_offset = 0x10028bd00}, slot_start_lsn = {l = { offset = 0x28bd00, file = 0x1}, file_offset = 0x10028bd00}, slot_end_lsn = {l = {offset = 0x28bd80, file = 0x1}, file_offset = 0x10028bd80}, slot_fh = 0x2273e30, slot_buf = { data = 0x7f57723a8010, size = 0x0, mem = 0x7f57723a8010, memsize = 0x40000, flags = 0x0}, flags = 0x0}, __padding = {0x80, 0x0, 0x0, 0x0, 0x80, 0x0, 0x0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xbd, 0x28, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xbd, 0x28, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xbd, 0x28, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0xbd, 0x28, 0x0, 0x1, 0x0, 0x0, 0x0, 0x80, 0xbd, 0x28, 0x0, 0x1, 0x0, 0x0, 0x0}}}
I am also not sure how/why this closed slot is still the active slot because there are many free slots available.
- related to
-
WT-3287 review WiredTiger internal panic checks
- Closed