-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: DHandles
-
Storage Engines
-
91
-
3
-
StorEng - 2023-06-13, 2023-07-25 Absolute unit
WT-10994 fixed a correctness issue where the sweep server was ignoring the WT_DHANDLE_DROPPED flag and thereby causing use-after-free errors.
However, that WT-10994 fix impacted the threshold checks in test_sweep04.py, with the result that the unit-test-extra-long task (and variants thereof) are failing. The WT_DHANDLE_DROPPED change in WT-10994 was subsequently reverted in WT-11143 to get the Evergreen tasks passing again (but restoring the risk of the use-after-free error).
[2023/05/30 04:22:11.593] ====================================================================== [2023/05/30 04:22:11.593] [pid:4428]: FAIL: test_sweep04.test_sweep04.test_big_run [2023/05/30 04:22:11.593] ---------------------------------------------------------------------- [2023/05/30 04:22:11.593] [pid:4428]: testtools.testresult.real._StringException: Traceback (most recent call last): [2023/05/30 04:22:11.593] File "/data/mci/e87b86bb92d8e6f597aa111c744fd30d/wiredtiger/test/suite/wttest.py", line 468, in _callTestMethod [2023/05/30 04:22:11.593] method() [2023/05/30 04:22:11.593] File "/data/mci/e87b86bb92d8e6f597aa111c744fd30d/wiredtiger/test/suite/test_sweep04.py", line 243, in test_big_run [2023/05/30 04:22:11.593] self.assertLess(abs(q2_slope), 20.0) [2023/05/30 04:22:11.593] AssertionError: 100.12518607442978 not less than 20.0
[2023/05/29 05:18:50.222] ====================================================================== [2023/05/29 05:18:50.222] [pid:7057]: FAIL: test_sweep04.test_sweep04.test_big_run [2023/05/29 05:18:50.222] ---------------------------------------------------------------------- [2023/05/29 05:18:50.222] [pid:7057]: testtools.testresult.real._StringException: Traceback (most recent call last): [2023/05/29 05:18:50.222] File "/data/mci/432fcce236978afd7e2075c6e01d463c/wiredtiger/test/suite/wttest.py", line 468, in _callTestMethod [2023/05/29 05:18:50.222] method() [2023/05/29 05:18:50.222] File "/data/mci/432fcce236978afd7e2075c6e01d463c/wiredtiger/test/suite/test_sweep04.py", line 242, in test_big_run [2023/05/29 05:18:50.222] self.assertLess(abs(q2_slope), q1_slope) [2023/05/29 05:18:50.222] AssertionError: 101.36264105642258 not less than 101.27452581032414 [2023/05/29 05:18:50.222] ----------------------------------------------------------------------
Here is the tail of the results.txt file from an example failing run of test_sweep04:
test_sweep04.test_sweep04.test_big_run: 19800/20000 stressing=False test_sweep04.test_sweep04.test_big_run: DHANDLE STATS: close=0, remove=10117, sweep=10101, session_handles=195650, session_sweeps=14366, sweep_tod=220067, sweep_ref=0 test_sweep04.test_sweep04.test_big_run: dhandle_count=10013 test_sweep04.test_sweep04.test_big_run: file_open=114 test_sweep04.test_sweep04.test_big_run: average,slope=[7779.914572864322, 46.345302268920356] test_sweep04.test_sweep04.test_big_run: 19900/20000 stressing=False test_sweep04.test_sweep04.test_big_run: DHANDLE STATS: close=0, remove=10117, sweep=10101, session_handles=195650, session_sweeps=14425, sweep_tod=220067, sweep_ref=0 test_sweep04.test_sweep04.test_big_run: dhandle_count=10013 test_sweep04.test_sweep04.test_big_run: file_open=114 test_sweep04.test_sweep04.test_big_run: average,slope=[7791.08, 45.9868776719418] test_sweep04.test_sweep04.test_big_run: 1st qtr: average=3051.78,slope=101.6709243697479 test_sweep04.test_sweep04.test_big_run: 2nd qtr: average=8066.38,slope=100.74492196878751 test_sweep04.test_sweep04.test_big_run: 3rd qtr: average=10032.28,slope=-1.4642016806722689 test_sweep04.test_sweep04.test_big_run: 4th qtr: average=10013.88,slope=-0.04869147659063625 test_sweep04.test_sweep04.test_big_run: end run: average=10013.2,slope=-0.0481203007518797 test_sweep04.test_sweep04.test_big_run: finishing test_sweep04.test_sweep04.test_big_run: passed=False test_sweep04.test_sweep04.test_big_run: skipped=False test_sweep04.test_sweep04.test_big_run: preserving directory WT_TEST/test_sweep04.0 test_sweep04.test_sweep04.test_big_run: FAIL test_sweep04.test_sweep04.test_big_run: preserving directory WT_TEST/test_sweep04.0 test_sweep04.test_sweep04.test_big_run: TEST COMPLETED
Here is the tail of the result.txt file from an example passing run of test_sweep04 prior to WT-10994.
test_sweep04.test_sweep04.test_big_run: 19800/20000 stressing=False test_sweep04.test_sweep04.test_big_run: DHANDLE STATS: close=9900, remove=20006, sweep=10102, session_handles=195479, session_sweeps=12192, sweep_tod=218423, sweep_ref=1 test_sweep04.test_sweep04.test_big_run: dhandle_count=113 test_sweep04.test_sweep04.test_big_run: file_open=114 test_sweep04.test_sweep04.test_big_run: average,slope=[1484.4422110552764, -14.248078270138572] test_sweep04.test_sweep04.test_big_run: 19900/20000 stressing=False test_sweep04.test_sweep04.test_big_run: DHANDLE STATS: close=9900, remove=20006, sweep=10102, session_handles=195479, session_sweeps=12211, sweep_tod=218423, sweep_ref=1 test_sweep04.test_sweep04.test_big_run: dhandle_count=113 test_sweep04.test_sweep04.test_big_run: file_open=114 test_sweep04.test_sweep04.test_big_run: average,slope=[1477.585, -14.240113252831321] test_sweep04.test_sweep04.test_big_run: 1st qtr: average=2053.6,slope=36.05887154861945 test_sweep04.test_sweep04.test_big_run: 2nd qtr: average=2726.72,slope=4.8072989195678275 test_sweep04.test_sweep04.test_big_run: 3rd qtr: average=1013.46,slope=-48.96888355342137 test_sweep04.test_sweep04.test_big_run: 4th qtr: average=116.56,slope=-0.31951980792316925 test_sweep04.test_sweep04.test_big_run: end run: average=113.2,slope=-0.0481203007518797 test_sweep04.test_sweep04.test_big_run: finishing test_sweep04.test_sweep04.test_big_run: passed=True test_sweep04.test_sweep04.test_big_run: skipped=False test_sweep04.test_sweep04.test_big_run: TEST COMPLETED
The passing test shows the number of dhandles dropping significantly during the test, whereas in the failing test they stay much higher thus failing the thresholds in the test.
Any dhandles that are still open are freed when the connection is closed.
The goal of this ticket is to do two things simultaneously:
- close dropped dhandles as soon as practical, and
- avoid the use-after-free error discussed in
WT-10994.
We know how to do each of those on their own, but doing them together is proving to be a challenge so far.
- duplicates
-
WT-11181 heap-use-after-free during dhandle open
- Closed
- is duplicated by
-
WT-11132 Make drop table close dhandles
- Closed
- is related to
-
WT-11132 Make drop table close dhandles
- Closed
-
WT-10576 Return EBUSY on forced drop if there is an active transaction
- Closed
-
WT-11143 Temporary partial revert to restore Evergreen greenness
- Closed
- related to
-
SERVER-78134 disable concurrency_simultaneous_replication_wiredtiger_cursor_sweeps on macos-arm64 and macOS
- Closed