-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
(copied to CRM)
-
8
-
Storage Engines - 2023-01-10, StorEng - 2023-01-24, StorEng - 2023-02-07, 2023-02-23 "Stoney Baloney", 2023-03-21 Ellen Ripley, 2023-04-04 Bibbidi-Bobbidi-Boo
-
v6.3, v6.2, v6.0, v5.0, v4.4
Problem description:
While working on a customer issue, I wrote a small reproducer to find out the performance of search_near if many deleted items were present. I noticed that the performance was extremely slow.
Reproducer:
class test_practice(wttest.WiredTigerTestCase): uri = "table:test_reposition01" format_values = [ ('row_integer', dict(key_format='i', value_format='S')), ] conn_config = "cache_size=1GB,statistics=(all),statistics_log=(wait=1,json=true,on_close=true)" scenarios = make_scenarios(format_values) def test_practice(self): format = 'key_format={},value_format={}'.format(self.key_format, self.value_format) self.session.create(self.uri, format) session1 = self.conn.open_session() cur1 = session1.open_cursor(self.uri) self.conn.set_timestamp('oldest_timestamp=' + self.timestamp_str(1)) nrows = 1_000_000 rows = 200_000 for i in range(1, nrows): cur1[i] = 'aaaa' cur1.reset() self.prout("A") for i in range(1, nrows): self.session.begin_transaction() num = random.randrange(1, nrows) cur1.set_key(num) cur1.search_near() cur1.remove() self.session.commit_transaction('commit_timestamp=' + self.timestamp_str(2)) self.prout("B") sleep(5) for i in range(1, rows): # num = random.randrange(1, nrows) cur1[i] = 'aaaa' cur1.reset() self.prout("C") sleep(5) self.conn.set_timestamp('oldest_timestamp=' + self.timestamp_str(3)) t0 = time.time() for i in range(1, rows): self.session.begin_transaction() num = random.randrange(1, nrows) cur1.set_key(num) cur1.search_near() cur1.set_value("213") cur1.update() if i % 1000 == 0: self.prout("H") self.session.commit_transaction('commit_timestamp=' + self.timestamp_str(4)) t1 = time.time() total = t1-t0 self.prout(str(total))
output:
ubuntu@ip-10-122-7-59:~/wiredtiger$ ./run_python_test.sh test_practice [pid:31042]: A [pid:31042]: B [pid:31042]: C [pid:31042]: H [pid:31042]: H [pid:31042]: H [pid:31042]: H [pid:31042]: H [pid:31042]: H . . . [pid:31042]: 1179.7585327625275 . ---------------------------------------------------------------------- Ran 1 test in 1243.290sOK
It took nearly 1179 seconds or 20 minutes for the last operation to finish.
Note:
Reducing the cache size to 100MB improves the performance drastically i.e. from 20 minutes to 40 seconds
- is related to
-
SERVER-48522 Regression after mass deletion
- Closed
-
WT-10807 Skip in-memory deleted pages as part of the tree walk
- Closed