Context:
A prefix search_near() call is a search_near() call configured with cursor configuration as prefix_key=true. The configuration allows the search_near() to exit early when the search reaches a key that is lexicographically greater than the intended set key.
Prefix search_near() function calls becomes useful when keys are not all visible to the transaction, For example In a table of keys ranging from "aaa" - "zzz" of timestamp 50, and the search_near() of key "aa" operates on a timestamp 25. A normal search_near() function would traverse all keys in the tree, to return WT_NOTFOUND. When prefix_key=true is configured, the search_near() function will only look at key ranges in the tree from "aaa" - "aaz", reducing the number of traversals needed to return back WT_NOTFOUND.
Problem:
This code here shows how WT checks if a key in the tree is lexicographically greater than the intended set key:
/* * If the cursor has prefix search configured we can early exit here if the key that we are * visiting is after our prefix. */ if (F_ISSET(&cbt->iface, WT_CURSTD_PREFIX_SEARCH) && prefix != NULL && __wt_prefix_match(prefix, &cbt->iface.key) < 0) { /* It is not okay for the user to have a custom collator. */ WT_ASSERT(session, CUR2BT(cbt)->collator == NULL); WT_STAT_CONN_DATA_INCR(session, cursor_search_near_prefix_fast_paths); return (WT_NOTFOUND); }
WT uses __wt_prefix_match to check the ordering of a particular key. When working on WT-7924, we discovered a WT_NOTFOUND error happening in our testing, when executing a prefix search_near() call. The symptoms of the original test is described as:
Schema configuration: key_format=u,value_format=u
- Insert keys "aaa" - "aay" with timestamp 200
- Insert key "aaz" with timestamp 50
- With a transaction of read timestamp 100, search_near() for aa
In this test, the behaviour of the search_near() successfully returns back "aaz" as the closest visible value to the transaction. The problem arises when the table configuration of key_format is set to "S" string. for some reason the __wt_prefix_match early exits at "aaa", and returns back WT_NOTFOUND. I have attached a reproducer, which reproduces 100% of the time.