While benchmarking the performance impact of introducing pre-fetch functionality into WiredTiger, we encountered a specific scenario that leads to a segmentation fault. It occurs when the pre-fetch utility spawns many threads (for example, 16 threads reproduce the failure consistently) and a verify operation is then run on a database in a slow disk scenario.
The segmentation fault occurs when accessing the session->pf.prefetch_prev_ref variable in the following check:
if (session->pf.prefetch_prev_ref->page == ref->home && session->pf.prefetch_skipped_with_parent < WT_PREFETCH_QUEUE_PER_TRIGGER)
The strange thing is that directly above this conditional we already check that session->pf.prefetch_prev_ref is not NULL, which suggests the pointer is non-NULL but no longer valid by the time it is dereferenced. Therefore, it seems that one of the following scenarios is happening:
- A pre-fetch thread is racing with another pre-fetch thread, which would mean that the multithreading of the pre-fetch utility is not implemented correctly.
- We make an incorrect assumption about the lifetime of the session->pf.prefetch_prev_ref variable and access it when it is not guaranteed to be valid.
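
To illustrate the second scenario, the following is a minimal sketch of why a NULL check alone cannot detect a stale pointer. Every name in it (demo_ref, demo_session, the generation counter, and the helper functions) is hypothetical and does not come from the WiredTiger code base. The "release" step only marks the slot as recycled instead of freeing it, so the sketch itself stays free of undefined behavior. It is not a proposed fix and does not model the thread race from the first scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a ref; not the real WT_REF layout. */
typedef struct {
    int home;            /* stand-in for ref->home */
    uint64_t generation; /* bumped whenever the slot is released or reused */
} demo_ref;

/* Hypothetical stand-in for the per-session pre-fetch state. */
typedef struct {
    demo_ref *prev_ref;       /* stand-in for session->pf.prefetch_prev_ref */
    uint64_t prev_generation; /* generation seen when prev_ref was saved */
} demo_session;

/* Save a pointer to the ref together with the generation it had at that time. */
static void demo_remember(demo_session *s, demo_ref *ref)
{
    assert(ref != NULL);
    s->prev_ref = ref;
    s->prev_generation = ref->generation;
}

/* Simulate the ref going away (e.g. being evicted). The storage stays allocated here. */
static void demo_release(demo_ref *ref)
{
    ref->generation++;
    ref->home = -1;
}

/* The kind of guard described above: it only tests for NULL. */
static bool demo_null_check_only(const demo_session *s)
{
    return s->prev_ref != NULL;
}

/* A stricter guard: the saved pointer must also still refer to the same generation. */
static bool demo_prev_ref_usable(const demo_session *s)
{
    return s->prev_ref != NULL && s->prev_ref->generation == s->prev_generation;
}

/* Returns bit 0 set if the NULL check passes, bit 1 set if the stricter check passes. */
static int demo_scenario(void)
{
    demo_ref slot = {42, 0};
    demo_session s = {NULL, 0};

    demo_remember(&s, &slot);
    demo_release(&slot); /* the saved pointer is now stale but still non-NULL */

    return (demo_null_check_only(&s) ? 1 : 0) + (demo_prev_ref_usable(&s) ? 2 : 0);
}
```

In this sketch, demo_scenario() reports that the NULL check still passes after the release while the generation check does not, which matches the symptom described above: a pointer that passes the NULL check yet is unsafe to dereference.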