Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
5
BermudaTriangle- 2023-09-05
v7.0
Here is the code where we resolve the prepare state for a truncate. Unlike the path that resolves a normal prepared update, it does not first move the prepare state to locked.
/*
 * Timestamps and prepare state are in the page deleted structure for truncates, or in the
 * updates list in the case of instantiated pages. We also need to update any page deleted
 * structure in the ref.
 *
 * Only two cases are possible. First: the state is WT_REF_DELETED. In this case page_del cannot
 * be NULL yet because an uncommitted operation cannot have reached global visibility. (Or at
 * least, global visibility in the sense we need to use it for truncations, in which prepared
 * and uncommitted transactions are not visible.)
 *
 * Otherwise: there is an uncommitted delete operation we're handling, so the page must have
 * been deleted at some point, and the tree can't be readonly. Therefore the page must have been
 * instantiated, the state must be WT_REF_MEM, and there should be an update list in
 * mod->inst_updates. (But just in case, allow the update list to be null.) There might be a
 * non-null page_del structure to update, depending on whether the page has been reconciled
 * since it was deleted and then instantiated.
 */
if (previous_state != WT_REF_DELETED) {
    WT_ASSERT(session, previous_state == WT_REF_MEM);
    WT_ASSERT(session, ref->page != NULL && ref->page->modify != NULL);
    if ((updp = ref->page->modify->inst_updates) != NULL)
        for (; *updp != NULL; ++updp) {
            (*updp)->start_ts = ts;
            /*
             * Holding the ref locked means we have exclusive access, so if we are committing we
             * don't need to use the prepare locked transition state.
             */
            (*updp)->prepare_state = prepare_state;
            if (commit)
                (*updp)->durable_ts = txn->durable_timestamp;
        }
}
page_del = ref->page_del;
if (page_del != NULL) {
    page_del->timestamp = ts;
    if (commit)
        page_del->durable_timestamp = txn->durable_timestamp;
    WT_PUBLISH(page_del->prepare_state, prepare_state);
}
This can cause the timestamp writes to be reordered with respect to the prepare state write, so a reader may see an inconsistent state. Even if the writes are not reordered, a reader may still observe a partially written state, such as an update that still has its prepare timestamp but is no longer in the prepared state.
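We need to fix this to use the same technique we use when resolving a normal prepared update: first move the prepare state to locked, then write the timestamps, and only then publish the final prepare state.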
Note that the comment `Holding the ref locked means we have exclusive access, so if we are committing we don't need to use the prepare locked transition state.` is wrong. Holding the ref lock does not give us exclusive access: it blocks eviction, but it does not block readers.