-
Type: Technical Debt
-
Resolution: Won't Fix
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
5
-
Storage Engines 2020-02-24
The rec_append_orig_value() function resets the WT_UPDATE.type field of an existing, linked-in update structure from WT_UPDATE_BIRTHMARK to WT_UPDATE_STANDARD.
This is dangerous, because code that does something like this (from wt_txn_read()):
830 *updp = NULL; 831 for (skipped_birthmark = false; upd != NULL; upd = upd->next) { 832 /* Skip reserved place-holders, they're never visible. */ 833 if (upd->type != WT_UPDATE_RESERVE) { 834 upd_visible = __wt_txn_upd_visible_type(session, upd); 835 if (upd_visible == WT_VISIBLE_TRUE) 836 break; 837 if (upd_visible == WT_VISIBLE_PREPARE) 838 return (WT_PREPARE_CONFLICT); 839 } 840 /* An invisible birthmark is equivalent to a tombstone. */ 841 if (upd->type == WT_UPDATE_BIRTHMARK) 842 skipped_birthmark = true; 843 } 844 845 if (upd == NULL && skipped_birthmark) 846 upd = &tombstone; 847 848 *updp = upd == NULL || upd->type == WT_UPDATE_BIRTHMARK ? NULL : upd;
Can get into trouble because if upd.type is re-read from memory at line 848, it can race with reconciliation.
See PR #5173 for more discussion, specifically from alexander.gorrod:
The reason we [reset the update type from WT_UPDATE_BIRTHMARK to WT_UPDATE_STANDARD] that is that we had a lot of trouble ensuring that we were handling tombstones correctly with the lookaside file. I got to the point of enforcing that a page being written to lookaside doesn't have any tombstones left. That decision has had fallout (probably including this), on the other hand, it means we can be confident that the lookaside file doesn't end up with two tombstones for a key - which is bad as it generally means that one of them is pointing at a value that is the wrong version.