Maintain "milestones" (a.k.a. oplog stones) against the oplog so that old records can be removed efficiently with WT_SESSION::truncate() when the collection grows beyond its desired maximum size.
The stones are logical markers against the oplog that serve as truncation points. When a record is inserted, its size is added to the stone currently being filled. If the stone's size exceeds the size threshold, a new stone is cut. If the number of stones exceeds the stone-count threshold (between 10 and 100), the oplog's background thread is signaled to delete the records covered by the oldest stone. Both thresholds are determined from the size of the oplog.
The stones are not persisted, so new stones are chosen at startup based on the records already in the oplog. For small oplogs, or those with few records, the entire oplog is scanned to compute the stones: records are simply packed into the current stone until the size threshold is exceeded, at which point a boundary is cut.
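The startup full scan amounts to a single ordered pass over the oplog. A minimal sketch, with hypothetical `Record` and `computeStoneBoundaries` names standing in for the real code:

```cpp
#include <cstdint>
#include <vector>

// Illustrative record: a RecordId plus the record's on-disk size.
struct Record {
    int64_t id;
    int64_t sizeBytes;
};

// Walk every oplog record in order and close a stone boundary whenever the
// accumulated size reaches the threshold.
std::vector<int64_t> computeStoneBoundaries(const std::vector<Record>& oplog,
                                            int64_t stoneSizeBytes) {
    std::vector<int64_t> boundaries;  // last RecordId of each full stone
    int64_t bytes = 0;
    for (const Record& r : oplog) {
        bytes += r.sizeBytes;
        if (bytes >= stoneSizeBytes) {  // threshold exceeded: cut a stone here
            boundaries.push_back(r.id);
            bytes = 0;
        }
    }
    return boundaries;  // leftover bytes become the partially filled stone
}
```

Because every record is visited, the resulting boundaries are exact; the cost is a full collection scan, which is why this path is reserved for small oplogs.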
For larger oplogs, or those with many records (>20,000), records are sampled at random from the oplog using a WiredTigerRecordStore::RandomCursor, oversampling by a factor of 10. From the sorted samples, boundaries are then chosen such that each is expected to lie near the right edge of its logical section. As the oplog is truncated, the error in this estimate shrinks, because the actual size of newly created stones is known with certainty.
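The sampling path can be sketched as below. This is a simplified model under stated assumptions: a plain PRNG stands in for WiredTigerRecordStore::RandomCursor, the oversampling factor is hard-coded to 10, and every 10th sorted sample is taken as an estimated boundary; `estimateBoundaries` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Estimate stone boundaries by oversampling: draw 10 random samples per
// desired stone, sort them, and take every 10th sample as the estimated
// right boundary of a logical section.
std::vector<int64_t> estimateBoundaries(const std::vector<int64_t>& oplogIds,
                                        size_t numStones,
                                        unsigned seed = 42) {
    const size_t kOversample = 10;  // assumed oversampling factor
    std::mt19937 gen(seed);
    std::uniform_int_distribution<size_t> pick(0, oplogIds.size() - 1);

    // Stand-in for the random cursor: uniform random picks from the oplog.
    std::vector<int64_t> samples;
    for (size_t i = 0; i < numStones * kOversample; ++i)
        samples.push_back(oplogIds[pick(gen)]);
    std::sort(samples.begin(), samples.end());

    // Every 10th sorted sample approximates a section's right edge.
    std::vector<int64_t> boundaries;
    for (size_t i = 1; i <= numStones; ++i)
        boundaries.push_back(samples[i * kOversample - 1]);
    return boundaries;
}
```

The intuition: with 10x oversampling, the k-th group of 10 sorted samples concentrates around the end of the k-th tenth of the data, so its maximum is a reasonable boundary estimate without scanning the whole oplog.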
Changing the size of a record in the live oplog is no longer supported.
Issue links:
- is duplicated by:
  - SERVER-17033 Improve performance for bulk insert into WT with under oplog back pressure (Closed)
- related to:
  - SERVER-20529 WiredTiger allows capped collection objects to grow (Closed)
  - SERVER-20738 Oplog stones does not enforce ascending order of RecordIds (Closed)
  - SERVER-55821 remove next_random_sample_size=1000 configuration in the oplog sampling code (Closed)