- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Not Applicable
- None
- Storage Engines
(Component = session)
WT-13310 splits the session RNG (random number generator) into two: one used for skiplists, and one used for everything else. Each session has one of each kind. Each kind has its own advantages and weaknesses.
The skiplist RNG uses the same specific seed for every session. The seed was chosen (I believe) because it gives a long period, that is, a large number of results before the sequence repeats. When an RNG repeats, it returns the same N values over and over, as if going around a circle (more on this later). Choosing a seed by other means might yield an RNG with a shorter period, which tends to make it "less random". The disadvantage of using the same seed is that every session's RNG behaves identically: Session 1's RNG returns r1, r2, r3, ... and Session N's RNG returns r1, r2, r3, ... as well. So if multiple sessions are just starting out and happen to be operating concurrently on the same data structure, you might see some "synchronized behavior" where you would expect more randomness. It's also possible for two such RNGs to fall into sync after being out of sync, because they walk the same sequence and one can catch up to the other. It's hard to know how much of a problem this is for skiplists.
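To make the same-seed concern concrete, here is a minimal Python sketch. It is not WiredTiger's code: the `Rng` class, the multiply-with-carry step, and the `FIXED_W`/`FIXED_Z` constants are stand-ins assumed for illustration only. It shows two "sessions" that start from one shared seed returning identical values, and falling back into sync once the lagging one catches up.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF

# Assumption: placeholder seed words shared by every session (illustrative values).
FIXED_W = 521288629
FIXED_Z = 362436069


@dataclass
class Rng:
    """Hypothetical per-session generator: two 32-bit words of state."""
    w: int = FIXED_W
    z: int = FIXED_Z

    def next(self) -> int:
        # Each word takes one 16-bit multiply-with-carry step.
        self.w = (18000 * (self.w & 0xFFFF) + (self.w >> 16)) & MASK32
        self.z = (36969 * (self.z & 0xFFFF) + (self.z >> 16)) & MASK32
        return ((self.z << 16) + (self.w & 0xFFFF)) & MASK32


def draw(rng: Rng, n: int) -> list[int]:
    """Pull n consecutive values from rng."""
    return [rng.next() for _ in range(n)]


session_1 = Rng()
session_2 = Rng()

# Same seed, same sequence: r1, r2, r3, ... is identical in both sessions.
same_start = draw(session_1, 5) == draw(session_2, 5)

# Out of sync: session_1 runs two draws ahead of session_2.
draw(session_1, 2)
# Back in sync: as soon as session_2 makes the same number of draws,
# both hold the same state and return the same values from then on.
draw(session_2, 2)
back_in_sync = draw(session_1, 5) == draw(session_2, 5)

print(same_start, back_in_sync)
```

The output of a generator like this depends only on its current state, so any two instances that have made the same number of draws from the shared seed are indistinguishable, whichever session owns them.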
The other RNG in the session, on the other hand, doesn't try at all to get a seed with a long period. It looks like its main goal is to make sure all sessions (and all subsequent runs of WT) get different seeds. So it takes the session id, the process id, and the time to the nanosecond, composes them together, and then XORs the result with the "good seed". But XOR-ing some number with a good seed doesn't improve the chances of getting a long period. It does, however, give us "pretty unique" seeds (I say "pretty unique" because they are not completely unique: I can carefully construct combinations of process id, time, and session id that would give the same seed).
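The "pretty unique" point can be sketched as follows. The exact way WiredTiger lays out session id, process id, and time is not given here, so `compose_seed`, its shift amounts, and `GOOD_SEED` are all hypothetical. The sketch only assumes that the inputs are combined by shifting and XOR-ing, and shows how two different input combinations can be constructed to collide.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Assumption: a placeholder 64-bit "good seed"; the real constant may differ.
GOOD_SEED = (362436069 << 32) | 521288629


def compose_seed(session_id: int, pid: int, time_ns: int) -> int:
    """Hypothetical composition: XOR shifted inputs, then XOR with the good seed."""
    mixed = (time_ns ^ (pid << 32) ^ (session_id << 48)) & MASK64
    return mixed ^ GOOD_SEED


T0 = 1_700_000_000_000_000_000  # some wall-clock time in nanoseconds

# Two different (session id, pid, time) combinations...
seed_a = compose_seed(session_id=1, pid=100, time_ns=T0)
# ...where the pid differs in its lowest bit and the time differs in bit 32
# (about 4.3 seconds apart), so the two differences cancel under XOR.
seed_b = compose_seed(session_id=1, pid=101, time_ns=T0 ^ (1 << 32))

print(seed_a == seed_b)
```

XOR with a fixed constant is a one-to-one mapping, so the final XOR neither adds nor removes collisions. It also says nothing about the period of the generator started from the resulting seed.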
I'll propose a variation in the comments that might be worthy of consideration.
- related to: WT-13310 WT random cursor continues to return duplicate records due to poor interaction with MongoDB layer's query yielding (Closed)