-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Internal Code
-
None
-
Fully Compatible
-
ALL
-
-
Dev Tools 2019-10-07, Dev Tools 2019-10-21
An upgrade to PseudoRandom was reverted due to a test relying on the specific bits output by PseudoRandom(0). This is not a good situation (SERVER-43641).
Tests must not hardcode this sort of thing. If we do, we can never make improvements to the generators without updating all such tests.
Regarding db/repl/replication_coordinator_impl_elect_v1_test.cpp:
3 tests from the TakeoverTest/ suite are affected:
CatchupTakeoverCallbackCanceledIfElectionTimeoutRuns
DontCallForPriorityTakeoverWhenLaggedDifferentSecond
DontCallForPriorityTakeoverWhenLaggedSameSecond
These only seem to work reliably when fed a (now legacy) PseudoRandom
initialized with a seed of 0. Otherwise the election timeouts are randomized in such a way that the test doesn't reach the desired state, and it fails.
This is extremely fragile and should be fixed asap.
The ReplCoordinatorImpl takes a seed in its constructor. From this seed it makes a PseudoRandom which it uses to generate electionTimeout intervals. This is very hit-or-miss, and a test would have to hope to find a seed that puts the RS into a desired state, and such a seed, if found, would need to be updated with every little tweak of the random number generator or the interval upperBound, etc. Tests really need to directly control the election timeout durations in order to get the RS into their desired state. So really the ctor should take a Duration generator rather than a seed.
For the moment I'm going to bring the entire PseudoRandom "XorShift" implementation into the test as a generator.
PS: Another way to go here would be to use a FailPoint to inject an electionTimeout result, overriding the randomly generated result.
See also SERVER-43767 (related issue in another test)
- has to be done before
-
SERVER-43641 platform/random.h causing bugs, upgrade overdue
- Closed