Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.1
Affects Version/s: None
Component/s: Internal Code
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce: https://logkeeper.mongodb.org/lobster/build/963b4308c9524506b0757603f70ff063/test/5d942c83c2ab68380ca0e7a0#bookmarks=0%2C38606&l=1
Sprint: Dev Tools 2019-10-07, Dev Tools 2019-10-21

An upgrade to PseudoRandom was reverted due to a test relying on the specific bits output by PseudoRandom(0). This is not a good situation (~~SERVER-43641~~).

Tests must not hardcode this sort of thing. If we do, we can never make improvements to the generators without updating all such tests.

Regarding db/repl/replication_coordinator_impl_elect_v1_test.cpp:
3 tests from the TakeoverTest/ suite are affected:

CatchupTakeoverCallbackCanceledIfElectionTimeoutRuns
DontCallForPriorityTakeoverWhenLaggedDifferentSecond
DontCallForPriorityTakeoverWhenLaggedSameSecond

These only seem to work reliably when fed a (now legacy) PseudoRandom initialized with a seed of 0. Otherwise the election timeouts are randomized in such a way that the tests never reach the desired state, and they fail.
This is extremely fragile and should be fixed ASAP.

The ReplCoordinatorImpl takes a seed in its constructor. From this seed it makes a PseudoRandom, which it uses to generate electionTimeout intervals. This is very hit-or-miss: a test has to hope to find a seed that puts the RS into a desired state, and such a seed, if found, has to be updated with every little tweak of the random number generator, the interval upperBound, etc. Tests need to directly control the election timeout durations in order to get the RS into their desired state, so the ctor should take a Duration generator rather than a seed.
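To make the "Duration generator rather than a seed" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `CoordinatorSketch`, `TimeoutOffsetGenerator`, and `scriptedOffsets` are illustration-only names, not the real ReplCoordinatorImpl interface, and the way the offset is added to a base timeout is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <utility>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical: a callable that returns the next election-timeout offset,
// given the upper bound the coordinator would otherwise randomize within.
using TimeoutOffsetGenerator = std::function<Milliseconds(Milliseconds upperBound)>;

// Hypothetical stand-in for the coordinator. It is handed a generator
// instead of a seed, so it never owns a random number generator itself.
class CoordinatorSketch {
public:
    CoordinatorSketch(Milliseconds baseTimeout,
                      Milliseconds maxOffset,
                      TimeoutOffsetGenerator gen)
        : _baseTimeout(baseTimeout), _maxOffset(maxOffset), _gen(std::move(gen)) {}

    // Base timeout plus whatever offset the injected generator returns.
    Milliseconds nextElectionTimeout() {
        return _baseTimeout + _gen(_maxOffset);
    }

private:
    Milliseconds _baseTimeout;
    Milliseconds _maxOffset;
    TimeoutOffsetGenerator _gen;
};

// Test helper: replays a fixed script of offsets, then returns zero forever.
// The upper bound is ignored because the test dictates the values.
inline TimeoutOffsetGenerator scriptedOffsets(std::deque<Milliseconds> script) {
    return [script = std::move(script)](Milliseconds) mutable {
        if (script.empty()) {
            return Milliseconds{0};
        }
        Milliseconds next = script.front();
        script.pop_front();
        return next;
    };
}
```

A test would construct the coordinator with `scriptedOffsets({...})` and get exactly the timeouts it asked for, independent of any random number generator; production code would pass a generator backed by PseudoRandom.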

For the moment I'm going to bring the entire PseudoRandom "XorShift" implementation into the test as a generator.
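As a rough picture of what a test-local copy of the generator could look like, the sketch below uses Marsaglia's xorshift128 with its commonly published constants. Whether this matches the legacy PseudoRandom bit for bit is an assumption that would need checking against the source; the class name `FrozenXorShift128` and the modulo reduction in `nextBelow` are hypothetical as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical test-local generator: a frozen copy of a 32-bit xorshift128.
// Because the test owns it, later changes to the production generator
// cannot alter the sequence the test sees.
class FrozenXorShift128 {
public:
    explicit FrozenXorShift128(std::uint32_t seed) : _x(seed) {}

    // One xorshift128 step (shift constants 11, 19, 8).
    std::uint32_t next() {
        std::uint32_t t = _x ^ (_x << 11);
        _x = _y;
        _y = _z;
        _z = _w;
        _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));
        return _w;
    }

    // Value in [0, upperBound); returns 0 when upperBound is 0.
    // Plain modulo is slightly biased, which is acceptable for a sketch.
    std::uint32_t nextBelow(std::uint32_t upperBound) {
        return upperBound == 0 ? 0 : next() % upperBound;
    }

private:
    std::uint32_t _x;
    std::uint32_t _y = 362436069u;
    std::uint32_t _z = 521288629u;
    std::uint32_t _w = 88675123u;
};
```

This keeps the three tests passing with seed 0 while decoupling them from platform/random.h, but it is a stopgap: the tests still depend on a lucky sequence rather than on timeouts they chose.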

PS: Another way to go here would be to use a FailPoint to inject an electionTimeout result, overriding the randomly generated result.
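The override idea can be sketched without relying on the real FailPoint API. In the sketch below, `ElectionTimeoutOverride` and `TimeoutPicker` are hypothetical names, and a plain `std::optional` stands in for the fail point; the standard-library `std::mt19937_64` stands in for PseudoRandom.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <random>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical test-only switch standing in for a fail point. When
// forcedOffset is set, it replaces the randomly generated offset.
struct ElectionTimeoutOverride {
    std::optional<Milliseconds> forcedOffset;
};

// Hypothetical picker: random by default, overridable from a test.
class TimeoutPicker {
public:
    TimeoutPicker(std::uint64_t seed, const ElectionTimeoutOverride* hook)
        : _rng(seed), _hook(hook) {}

    // Returns the forced offset if one is set; otherwise a uniformly
    // distributed offset in [0, upperBound]. upperBound must be >= 0.
    Milliseconds pickOffset(Milliseconds upperBound) {
        if (_hook && _hook->forcedOffset) {
            return *_hook->forcedOffset;
        }
        std::uniform_int_distribution<Milliseconds::rep> dist(0, upperBound.count());
        return Milliseconds{dist(_rng)};
    }

private:
    std::mt19937_64 _rng;
    const ElectionTimeoutOverride* _hook;
};
```

Compared with injecting a generator through the constructor, this leaves the production signature alone, at the cost of a test-only branch in the timeout path.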

See also ~~SERVER-43767~~ (related issue in another test)

Has to be done before: SERVER-43641 platform/random.h causing bugs, upgrade overdue (Closed)

Assignee: Billy Donahue
Reporter: Billy Donahue
Participants: Billy Donahue, Githook User, William Schultz
Votes: 0
Watchers: 4
Created: Oct 03 2019 01:11:45 AM UTC
Updated: Oct 29 2023 10:16:32 PM UTC
Resolved: Oct 08 2019 09:19:08 PM UTC
