- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
The monitors clone the topology under the read lock, release that lock, update the clone with the result of the check, and then acquire the write lock and replace the topology wholesale. This creates a race condition that can allow one monitor to overwrite the results of another, keeping a ServerDescription out of date until the next heartbeat.
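A minimal sketch of that lost-update pattern, using simplified hypothetical types (TopologyState, ServerType) rather than the driver's actual ones: two monitors clone the same state, each applies only its own heartbeat result to its private clone, and whichever commits last wipes out the other's update.

use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for the driver's topology types.
#[derive(Clone, Debug, PartialEq)]
enum ServerType {
    Unknown,
    Mongos,
}

#[derive(Clone, Debug)]
struct TopologyState {
    servers: HashMap<String, ServerType>,
}

fn main() {
    let topology = Arc::new(RwLock::new(TopologyState {
        servers: HashMap::from([
            ("localhost:27017".to_string(), ServerType::Unknown),
            ("localhost:27018".to_string(), ServerType::Unknown),
        ]),
    }));

    // Both monitors clone the topology under the read lock and release it.
    let mut clone_a = topology.read().unwrap().clone();
    let mut clone_b = topology.read().unwrap().clone();

    // Each monitor records its own heartbeat result in its private clone.
    clone_a.servers.insert("localhost:27017".to_string(), ServerType::Mongos);
    clone_b.servers.insert("localhost:27018".to_string(), ServerType::Mongos);

    // Each then acquires the write lock and replaces the topology wholesale.
    // The second replacement clobbers the first: 27017 flips back to Unknown.
    *topology.write().unwrap() = clone_a;
    *topology.write().unwrap() = clone_b;

    assert_eq!(
        topology.read().unwrap().servers["localhost:27017"],
        ServerType::Unknown
    );
}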
Repro:
#[cfg_attr(feature = "tokio-runtime", tokio::test(threaded_scheduler))]
#[cfg_attr(feature = "async-std-runtime", async_std::test)]
#[function_name::named]
async fn repro() {
    let _guard: RwLockWriteGuard<()> = LOCK.run_exclusively().await;

    let client = EventClient::new().await;

    for _ in 0..5 {
        client
            .database("test")
            .run_command(doc! { "ping": 1 }, None)
            .await
            .unwrap();
    }

    let mut tallies: HashMap<StreamAddress, u32> = HashMap::new();
    for event in client.get_command_started_events("find") {
        *tallies.entry(event.connection.address.clone()).or_insert(0) += 1;
    }

    assert_eq!(tallies.len(), 2);
}
Here is some debug output from running this. Note how the first mongos flips back to Unknown after the second monitor updates the topology; it remains that way for the rest of the test, so only one mongos ever gets selected.
running 1 test
localhost:27017: performing check
localhost:27018: performing check
localhost:27018: check done, updating
localhost:27017: check done, updating
got lock, updating state
pre update servers:
localhost:27017 Unknown
localhost:27018 Unknown
pre update servers:
localhost:27017 Mongos
localhost:27018 Unknown
got lock, updating state
pre update servers:
localhost:27017 Mongos
localhost:27018 Unknown
pre update servers:
localhost:27017 Unknown
localhost:27018 Mongos
thread 'test::coll::repro' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `2`', src/lib.rs:1:1
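For contrast, a hypothetical way to avoid the lost update (not necessarily the fix that actually shipped) is to merge only the checked server's new description into the live topology while holding the write lock, rather than replacing the whole topology with a stale clone. Using the same hypothetical types as the sketch above:

fn commit_check_result(
    topology: &Arc<RwLock<TopologyState>>,
    address: &str,
    result: ServerType,
) {
    // Only the entry for the server this monitor checked is modified, so results
    // committed by other monitors since this monitor's last read are preserved.
    let mut state = topology.write().unwrap();
    state.servers.insert(address.to_string(), result);
}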