  1. Rust Driver
  2. RUST-592

Race condition in server monitoring

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 1.2.0
    • Affects Version/s: None
    • Component/s: None

      Each monitor clones the topology under the read lock, releases the lock, applies its check result to the clone, and then acquires the write lock and replaces the shared topology wholesale. Because no lock is held between the clone and the replacement, there is a race condition that allows one monitor to overwrite the results of another, keeping a ServerDescription out of date until the next heartbeat.
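
The clone-then-replace pattern described above can be sketched with the standard library alone. This is a simplified model, not the driver's code: `ServerType`, `Topology`, `snapshot`, `replace_wholesale`, and `update_in_place` are hypothetical names, and the interleaving is replayed sequentially so the lost update is deterministic. `update_in_place` shows one possible way to avoid the overwrite; it is not necessarily how the driver resolved the issue.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified stand-in for the driver's server types (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ServerType {
    Unknown,
    Mongos,
}

/// Simplified stand-in for the shared topology: address -> server type.
type Topology = HashMap<String, ServerType>;

fn new_topology() -> RwLock<Topology> {
    let mut servers = Topology::new();
    servers.insert("localhost:27017".to_string(), ServerType::Unknown);
    servers.insert("localhost:27018".to_string(), ServerType::Unknown);
    RwLock::new(servers)
}

/// Step 1 of the racy pattern: clone under the read lock, then release it.
fn snapshot(shared: &RwLock<Topology>) -> Topology {
    shared.read().unwrap().clone()
}

/// Steps 2 and 3 of the racy pattern: update the private clone, then
/// replace the shared topology wholesale under the write lock.
fn replace_wholesale(shared: &RwLock<Topology>, mut stale: Topology, address: &str) {
    stale.insert(address.to_string(), ServerType::Mongos);
    *shared.write().unwrap() = stale;
}

/// One possible alternative: apply only this monitor's result while
/// holding the write lock, so other monitors' results are preserved.
fn update_in_place(shared: &RwLock<Topology>, address: &str) {
    shared
        .write()
        .unwrap()
        .insert(address.to_string(), ServerType::Mongos);
}

/// Replays the problematic interleaving using the racy pattern.
fn racy_interleaving() -> Topology {
    let shared = new_topology();
    // Both monitors take their snapshot before either one writes.
    let clone_a = snapshot(&shared);
    let clone_b = snapshot(&shared);
    replace_wholesale(&shared, clone_a, "localhost:27017");
    // clone_b still lists 27017 as Unknown, so this write discards A's result.
    replace_wholesale(&shared, clone_b, "localhost:27018");
    let result = shared.read().unwrap().clone();
    result
}

/// Same two updates, but each is applied under the write lock.
fn safe_interleaving() -> Topology {
    let shared = new_topology();
    update_in_place(&shared, "localhost:27017");
    update_in_place(&shared, "localhost:27018");
    let result = shared.read().unwrap().clone();
    result
}
```

In this model, `racy_interleaving()` leaves `localhost:27017` as `Unknown` even though its check succeeded, while `safe_interleaving()` ends with both servers as `Mongos`.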

      Repro:

      #[cfg_attr(feature = "tokio-runtime", tokio::test(threaded_scheduler))]
      #[cfg_attr(feature = "async-std-runtime", async_std::test)]
      #[function_name::named]
      async fn repro() {
          let _guard: RwLockWriteGuard<()> = LOCK.run_exclusively().await;
      
          let client = EventClient::new().await;
          for _ in 0..5 {
              client
                  .database("test")
                  .run_command(doc! { "ping": 1 }, None)
                  .await
                  .unwrap();
          }
      
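          // Tally how many commands were started against each server address.
          // Note: the loop above issues `ping`, but the events are filtered on "find",
          // so presumably no events match, consistent with the `left: 0` in the output below.
          let mut tallies: HashMap<StreamAddress, u32> = HashMap::new();
          for event in client.get_command_started_events("find") {
              *tallies.entry(event.connection.address.clone()).or_insert(0) += 1;
          }
      
          // With two mongoses, both should have been selected at least once.
          assert_eq!(tallies.len(), 2);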
      }
      

      Here is some debug output from running this. Note how the first mongos (localhost:27017) flips back to Unknown after the second monitor (localhost:27018) updates the topology. It remains this way for the whole test, so only one mongos ever gets selected.

      running 1 test
      localhost:27017: performing check
      localhost:27018: performing check
      localhost:27018: check done, updating
      localhost:27017: check done, updating
      got lock, updating state
      pre update servers:
      localhost:27017 Unknown
      localhost:27018 Unknown
      pre update servers:
      localhost:27017 Mongos
      localhost:27018 Unknown
      got lock, updating state
      pre update servers:
      localhost:27017 Mongos
      localhost:27018 Unknown
      pre update servers:
      localhost:27017 Unknown
      localhost:27018 Mongos
      thread 'test::coll::repro' panicked at 'assertion failed: `(left == right)`
        left: `0`,
       right: `2`', src/lib.rs:1:1
      
      

            Assignee: Patrick Freed (patrick.freed@mongodb.com)
            Reporter: Patrick Freed (patrick.freed@mongodb.com)
