Skip to content

RUST-1712 Latency Sees Unbounded Growth When Doing Any Amount Of Concurrent Writes #913

Closed
@LuisOsta

Description

@LuisOsta

Versions/Environment

  1. What version of Rust are you using? 1.66.0
  2. What operating system are you using? MacOS
  3. What versions of the driver and its dependencies are you using? (Run
    cargo pkgid mongodb & cargo pkgid bson) https://github.com/rust-lang/crates.io-index#mongodb@2.6.0 and https://github.com/rust-lang/crates.io-index#bson@2.6.1
  4. What version of MongoDB are you using? (Check with the MongoDB shell using db.version()) 6.0.8
  5. What is your MongoDB topology (standalone, replica set, sharded cluster, serverless)? Replica Set

Describe the bug

When writing concurrently do a MongoDB collection, the full time until execution continues after the insert has been completed grows non-stop linearly. So if you do 10 concurrent writes, whichever one is the first to go will be done in ~60ms but the slowest will take about 480ms. This slowness growths linearly with a 100 concurrent writes the slowest one takes more than 1 second.

This is despite the tracing logs seemingly saying that the command continues to succeed in the appropriate amount of time:

2023-07-19T19:10:08.449191Z DEBUG mongodb::command: Command succeeded topologyId="64b8350f0d64165cf22c9b29" reply="{\"n\":1,\"electionId\":{\"$oid\":\"7fffffff000000000000001d\"},\"opTime\":{\"ts\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}},\"t\":29},\"ok\":1.0,\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"nLRyMv2XttX7Rkr34LTRkAiMgEg=\",\"subType\":\"00\"}},\"keyId\":7210123958148923394}},\"operationTime\":{\"$timestamp\":{\"t\":1689793808,\"i\":10}}}" commandName="insert" requestId=34 driverConnectionId=4 serverConnectionId=1179 serverHost="ac-qwvoave-shard-00-01.prroxke.mongodb.net" serverPort=27017 durationMS=46

What I would expect is that the difference between running 1 write operation concurrently and running 2 to not be meaningfully different. While I'd expect latency to get worse with a significant amount of concurrent writes, I'd expect that to occur more gradually and under more significant load. And that the time it takes to do the write would be about the same as what the command succeeded tracing logs emits.

We've had this issue for a while but only recently took the time to isolate the issue. I've posted a rust program that replicates the issue with the relevant details below

Things I've tested out:

  • Upgrading the cluster to have more computing. Specifically migrating from a M10 to an M30 on Atlas. No change at all
  • This cluster is used for other things, maybe its some caching issue with other usage. I ran it in an isolated M30 Cluster on Atlas and the issue persists
  • I've tried to change the minimum and maximums pool sizes for the driver. Reducing the maximum further does slow it down but increasing the minimum or the maximum has minimal/no effect.
  • Various BSON-related things: Seems to have no effect at all

Based on the command log, it implies that the driver is getting things back at the time that I'd expect but that's possibly a red herring Apart from some lock-related issue I'm not sure what could be causing this. Do y'all have any idea of what the issue could be

To Reproduce
Use the following program to reproduce the issue and run it with the following command:

RUST_LOG='mongodb::command=debug' cargo run

main.rs

use serde;

#[derive(serde::Serialize, serde::Deserialize, Clone, Debug, PartialEq)]
struct EventDocument {
    #[serde(
        rename = "_id",
        with = "mongodb::bson::serde_helpers::uuid_1_as_binary"
    )] // rename for it to be picked up by the _id index
    pub id: uuid::Uuid, // UUID
    pub owner_id: String,
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();
    let db_name = "slow_test";
    let uri = "<mongo_uri>";
    let db = {
        let client_options = mongodb::options::ClientOptions::parse(uri).await.unwrap();
        let client = mongodb::Client::with_options(client_options).unwrap();
        client.database(db_name)
    };

    let collection = db.collection::<EventDocument>("events");
    collection.list_index_names().await.unwrap();

    let total_count = 2;
    let total_instant = tokio::time::Instant::now();
    let mut handles = vec![];
    for i in 0..total_count {
        let collection = collection.clone();
        let handle: tokio::task::JoinHandle<()> = tokio::spawn(async move {
            let instant = tokio::time::Instant::now();
            let event = EventDocument {
                id: uuid::Uuid::new_v4(),
                owner_id: format!("event-{}", i),
            };

            collection.insert_one(event, None).await.unwrap();
            println!("inserted {} in {:?}", i, instant.elapsed());
        });
        handles.push(handle);
    }
    futures::future::join_all(handles).await;
    println!("total time {:?}", total_instant.elapsed());
}

Cargo.toml:

[package]
name = "slow-mongo-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
mongodb = { version = "2.6.0", features = ["tracing-unstable", "bson-chrono-0_4", "bson-serde_with", "bson-uuid-1"]}
tokio = { version = "1.29.1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1.0.136", features = ["derive"] }
uuid = "1.4.1"
futures = "0.3.28"
rand = "0.8.5"
tracing = "0.1.37"
tracing-subscriber = "0.3.17"

Metadata

Metadata

Assignees

Labels

tracked-in-jiraTicket filed in Mongo's Jira system

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions