Hello,
On startup, mongos runs 1 query per database to the `config.collections` collection.
On cluster with large amount of databases, it will have an important impact on the process startup time.
Moreover, as mongos uses `ReadConcern: majority` to communicate with config replica, the startup time can also vary depending to the network latency between config members and the mongos.
Here are the kind of requests sent by mongos to config RS on startup:
```
{"t":\{"$date":"2024-01-22T13:14:58.987+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 939 – target:[cfg-1.internal:37001] db:config expDate:2024-01-22T13:15:28.987+00:00 cmd:{ find: \"collections\", filter:
{ _id: /^db_20240119160851\\-800\\./ }, readConcern: { level: \"majority\", afterOpTime:
{ ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":\{"$date":"2024-01-22T13:14:58.989+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 940 – target:[cfg-1.internal:37001] db:config expDate:2024-01-22T13:15:28.989+00:00 cmd:{ find: \"collections\", filter:
{ _id: /^db_20240119160851\\-801\\./ }, readConcern: { level: \"majority\", afterOpTime:
{ ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":\{"$date":"2024-01-22T13:14:58.991+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 941 – target:[cfg-2.internal:37001] db:config expDate:2024-01-22T13:15:28.991+00:00 cmd:{ find: \"collections\", filter:
{ _id: /^db_20240119160851\\-802\\./ }, readConcern: { level: \"majority\", afterOpTime:
{ ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":\{"$date":"2024-01-22T13:14:59.004+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 942 – target:[cfg-3.internal:37001] db:config expDate:2024-01-22T13:15:29.004+00:00 cmd:{ find: \"collections\", filter:
{ _id: /^db_20240119160851\\-803\\./ }, readConcern: { level: \"majority\", afterOpTime:
{ ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
```
In our case, we are operating a cluster with more than 450k databases. Config servers is distributed between 3 sites:
- Mongos <-> Config1 : 1ms
- Mongos <-> Config2: 3ms
- Mongos <-> Config3: 10ms
When the mongos start, as the queries as distributed equally to all config member, it takes approximately 30 minutes to start, which can be confirmed by doing the maths: `150k * 1ms + 150k * 3ms + 150k * 10ms = 35 minutes`
We tried to set a lower `–localThreshold`, but it apparently only applies to communications between mongos and data shard.
We are able to reproduce this behaviour by creating 25k on a testing env:
- it takes 2 minutes for a mongos to start when all config RS members are distributed between the three sites
- it still takes 25s to start if config RS members are located on the same site
We were able to reproduce it on 5.0.24, but not on 6.0.13.
- 4.4.22: 2 minutes
```
{"t":\{"$date":"2024-01-23T14:18:32.675+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.22","gitVersion":"fc832685b99221cffb1f5bb5a4ff5ad3e1c416b2","openSSLVersion":
"OpenSSL 1.1.1f 31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"x86_64","target_arch":"x86_64"}}}}
,"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":57000,"ssl":"off"}}
```
- 5.0.24: 2 minutes
```
{"t":\{"$date":"2024-01-23T15:01:52.077+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.24","gitVersion":"f034f0c51b3dffef4b8c9452d77ede9888f28f66","openSSLVersion":
"OpenSSL 1.1.1f 31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"x86_64","target_arch":"x86_64"}}}}
,"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":57000,"ssl":"off"}}
```
- 6.0.13: 7s
```
{"t":\{"$date":"2024-01-23T15:53:10.296+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.13","gitVersion":"3b13907f9bdf6bd3264d67140d6c215d51bbd20c","openSSLVersion":
"OpenSSL 1.1.1f 31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"x86_64","target_arch":"x86_64"}}}}
```
Also, is it possible to backport the behaviour on 6.0.13 to upcoming 4.4 and 5.0 versions?
Thank you!