- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Index Maintenance, Sharding
- None
- Cluster Scalability
If you have a collection whose documents contain the fields "foo" and "bar", you may need to query on either one. You could range-shard the collection on "foo", so that any query on "foo" hits only one replica-set shard, but any query on "bar" would then have to fan out to every shard, which can hurt load, latency, and availability if you have many shards.
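The routing difference can be illustrated with a small, self-contained sketch. It is not MongoDB code: the split points, the four-shard layout, and the `shards_for_query` helper are all hypothetical, chosen only to show how a query that includes the shard key is targeted while one that omits it is broadcast.

```python
from bisect import bisect_right

# Hypothetical cluster: 4 shards, range-sharded on "foo" with these split points.
# Shard 0 owns foo < 25, shard 1 owns 25 <= foo < 50, and so on.
SPLIT_POINTS = [25, 50, 75]
NUM_SHARDS = len(SPLIT_POINTS) + 1


def shards_for_query(query):
    """Return the set of shard indexes a router would have to contact."""
    if "foo" in query:
        # Shard key is present: route to the single range that owns the value.
        return {bisect_right(SPLIT_POINTS, query["foo"])}
    # No shard key in the query: scatter-gather to every shard.
    return set(range(NUM_SHARDS))


targeted = shards_for_query({"foo": 42})
scattered = shards_for_query({"bar": "x"})
print(len(targeted), len(scattered))
```

With many shards, the second number grows with the cluster while the first stays at one, which is the scalability concern raised here.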
One approach to solving this would be global secondary indexes (GSIs; cf. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html), where you store the keys you want to index ("foo" and "bar" here) along with the _id of the underlying document, or possibly the entire document. These collections would be range-sharded (by "foo"+"bar"), so that a query against a GSI would tend to hit only one replica-set shard initially. If you chose to store the entire document in the GSI, you'd be done. If you instead stored just the _id, you'd then need to query the underlying collection. If the query returns hundreds of documents, you'd probably end up hitting all shards as before, but in the nReturned=0 or nReturned=1 case you'd hit at most one additional shard.
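A minimal in-memory sketch of the two-step lookup follows. Everything in it is an assumption made for illustration: the four-shard layout, the split points, the `insert` and `find_by_bar` helpers, and the simplification that the GSI collection is range-sharded on "bar" alone. Each index entry keeps "foo" next to the _id so that the follow-up read on the underlying collection can be targeted by its shard key.

```python
from bisect import bisect_right

# Hypothetical 4-shard cluster; range i of each collection lives on shard i.
FOO_SPLITS = [25, 50, 75]        # base collection, range-sharded on "foo"
BAR_SPLITS = ["g", "n", "t"]     # GSI collection, range-sharded on "bar"

base = [dict() for _ in range(4)]   # per shard: _id -> full document
gsi = [list() for _ in range(4)]    # per shard: {"bar", "foo", "_id"} entries


def insert(doc):
    """Write the document, then its index entry (not atomic in this sketch)."""
    base[bisect_right(FOO_SPLITS, doc["foo"])][doc["_id"]] = doc
    entry = {"bar": doc["bar"], "foo": doc["foo"], "_id": doc["_id"]}
    gsi[bisect_right(BAR_SPLITS, doc["bar"])].append(entry)


def find_by_bar(bar):
    """Return (documents, shards contacted) for a query on "bar" via the GSI."""
    gsi_shard = bisect_right(BAR_SPLITS, bar)
    contacted = {gsi_shard}
    docs = []
    for entry in gsi[gsi_shard]:
        if entry["bar"] != bar:
            continue
        # The stored "foo" value lets the follow-up lookup target one shard.
        base_shard = bisect_right(FOO_SPLITS, entry["foo"])
        contacted.add(base_shard)
        docs.append(base[base_shard][entry["_id"]])
    return docs, contacted


insert({"_id": 1, "foo": 10, "bar": "apple"})
insert({"_id": 2, "foo": 30, "bar": "pear"})
insert({"_id": 3, "foo": 90, "bar": "zebra"})

docs, contacted = find_by_bar("pear")
print(docs, sorted(contacted))
```

In this toy setup a single-match query touches the GSI shard plus one base shard, and a query with no match touches only the GSI shard. Because `insert` performs two separate writes, a reader can briefly see the document without its index entry, which is the eventually consistent behaviour the ticket expects initially.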
Initially, you'd probably want GSIs to be eventually consistent, though with MongoDB's new transaction support you could conceivably make them more strongly consistent.