Loading...

XML

Word

Printable

JSON

Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Index Maintenance, Sharding
Labels:
None

Assigned Teams:

Cluster Scalability

If you have a collection containing "foo" and "bar", it could be the case that you need to make queries against both "foo" and "bar". You could configure a sharded collection to range shard over "foo", which would make any queries against "foo" only hit 1 replset shard, but any query against "bar" would require fanning out to every replset, which can adversely impact load/latency/availability if you have many replset shards.

An approach to solving this would be to have global secondary indexes (GSI; c.f. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html), where you store the keys you want to index ("foo" and "bar" here), along with the underlying _id of the document, or possibly the entire document. These collections would be range sharded (by "foo"+"bar"), so that a query to a GSI would tend to only hit 1 replset initially. If you chose to store the entire document in the GSI, you'd be done. If you instead stored just _id, you'd need to query the underlying collection. If you end up returning 100's of documents you probably end up hitting all replset shards as before, but for the nReturned=0 or nReturned=1 case you'd only hit a single additional replset.

Initially, you'd probably want GSIs to be eventually consistent, though with MongoDB's new transaction support you could conceivably make them more strongly consistent.

Assignee:: [DO NOT USE] Backlog - Cluster Scalability

Reporter:: David Bartley

Participants:: [DO NOT USE] Backlog - Cluster Scalability, Andy Schwerin, David Bartley, Ramon Fernandez Marina

Votes:: 1 Vote for this issue

Watchers:: 12 Start watching this issue

Created:: Mar 01 2018 08:32:07 PM UTC

Updated:: Dec 12 2023 03:58:42 PM UTC

Details

Description

Attachments

Activity

People

Dates