Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels:
- qexec-team

Sprint:
Query 2020-06-01, Query 2020-06-15
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Suppose you have a 2 shards db and a sharded collection with a shard key 'origin'. 2.6 M documents.

a sh.status() gives:

        {  "_id" : "test",  "primary" : "sh_0",  "partitioned" : true,  "version" : {  "uuid" : UUID("28b86279-c3e8-4432-b325-28136e353a85"),  "lastMod" : 1 } }        {  "_id" : "test",  "primary" : "sh_0",  "partitioned" : true,  "version" : {  "uuid" : UUID("28b86279-c3e8-4432-b325-28136e353a85"),  "lastMod" : 1 } }                

               test.sh_coll
                        shard key: { "origin" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                sh_0 5
                                sh_1 5
                        { "origin" : { "$minKey" : 1 } } -->> { "origin" : "A000001" } on : sh_1 Timestamp(3, 2)
                         { "origin" : "A000001" } -->> { "origin" : "F000001" } on : sh_1 Timestamp(3, 3)
                         { "origin" : "F000001" } -->> { "origin" : "H050002" } on : sh_1 Timestamp(3, 4)
                         { "origin" : "H050002" } -->> { "origin" : "K000003" } on : sh_1 Timestamp(3, 5)
                         { "origin" : "K000003" } -->> { "origin" : "N" } on : sh_1 Timestamp(3, 6)
                         { "origin" : "N" } -->> { "origin" : "P050000" } on : sh_0 Timestamp(3, 7)
                         { "origin" : "P050000" } -->> { "origin" : "S000001" } on : sh_0 Timestamp(3, 8)
                         { "origin" : "S000001" } -->> { "origin" : "U050002" } on : sh_0 Timestamp(3, 9)
                         { "origin" : "U050002" } -->> { "origin" : "Y045375" } on : sh_0 Timestamp(3, 10)
                         { "origin" : "Y045375" } -->> { "origin" : { "$maxKey" : 1 } } on : sh_0 Timestamp(3, 11)

So, sorted by 'origin', half the records are in one shard the the next half is in the other shard.

Now suppose we want to get all the records, sorted by 'origin' and every 1000 records, make a process that takes 1 seconds... Getting half of the records takes more than 10mn (the default timeout for cursors).

If we don't specify a batch_size, we don't even arrive to 1.182 records.

If we specify a batch size of 1000, we get a "RecordNotFound" error when, after having returned half the records + the records from the first get from the 2nd shard, mongos tries to get more data from the 2nd shard.

That's what we wanted to show up because whatever the batch_size you specify, you won't be able to get all the records. Using a small batch size is often described as the way to solve cursors timeout. In a multi sharded db it is not the case.

Of course, this example has been crafted to reproduce the issue. It is not a real use case but I have real ones where I have 10 shards, process of each record takes more than 1 second and cursorTimeoutMillis: "7200000" (2 hours) and still get CursorNotFound errors.

no_cursor_timeout=True is not an option because it can lead to ghost cursors in the db

duplicates

SERVER-6036 Disable cursor timeout for cursors that belong to a session

Closed

Assignee:: Mihai Andrei
Reporter:: Remi Jolin
Participants:: Carl Champain, Dmitry Agranat, Mihai Andrei, Misha Tyulenev, Remi Jolin
Votes:: 0 Vote for this issue
Watchers:: 8 Start watching this issue

Created:: Mar 17 2020 04:28:56 PM UTC
Updated:: Jun 15 2020 01:36:20 PM UTC
Resolved:: Jun 15 2020 12:25:32 PM UTC
Confidence Status Last Update:: 23/Apr/20 5:52 PM

Details

Description

Attachments

Issue Links

Activity

People

Dates