SPARK-210: When inferring the schema make the pool size limitable

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 2.1.4, 2.2.5, 2.3.1, 2.4.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      The schema inference sample size is currently hardcoded to 10,000 documents for MongoDB < 3.2, but for newer versions (which use the $sample aggregation stage) it may sample the whole collection. For large collections this is slow and inefficient. Allow a limit to be applied before sampling the data, and make it configurable so users can further reduce the cost of schema inference.
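
      A minimal sketch of how a configurable limit might be used from Spark, assuming the fix exposes it as a "sampleSize" read option (the option name, the DefaultSource format string, and the connection URI are illustrative assumptions, not confirmed by this ticket):

          import org.apache.spark.sql.SparkSession

          // Hypothetical usage: cap schema inference at 1,000 sampled documents
          // via a "sampleSize" read option. URI, database, and collection are
          // placeholders.
          val spark = SparkSession.builder()
            .appName("sample-size-example")
            .config("spark.mongodb.input.uri", "mongodb://localhost/test.coll")
            .getOrCreate()

          val df = spark.read
            .format("com.mongodb.spark.sql.DefaultSource")
            .option("sampleSize", "1000") // limit applied before sampling
            .load()

          df.printSchema()

      A lower sample size trades schema completeness for speed: fields that appear only in unsampled documents will be absent from the inferred schema.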

      Behavior
      $sample uses one of two methods to obtain N random documents, depending on the size of the collection, the size of N, and $sample’s position in the pipeline.

      If all the following conditions are met, $sample uses a pseudo-random cursor to select documents:

      • $sample is the first stage of the pipeline
      • N is less than 5% of the total documents in the collection
      • The collection contains more than 100 documents

      If any of the above conditions are NOT met, $sample performs a collection scan followed by a random sort to select N documents. In this case, the $sample stage is subject to the sort memory restrictions.
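
      The fast path can be illustrated with the MongoDB Java sync driver (a sketch; the database and collection names are placeholders). With N = 100 against a collection of, say, 10,000 documents, N is 1% of the total, so with $sample first in the pipeline the server can use the pseudo-random cursor; a larger N would fall back to the scan-and-sort path:

          import com.mongodb.client.MongoClients
          import org.bson.Document

          val client = MongoClients.create("mongodb://localhost:27017")
          val coll = client.getDatabase("test").getCollection("events")

          // $sample as the first (and only) stage of the pipeline.
          val sampled = coll.aggregate(java.util.Arrays.asList(
            new Document("$sample", new Document("size", 100))
          ))

          sampled.forEach(doc => println(doc.toJson))
          client.close()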

            Assignee:
            Ross Lawley (ross@mongodb.com)
            Reporter:
            Ross Lawley (ross@mongodb.com)
            Votes:
            0
            Watchers:
            2
