Loading...

XML

Word

Printable

JSON

Type: New Feature
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.4.0
Affects Version/s: None
Component/s: Builders
Labels:
None

Epic Link:
MongoDB 3.4 for Java driver
Confidence Status:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

Syntax

{
  $bucketAuto: {
    groupBy: <arbitrary expression>,
    buckets: <number which is representable as 32-bit integer>,
    output: {  // Optional, defaults to {count: {$sum: 1}}
      fieldName1: <accumulator 1>,
      fieldName2: <accumulator 2>
    },
    granularity: <optional string - granularity spec>
  }
}

Supported Granularity Specs (based off of the wiki page on preferred numbers):

"R5"
"R10"
"R20"
"R40"
"R80"
"1-2-5"
"E6"
"E12"
"E24"
"E48"
"E96"
"E192"
"POWERSOF2" (1, 2, 4, 8, …)

Examples

//
// Example 1 - all defaults.
//

> db.example.insert([
  {_id: 0, price: 34.4},
  {_id: 1, price: 20},
  {_id: 2, price: 0.45},
  {_id: 3, price: 999.99},
  ...
])
> db.example.aggregate([{
  $bucketAuto: {
    groupBy: "$price",
    buckets: 4
  }
}])
// Output - note the weird ranges.
{_id: {min: 0.01, max: 2.49}, count: 43}  // 0.01 was the lowest price in the catalog.
{_id: {min: 2.49, max: 10.45}, count: 43}
{_id: {min: 10.45, max: 54.99}, count: 43}
{_id: {min: 54.99, max: 5499.99}, count: 43}  // 5499.99 was the highest price.

//
// Example 2 - custom accumulators.
//

// Same data.
> db.example.aggregate([{
  $bucketAuto: {
    groupBy: "$price",
    buckets: 4,
    output: {
      count: {$sum: 1},
      avgPrice: {$avg: "$price"},
    }
  }
}])
// Output - still has the weird ranges.
{_id: {min: 0.01, max: 2.49}, count: 43, avgPrice: 1.55}
{_id: {min: 2.49, max: 10.45}, count: 43, avgPrice: 5.43}
{_id: {min: 10.45, max: 54.99}, count: 43, avgPrice: 20.44}
{_id: {min: 54.99, max: 5499.99}, count: 43, avgPrice: 78.97}


//
// Example 3 - custom granularity.
//

// Same data.
> db.example.aggregate([{
  $bucketAuto: {
    groupBy: "$price",
    buckets: 4,
    output: {
      count: {$sum: 1},
      avgPrice: {$avg: "$price"},
    },
    granularity: "R5"
  }
}])
// Output
{_id: {min: 0.01, max: 2.5}, count: 43, avgPrice: 1.55}
{_id: {min: 2.5, max: 10}, count: 41, avgPrice: 5.46}
{_id: {min: 10, max: 63}, count: 48, avgPrice: 20.24}
{_id: {min: 63, max: 5499.99}, count: 41, avgPrice: 79.97}

Behavior

This is a blocking stage.
The granularity will be used as follows:
- Once the unrounded boundary values have been chosen, all values except the first and last will be replaced with their closest value from the specified series:
- Example:
  0.01, 2.49, 10.45, 54.99, 5499.99 above will be replaced with
  0.01, 2.50, 10.00, 63.00, 5499.99 for "R5", or
  0.01, 2.00, 10.00, 50.00, 5499.99 for "1-2-5"
  (extra digits added to conserve alignment).
Will dynamically compute buckets to give you the desired number of buckets, each with an approximately equal number of documents in each. For example, if you asked for 2 buckets, you would get buckets representing [0th percentile, 50th percentile), [50th percentile, 100th percentile]
[Edge cases] (This is a draft, more research needs to be done)
- In certain cases, there will not be as many buckets as requested.
- In certain cases, the width of each bucket will not be uniform.
- In certain cases, the 'height' (number of documents placed in each bucket) will not be uniform.
- Take the following examples:
  - When requesting 2 buckets, with values {1,1,1,1,2}, you will get 2 buckets:
    {min: MinKey, max: 2} // Includes only 1's - 4 total. {min: 2, max: MaxKey} // Includes only 2's - 1 total.
  - When requesting 3 buckets, with values {1,2,2,2,2}, you will get 2 buckets:
    {min: MinKey, max: 2} // Includes only 1's - 1 total. {min: 2, max: MaxKey} // Includes only 2's - 4 total.
  - When requesting 2 buckets, with values {0,1,2,3,4,5,5,5,5} (9 total), you will get 2 buckets:
    {min: MinKey, max: 5} // 5 total. {min: 5, max: MaxKey} // 4 total.
  - When requesting 3 buckets, with values {0,1,2,3,4,5,6,7} (8 total), you will get 3 buckets:
    {min: MinKey, max: 3} // 3 total. {min: 3, max: 6} // 3 total. {min: 6, max: MaxKey} // 2 total.
  - When requesting 3 buckets, with values {0,1,2,2,2,2,2,3} (8 total), you will get 3 buckets:
    {min: MinKey, max: 2} // 2 total. {min: 2, max: 3} // 5 total. {min: 3, max: MaxKey} // 1 total.
  - When requesting 3 buckets, with values {0,1,2,2,2,3,3,3} (8 total), you will get 3 buckets:
    {min: MinKey, max: 2} // 2 total. {min: 2, max: 3} // 3 total. {min: 3, max: MaxKey} // 3 total.
- Guess at a reasonable implementation: The bounds of each bucket are computed by first computing the average bucket size, then filling buckets left to right (smallest 'min' to largest), placing values into the current bucket until the current bucket size is greater than or equal to the average bucket size, or until the next unique value has at least as many copies as the average bucket size.
  More research needs to be done here.

is depended on by

DRIVERS-297 Aggregation Framework Support for 3.4

Closed

Assignee:: Justin Lee
Reporter:: Rathi Gnanasekaran (Inactive)
Reviewers:: None
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: Aug 09 2016 08:02:42 PM UTC
Updated:: Oct 19 2016 02:17:42 PM UTC
Resolved:: Sep 13 2016 03:23:16 PM UTC
Confidence Status Last Update:: 08/Sep/16 7:07 PM

Details

Description

Syntax

Supported Granularity Specs (based off of the wiki page on preferred numbers):

Examples

Behavior

Attachments

Issue Links

Activity

People

Dates