Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-24714

$graphLookup should be able to apply a filter to the documents it traverses

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.3.10
    • Affects Version/s: None
    • Component/s: Aggregation Framework
    • None
    • Fully Compatible
    • Query 16 (06/24/16)

      As designed, $graphLookup is only capable of traversing a graph recursively, that is, by performing repeated queries on the "connectToField" value. However, for a densely connected graph, this will result in an immense result set, often exceeding the maximum memory usage. In many cases, the user would only like to consider nodes that meet some filter during their search. $graphLookup should support an option that accepts a filter in MatchExpression syntax, and is applied at each stage of the breadth-first search.

      For example, consider a set of flight data, where each document represents a flight between two cities:

      { from: 'JFK', to: 'MIA' , airline: 'Delta' }
      { from: 'MIA', to: 'SFO', airline: 'United' }
      { from: 'JFK', to: 'BOS', airline: 'JetBlue' }
      ...
      

      If a user wishes to figure out all cities they can fly to from New York within two stops, but only on United Airlines flights, they are forced to perform a $graphLookup on the flight origin and destination:

      {$graphLookup: {
          from: 'flights',
          startWith: {$literal: 'JFK'},
          connectToField: 'from',
          connectFromFIeld: 'to',
          as: 'destinations'
      }}
      

      And afterwards, $unwind the result set, and then perform a $match with:

      {$match: {'airline': 'United'}}
      

      However, this will traverse the entire set of flights, when really, the user wanted to specify that $graphLookup should only consider nodes that match the aforementioned filter. In this case, the user should be able to specify:

      {$graphLookup: {
          from: 'flights',
          startWith: {$literal: 'JFK'},
          connectToField: 'from',
          connectFromFIeld: 'to',
          restrictSearchWithMatch: {'airline': 'United'},
          as: 'destinations'
      }}
      

      With this syntax, $graphLookup will only consider flights on United, making both the query faster, and obviating the need for a following $unwind and $match.

      It is worth noting that this filter is not precisely equivalent to unwinding the result set and applying a filter to that. Consider a graph composed of two subgraphs, each consisting entirely of United Airlines flights, where the subgraphs are connected only by Delta flights. A $graphLookup of the first form starting anywhere would find all the flights, and the $unwind/$match would reduce it to the United flights. However, with 'restrictSearchWithMatch', the $graphLookup will explore whichever subgraph it begins in, but not the other, since it would need to traverse a Delta edge to get there.

            Assignee:
            benjamin.murphy Benjamin Murphy
            Reporter:
            benjamin.murphy Benjamin Murphy
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: