Since the problem is with $lookup, and it is difficult (impossible?) to see the performance of $lookup in explain, we need to create enough records in a collection to see the problem firsthand:
use test;
db.human.drop();
db.human.createIndex({indexField: 1});
for (var i = 1; i <= 200000; i++) { db.human.insert( { indexField: "name"+i } ); }
Now, let's create a primary collection, where a few records have a reference:
db.alien.drop();
db.alien.insert({refId: 'name1'});
db.alien.insert({});
db.alien.insert({refId: 'name2'});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
db.alien.insert({});
Now, let's run a $lookup:
db.alien.aggregate([
{$lookup: {
from: 'human',
localField: 'refId',
foreignField: 'indexField',
as: 'human',
}}
])
This lookup appears to run just fine.
By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({indexField: 'name1'});
Now, let's run a $lookup using pipeline:
db.alien.aggregate([
{$lookup: {
from: 'human',
let: {'refId' : '$refId'},
pipeline: [
{$match: {
$expr: {$eq: ['$indexField', '$$refId']}
}},
],
as: 'human',
}}
])
This lookup runs slow. If you were to put a $match in the first stage and filter records by {refId: {$exists: true}}, it would run fast. Which leads me to believe the problem is with the records for which refId does not exist.
By way of information, my understanding is that the lookup happening here is the equivalent of db.human.find({$expr: {$eq: ['$indexField', 'name1']}});
Interestingly enough, the query
db.human.find({$expr: {$eq: ['$indexField', null]}});
Seems to run just fine, so I'm not sure what's happening in the $lookup to make it run so slowly. Slowly here is obviously relative. I'm on a 2017 high-end macbook pro, and slowly means about 1.5 seconds.