There is a 15-35% regression in $lookup and $graphLookup genny workloads running with unsharded collections that was first seen between v5.0 and v5.1. The workloads in question are here: $lookup and $graphLookup. The regression can be seen in the linked BF or by looking at the sys-perf waterfall for v5.0 and v5.1 (select average latency for "RunGraphLookups.GraphLookupUnshardedToUnshardedOneToMany", for example).
Some ideas have been proposed as to why the regression occurred and what can be done to address it. For example, it may have something to do with slow collection scans (the workloads in question use small collections). It may be that the plan cache project, particularly SERVER-61421, could improve the performance of these workloads, since the subpipelines run by these $lookups and $graphLookups are all simple match queries with the same shape.
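To illustrate the "same shape" point, here is a minimal Python sketch. It is not the server's implementation or its plan cache key computation; the helper names (lookup_subqueries, query_shape) and the field names (join_key, foreign_key) are hypothetical, and the equality filter is a simplified stand-in for the match that a $lookup issues against the foreign collection for each input document.

```python
from typing import Any, Dict, List


def lookup_subqueries(
    local_docs: List[Dict[str, Any]], local_field: str, foreign_field: str
) -> List[Dict[str, Any]]:
    """Build one simplified equality filter per input document.

    Assumption: this approximates the per-document match a $lookup runs
    against the foreign collection; the real server logic is more involved.
    """
    return [{foreign_field: {"$eq": doc.get(local_field)}} for doc in local_docs]


def query_shape(filter_doc: Any) -> Any:
    """Replace every constant with a placeholder, keeping fields and operators."""
    if isinstance(filter_doc, dict):
        return {key: query_shape(value) for key, value in filter_doc.items()}
    return "?"


# Hypothetical input documents for the local (outer) collection.
local_docs = [{"_id": i, "join_key": i % 3} for i in range(5)]

subqueries = lookup_subqueries(local_docs, "join_key", "foreign_key")

# Collapse each subquery to its shape; identical shapes collapse to one entry.
shapes = {repr(query_shape(query)) for query in subqueries}
```

In this sketch every subquery differs only in its constant, so all of them reduce to a single shape. That is the property that would let a cached plan be reused across subpipeline executions instead of planning each one from scratch.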
A more detailed write-up can be found in the comments.