-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Geo
-
Query Integration
-
QI 2023-09-04, QI 2023-09-18, QI 2023-10-02, QI 2023-10-16, QI 2023-10-30, QI 2023-11-13, QI 2023-11-27, QI 2023-12-11, QI 2023-12-25, QI 2024-01-08, QI 2024-01-22, QI 2024-02-05, QI 2024-02-19, QI 2024-03-04, QI 2024-03-18, QI 2024-04-01, QI 2024-04-15, QI 2024-04-29, QI 2024-05-13
-
(copied to CRM)
Hi, in our production mongod we see many geoNear queries where the user wants only
30 records but MongoDB scans tens of thousands of records. After digging deep into the code and
a long debugging session, we found no bug: with the current algorithm, MongoDB simply has to scan that many records.
1) In your original implementation, you use a queue to do the quadtree split. But the smallest geo prefix is limited by internalGeoNearQuery2DMaxCoveringCells, so it is easy to hit the bad case where a very large area cannot be split any further.
2) Tuning internalGeoNearQuery2DMaxCoveringCells to a larger value causes more I/O scans, but we
have the data buffered in memory, so that does not cost much. We thought this would help, but it did not. After digging further, we found the problem: the heuristic that decides the radius of the first iteration and the growth of the delta radius is too aggressive in dense areas when only tens of records are to be returned.
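To illustrate point 2, here is a toy model of an expanding-radius search. It is not MongoDB's actual geoNear code: the function name `simulateExpandingSearch`, its parameters, and the synthetic data (20 points per metre out to 500 m) are all assumptions made for illustration. It only shows how, in a dense area, the choice of the first radius determines how many records get scanned before `k` results are available.

```javascript
// Toy model only; this is NOT MongoDB's geoNear implementation.
// distances: distance of every indexed point from the query centre.
// The search scans everything inside the current radius and grows the
// radius by growthFactor until at least k points are covered.
function simulateExpandingSearch(distances, k, initialRadius, growthFactor, maxRadius) {
  if (growthFactor <= 1) {
    throw new Error("growthFactor must be greater than 1");
  }
  let radius = Math.min(initialRadius, maxRadius);
  let stages = 1;
  for (;;) {
    const scanned = distances.filter((d) => d <= radius).length;
    if (scanned >= k || radius >= maxRadius) {
      return { scanned, stages, finalRadius: radius };
    }
    radius = Math.min(radius * growthFactor, maxRadius);
    stages += 1;
  }
}

// Hypothetical dense data set: 20 points per metre, out to 500 m.
const dense = Array.from({ length: 10000 }, (_, i) => (i + 1) / 20);

// Same limit (30) and growth factor, different first radius.
const wideStart = simulateExpandingSearch(dense, 30, 400, 2, 500);
const narrowStart = simulateExpandingSearch(dense, 30, 1, 2, 500);

console.log("wide first radius:", wideStart);
console.log("narrow first radius:", narrowStart);
```

In this model the wide first radius covers thousands of points in a single stage, while the narrow one reaches 30 results after two stages and a few dozen scanned points. That is the pattern described above, under the stated assumptions.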
I wrote a blog post explaining your work, which can be found here: geoNear
The attachment is the execution result of db.coll.find({lag:{'$nearSphere':[120.993965,31.449034 ], '$maxDistance':7.83927971443699e-05}}).limit(30).explain(true)
Customer-related information has been replaced by me for security.
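For readers unfamiliar with the units: with legacy coordinate pairs, `$nearSphere` takes `$maxDistance` in radians. A small sketch of the conversion follows. The Earth radius constant (6378137 m, the WGS84 equatorial radius) and the helper names are assumptions, not values stated in this ticket. Under that assumption, the radius in the query works out to roughly 500 metres.

```javascript
// Assumed Earth radius in metres (WGS84 equatorial radius); the ticket
// does not say which value was used to derive the $maxDistance.
const EARTH_RADIUS_M = 6378137;

// Hypothetical helpers for converting between radians and metres.
function radiansToMeters(radians) {
  return radians * EARTH_RADIUS_M;
}

function metersToRadians(meters) {
  return meters / EARTH_RADIUS_M;
}

// $maxDistance taken verbatim from the query in this ticket.
const maxDistanceRadians = 7.83927971443699e-05;
const maxDistanceMeters = radiansToMeters(maxDistanceRadians);

console.log(maxDistanceMeters.toFixed(3), "metres");
console.log(metersToRadians(500), "radians for a 500 m radius");
```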
I made a tradeoff here: I sacrificed a little bit of correctness but got back 99% of the execution time.
This is the execution result of the original implementation:
"stats" : { "nscanned" : 24626, "objectsLoaded" : 24605, "avgDistance" : 415.83339730123487, "maxDistance" : 415.83339730123487, "time" : 194 },
This is the result of our optimized version:
"stats" : { "nscanned" : 514, "objectsLoaded" : 502, "avgDistance" : 415.83339730123487, "maxDistance" : 415.83339730123487, "time" : 5 },
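The two stats blocks can be compared with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper name `reductionPercent` and the per-result ratio are illustrative additions, and the limit of 30 comes from the query in this ticket.

```javascript
// Figures copied from the two "stats" blocks in this ticket.
const originalStats = { nscanned: 24626, objectsLoaded: 24605, time: 194 };
const optimizedStats = { nscanned: 514, objectsLoaded: 502, time: 5 };
const limit = 30; // from .limit(30) in the query

// Percentage by which a metric dropped between the two runs.
function reductionPercent(before, after) {
  return (1 - after / before) * 100;
}

const scanReduction = reductionPercent(originalStats.nscanned, optimizedStats.nscanned);
const timeReduction = reductionPercent(originalStats.time, optimizedStats.time);

console.log("nscanned reduction %:", scanReduction.toFixed(1));
console.log("time reduction %:", timeReduction.toFixed(1));
console.log("scanned per requested record (original):", (originalStats.nscanned / limit).toFixed(1));
console.log("scanned per requested record (optimized):", (optimizedStats.nscanned / limit).toFixed(1));
```

For this single sample, both reductions come out at roughly 97 to 98 percent, which is in the same range as the 99% quoted above.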
Finally, I would like to say that correctness is not the most important thing here. Many LBS services need geoNear, and performance in dense areas matters more, so that is what you should focus on.
- is related to
-
SERVER-18426 $geoNear expands aggressively if the centroid is far from the dense data
- Backlog