- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Query Optimization
We introduced the query_exec library here. Since then, the number of source files it compiles has grown from 56 to 119, and the number of dependent targets has roughly doubled, from 34 to 71.
I imagine that much of this growth has come from convenience: defining a new library and tracking down its dependencies is non-trivial, and it is much easier to stick query-related code into an existing library. It would be nice to take a thoughtful look at this library and pick apart its distinct logical parts.
For example, the distinct parts could be:
- Query execution machinery - all PlanStage implementations
- Cursor management
- Classic runtime planner
- Classic runtime planner for SBE
- Classic stage builders
- SBE stage builders
- Explain
The benefits of splitting this library are:
- Improve the modularity of the query codebase by separating different logical/conceptual units into smaller distinct libraries and clarifying their true dependencies
- Allow us to move each target definition into the directory where the code lives, which will enable code owners to cover build files
- Once we move to Bazel, smaller targets can enable faster build times by allowing more opportunities for parallelism. For example, if targets which currently depend on query_exec can trim their dependencies to a subset of query_exec, they don't need to wait for all of query_exec to link before they can link themselves.
- Faster incremental build and test times in Bazel: with a more granular build graph, Bazel can avoid doing work if it can statically determine that a binary won't change
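To make the incremental-build argument concrete, here is a minimal Python sketch that models a build graph as a dictionary and computes which targets are invalidated by a change. It is only an illustration: the target names other than query_exec (`query_cursor`, `query_explain`, `query_planner`, `cursor_tool`, `explain_tool`, `planner_test`) are hypothetical, and the model ignores everything a real build system does beyond following dependency edges.

```python
from collections import defaultdict


def affected_targets(deps, changed):
    """Return `changed` plus every target that transitively depends on it."""
    # Invert the edges so we can walk from a library to its dependents.
    dependents = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            dependents[dep].add(target)

    seen = {changed}
    stack = [changed]
    while stack:
        current = stack.pop()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical monolithic layout: every consumer depends on query_exec.
monolithic = {
    "query_exec": [],
    "cursor_tool": ["query_exec"],
    "explain_tool": ["query_exec"],
    "planner_test": ["query_exec"],
}

# Hypothetical split layout: each consumer depends only on the part it uses.
split = {
    "query_cursor": [],
    "query_explain": [],
    "query_planner": [],
    "cursor_tool": ["query_cursor"],
    "explain_tool": ["query_explain"],
    "planner_test": ["query_planner"],
}

# A change to explain code touches query_exec in the monolithic layout,
# but only query_explain in the split layout.
rebuilt_monolithic = affected_targets(monolithic, "query_exec")
rebuilt_split = affected_targets(split, "query_explain")
```

In this toy graph, an edit to explain code invalidates all four targets in the monolithic layout but only `query_explain` and `explain_tool` in the split layout, which is the effect the last two bullets describe.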