-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Sharding NYC
-
ALL
While investigating https://jira.mongodb.org/browse/PERF-3818 have identified a regression under mongo::OpObserverRegistry::onUpdate (mongo::OpObserverRegistry::onUpdate: 10.17% in 7.0 but 5.11% in 6.0 while 6.0 containing fewer samples). Partly it's due to adding a new observer (QueryAnalysisOpObserver), partly because more work done in ClusterServerParameterOpObserver and OpObserverImpl, and partly due to an increased overhead across all observers.
Should look into:
- what is the shared overhead? (could it be due to replacing cached nss with virtual coll->ns() call?)
- could any of the observers be trimmed or skipped in a more efficient way for updateMany?
I've added local std::chrono-based logging in the loop of OpObserverRegistry::onUpdate and saw the following:
v6.0 | Observer | v7.0 | Observer | Comment |
100.64 | AuthOpObserver | 173.72 | AuthOpObserver | Calls 'AuthorizationManagerImpl::logOp' and 'audit::logUpdateOperation' |
53.1 | ClusterServerParameterOpObserver | 105.66 | ClusterServerParameterOpObserver | Moved updateParameter to recoveryUnit |
53.86 | FcvOpObserver | 64.9 | FcvOpObserver | No changes between releases. Does little. |
447.16 | OpObserverImpl | 618.46 | OpObserverImpl | Multiple changes |
55.2 | PrimaryOnlyServiceOpObserver | 47.24 | PrimaryOnlyServiceOpObserver | Does nothing on update in both releases |
56.4 | ShardSplitDonorOpObserver | 111.78 | QueryAnalysisOpObserver | New in 7.0. The design seems wrong for updateMulti |
59.38 | TenantMigrationDonorOpObserver | 70.24 | TenantMigrationDonorOpObserver | Checks whether nss is the one it cares about and does nothing if not |
58.66 | TenantMigrationRecipientOpObserver | 56.66 | TenantMigrationRecipientOpObserver | Checks whether nss is the one it cares about and does nothing if not |
93.78 | UserWriteBlockModeOpObserver | 79.92 | UserWriteBlockModeOpObserver | Calls '_checkWriteAllowed()' then checks whether nss is the one it cases about and does nothing if not |
978.18 | 1328.58 |
The numbers are nanoseconds per updated doc, averaged across updateMany that touched 50K docs. The last row is the sum across all observers. In my local tests the sum for 6.0 fluctuated between 900 and 1000. Notice, that even the observers that haven't changed and do very little in the scenario of this benchmark (essentially, a guard check or two before bailing out), such as TenantMigrationDonorOpObserver still are generally a little slower in 7.0. More contention on the instruction cache, etc.?
- depends on
-
SERVER-77758 Break down query analysis op observer depending on cluster role.
- Closed
-
SERVER-77364 Speed up OpObservers with a filter framework in OpObserverRegistry
- Closed
- related to
-
SERVER-77373 Use OpStateAccumulator's to cache common state
- Backlog
-
SERVER-77364 Speed up OpObservers with a filter framework in OpObserverRegistry
- Closed