Type: Improvement
Resolution: Unresolved
Priority: Unknown
Affects Version/s: None
Component/s: BSON, Performance
Dotnet Drivers
Bson*Reader/Writer use Bson*Context objects to hold state when moving between nesting levels (e.g. when reading/writing an embedded document).
A new context object is created on each such switch and is released shortly afterwards.
When many documents are read/written (e.g. a batch, or documents with many embedded documents), this results in a large number of short-lived allocations, which we have seen in our profiling sessions.
These allocations can be avoided to some extent by reusing/caching the Bson*Context objects.
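To illustrate the idea, here is a minimal sketch of one possible caching scheme: each context keeps a single reusable child instance and re-initializes it on every push instead of allocating a new one. All names (`SketchContext`, `PushCached`, `ContextCachingDemo`) are hypothetical and chosen for this example only; this is not the driver's actual Bson*Context code, nor necessarily how the proposed change is implemented.

```csharp
#nullable disable // keeps the sketch warning-free regardless of project settings
using System;

// Hypothetical stand-in for a Bson*Context: per-nesting-level state plus a parent link.
public sealed class SketchContext
{
    // Counts constructor calls so the demo can compare both strategies.
    public static int Allocations;

    public SketchContext Parent { get; private set; }
    public string ContextType { get; private set; }
    public int StartPosition { get; private set; }

    // Assumption of this sketch: one reusable child per context (i.e. per depth).
    private SketchContext _cachedChild;

    public SketchContext(SketchContext parent, string contextType, int startPosition)
    {
        Allocations++;
        Initialize(parent, contextType, startPosition);
    }

    private void Initialize(SketchContext parent, string contextType, int startPosition)
    {
        Parent = parent;
        ContextType = contextType;
        StartPosition = startPosition;
    }

    // Uncached variant: every nested level allocates a fresh context.
    public SketchContext PushNew(string contextType, int startPosition)
    {
        return new SketchContext(this, contextType, startPosition);
    }

    // Cached variant: allocate the child once, then overwrite its state on later pushes.
    public SketchContext PushCached(string contextType, int startPosition)
    {
        if (_cachedChild == null)
        {
            _cachedChild = new SketchContext(this, contextType, startPosition);
        }
        else
        {
            _cachedChild.Initialize(this, contextType, startPosition);
        }
        return _cachedChild;
    }

    // Leaving a nesting level returns to the parent context.
    public SketchContext Pop()
    {
        return Parent;
    }
}

public static class ContextCachingDemo
{
    // Simulates processing `documents` documents nested `depth` levels deep
    // and returns the number of context objects that were allocated.
    public static int CountAllocations(int documents, int depth, bool cached)
    {
        SketchContext.Allocations = 0;
        var root = new SketchContext(null, "TopLevel", 0);
        for (int d = 0; d < documents; d++)
        {
            var current = root;
            for (int level = 0; level < depth; level++)
            {
                current = cached
                    ? current.PushCached("Document", level)
                    : current.PushNew("Document", level);
            }
            while (current != root)
            {
                current = current.Pop();
            }
        }
        return SketchContext.Allocations;
    }
}
```

In this sketch the uncached variant allocates one context per nesting level per document, whereas the cached variant allocates at most one context per depth for the lifetime of the root. The trade-off is that a cached context is only valid while it is on the stack: its state is overwritten on the next push, so callers must not hold on to a context after popping it.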
Current implementation:
Method | BenchmarkData | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|
BsonDecoding | Deep | 133.1 ms | 1.24 ms | 1.16 ms | 22000.0000 | 1772.7273 | - | 351.49 MB |
BsonDecoding | Flat | 179.1 ms | 0.96 ms | 0.85 ms | 32882.3529 | 3647.0588 | - | 525.28 MB |
BsonDecoding | Full | 166.2 ms | 0.70 ms | 0.62 ms | 23764.7059 | 1588.2353 | - | 379.87 MB |
BsonEncoding | Deep | 110.9 ms | 0.47 ms | 0.44 ms | 2703.7037 | - | - | 43.56 MB |
BsonEncoding | Flat | 109.9 ms | 0.29 ms | 0.26 ms | 1851.8519 | - | - | 29.68 MB |
BsonEncoding | Full | 117.4 ms | 0.46 ms | 0.41 ms | 1640.0000 | - | - | 26.17 MB |
MultiFileExport | 565000000 | 8.135 s | 0.0949 s | 0.0147 s | 516000.0000 | 305000.0000 | 3000.0000 | 7.73 GB |
Cached implementation:
Method | BenchmarkData | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|---|
BsonDecoding | Deep | 130.0 ms | 0.70 ms | 0.66 ms | 19913.0435 | 1521.7391 | - | 317.84 MB |
BsonDecoding | Flat | 180.5 ms | 1.21 ms | 1.07 ms | 32875.0000 | 3375.0000 | - | 524.75 MB |
BsonDecoding | Full | 165.2 ms | 0.71 ms | 0.63 ms | 22611.1111 | 1611.1111 | - | 361.18 MB |
BsonEncoding | Deep | 110.2 ms | 0.56 ms | 0.53 ms | 1384.6154 | - | - | 22.28 MB |
BsonEncoding | Flat | 112.3 ms | 0.34 ms | 0.30 ms | 1846.1538 | - | - | 29.75 MB |
BsonEncoding | Full | 115.9 ms | 0.58 ms | 0.48 ms | 884.6154 | - | - | 14.19 MB |
MultiFileExport | 565000000 | 7.652 s | 0.5873 s | 0.0909 s | 502000.0000 | 288000.0000 | 3000.0000 | 7.44 GB |
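The improvement (-48% allocations for Deep, from 43.56 MB to 22.28 MB) is best seen in the BsonEncoding benchmark, because fewer other objects are created there. Full drops by about 46% (26.17 MB to 14.19 MB), while Flat is essentially unchanged (29.68 MB vs. 29.75 MB).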
The execution time doesn't change much for the Bson* benchmarks, because the concurrent GC can collect the objects fast enough and there is no CPU limit (one thread runs the benchmark while another runs the GC).
When all CPUs are in use, caching also improves execution time. This can be seen in the parallel MultiFileExport benchmark (8.135 s vs. 7.652 s), even though that benchmark is I/O intensive.