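We use the WT stats to calculate storageSize, but the format of the "file size in bytes" value changes once the file size reaches a threshold: it switches from a plain number to a human-readable string with the exact byte count in parentheses. We need to either handle both formats on our side or change the stat upstream in WT.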
The calculation is made here:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L226
The stat format changes from this:
> db.f.stats().wiredtiger["block manager"]
{
    "file allocation unit size" : "4096",
    "blocks allocated" : "0",
    "checkpoint size" : "0",
    "allocations requiring file extension" : "0",
    "blocks freed" : "0",
    "file magic number" : "120897",
    "file major version number" : "1",
    "minor version number" : "0",
    "file bytes available for reuse" : "0",
==> "file size in bytes" : "4096"
}
To this:
> db.f.stats().wiredtiger["block manager"]
{
    "file allocation unit size" : "4096",
    "blocks allocated" : "97",
    "checkpoint size" : "2M (2322432)",
    "allocations requiring file extension" : "97",
    "blocks freed" : "0",
    "file magic number" : "120897",
    "file major version number" : "1",
    "minor version number" : "0",
    "file bytes available for reuse" : "0",
==> "file size in bytes" : "2M (2330624)"
}
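If we handle this on our side, one option is to accept both forms shown above and always take the exact byte count. The following is only a minimal sketch under that assumption: `parseByteCountStat` is a hypothetical helper name, not an existing function in the MongoDB or WiredTiger source, and it assumes the exact value is always the decimal number inside the parentheses.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the MongoDB or WiredTiger source.
// Accepts either a plain decimal string ("4096") or the human-readable
// form ("2M (2330624)") and extracts the exact byte count.
// Returns false if the value matches neither form.
bool parseByteCountStat(const std::string& value, std::int64_t* bytesOut) {
    std::string digits = value;

    // In the human-readable form the exact value sits inside parentheses;
    // the abbreviated prefix before them is ignored entirely.
    const std::string::size_type open = value.find('(');
    if (open != std::string::npos) {
        const std::string::size_type close = value.find(')', open + 1);
        if (close == std::string::npos) {
            return false;  // unbalanced parentheses
        }
        digits = value.substr(open + 1, close - open - 1);
    }

    if (digits.empty()) {
        return false;
    }

    std::int64_t result = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') {
            return false;  // not a plain decimal number
        }
        const int d = c - '0';
        if (result > (INT64_MAX - d) / 10) {
            return false;  // would overflow a signed 64-bit integer
        }
        result = result * 10 + d;
    }

    *bytesOut = result;
    return true;
}
```

Because the abbreviated prefix is never interpreted, a parser along these lines does not depend on the unit suffix being correct.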
As a side note, the human-readable stats in WT are also wrong once the file size reaches the GB range. For example, a size of roughly 4.2 billion bytes is labeled "4B":
> db.bulk3.stats().wiredtiger["block manager"]
{
    "file allocation unit size" : "4096",
    "blocks allocated" : "0",
    "checkpoint size" : "4B (4217180160)",
    "allocations requiring file extension" : "0",
    "blocks freed" : "0",
    "file magic number" : "120897",
    "file major version number" : "1",
    "minor version number" : "0",
    "file bytes available for reuse" : "20480",
==> "file size in bytes" : "4B (4217196544)"
}