- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: 2.7.4
- Component/s: Sharding
- None
- ALL
- Sharding 2019-09-09
Consider a shard key whose values are double-precision floats (i.e. "numbers" in JavaScript). If a split is attempted at a point that is very close to an existing chunk min/max (i.e. a double value that is "adjacent" or nearly so), metadata corruption occurs. Specifically, at least one chunk goes missing, leaving a gap in the chunk ranges, and the config metadata for that collection can no longer be loaded (since the config is invalid). Only certain values cause the problem.
The exact outcome depends on whether the split point is just above or just below the existing chunk endpoint.
- Just larger than existing point: split command fails (ok: 0), the "left" chunk (chunk B) is missing:
               existing endpoint     new split point
    ...------------------------|-----|------------------------...
               Chunk A          Chunk         Chunk C
                                  B
                              (missing)
- Just smaller than existing point: split command succeeds (ok: 1), chunk is split correctly into chunks A and B, but the "subsequent" chunk (chunk C) is missing:
                 new split point     existing endpoint
    ...------------------------|-----|------------------------...
               Chunk A          Chunk         Chunk C
                                  B          (missing)
The attached reproducer shows some values that work and some that don't. Each test splits at two double-precision values in order. Since some combinations work and some don't, something specific about the actual values (or the difference between them) must be causing the failure. If the "A then B" case fails, then "B then A" also fails (though with the different symptoms described above).
Here are the results. The expectation is that every test should pass (or at least not cause an invalid config).
test1: *** FAILED ***: 1 then 1.0000000000000002: [ "(second) split not ok", "(second) wrong chunk count", "(second) gaps" ]
{ "_id" : "test1.test1-field_MinKey", "lastmod" : Timestamp(1, 1), "lastmodEpoch" : ObjectId("53db166ec333c70bae888422"), "ns" : "test1.test1", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1 }, "shard" : "shard0000" }
{ "_id" : "test1.test1-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db166ec333c70bae888422"), "ns" : "test1.test1", "min" : { "field" : 1.0000000000000002 }, "max" : { "field" : { "$maxKey" : 1 } }, "shard" : "shard0000" }
test2: *** FAILED ***: 1 then 1.0000000000000004: [ "(second) split not ok", "(second) wrong chunk count", "(second) gaps" ]
{ "_id" : "test2.test2-field_MinKey", "lastmod" : Timestamp(1, 1), "lastmodEpoch" : ObjectId("53db166ec333c70bae888427"), "ns" : "test2.test2", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1 }, "shard" : "shard0000" }
{ "_id" : "test2.test2-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db166ec333c70bae888427"), "ns" : "test2.test2", "min" : { "field" : 1.0000000000000004 }, "max" : { "field" : { "$maxKey" : 1 } }, "shard" : "shard0000" }
test3: passed: 1 then 1.0000000000000007
test4: passed: 1 then 1.0000000000000009
test5: *** FAILED ***: 1.0000000000000002 then 1.0000000000000004: [ "(second) split not ok", "(second) wrong chunk count", "(second) gaps" ]
{ "_id" : "test5.test5-field_MinKey", "lastmod" : Timestamp(1, 1), "lastmodEpoch" : ObjectId("53db166fc333c70bae888436"), "ns" : "test5.test5", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1.0000000000000002 }, "shard" : "shard0000" }
{ "_id" : "test5.test5-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db166fc333c70bae888436"), "ns" : "test5.test5", "min" : { "field" : 1.0000000000000004 }, "max" : { "field" : { "$maxKey" : 1 } }, "shard" : "shard0000" }
test6: passed: 1.0000000000000002 then 1.0000000000000007
test7: passed: 1.0000000000000002 then 1.0000000000000009
test8: passed: 1.0000000000000004 then 1.0000000000000007
test9: passed: 1.0000000000000004 then 1.0000000000000009
test10: *** FAILED ***: 1.0000000000000007 then 1.0000000000000009: [ "(second) split not ok", "(second) wrong chunk count", "(second) gaps" ]
{ "_id" : "test10.test10-field_MinKey", "lastmod" : Timestamp(1, 1), "lastmodEpoch" : ObjectId("53db166fc333c70bae88844f"), "ns" : "test10.test10", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1.0000000000000007 }, "shard" : "shard0000" }
{ "_id" : "test10.test10-field_1.000000000000001", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db166fc333c70bae88844f"), "ns" : "test10.test10", "min" : { "field" : 1.0000000000000009 }, "max" : { "field" : { "$maxKey" : 1 } }, "shard" : "shard0000" }
test11: *** FAILED ***: 1.0000000000000002 then 1: [ "(second) wrong chunk count", "(second) bad min/max chunk" ]
{ "_id" : "test11.test11-field_MinKey", "lastmod" : Timestamp(1, 3), "lastmodEpoch" : ObjectId("53db166fc333c70bae888454"), "ns" : "test11.test11", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1 }, "shard" : "shard0000" }
{ "_id" : "test11.test11-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db166fc333c70bae888454"), "ns" : "test11.test11", "min" : { "field" : 1 }, "max" : { "field" : 1.0000000000000002 }, "shard" : "shard0000" }
test12: *** FAILED ***: 1.0000000000000004 then 1: [ "(second) wrong chunk count", "(second) bad min/max chunk" ]
{ "_id" : "test12.test12-field_MinKey", "lastmod" : Timestamp(1, 3), "lastmodEpoch" : ObjectId("53db1670c333c70bae888459"), "ns" : "test12.test12", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1 }, "shard" : "shard0000" }
{ "_id" : "test12.test12-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db1670c333c70bae888459"), "ns" : "test12.test12", "min" : { "field" : 1 }, "max" : { "field" : 1.0000000000000004 }, "shard" : "shard0000" }
test13: passed: 1.0000000000000007 then 1
test14: passed: 1.0000000000000009 then 1
test15: *** FAILED ***: 1.0000000000000004 then 1.0000000000000002: [ "(second) wrong chunk count", "(second) bad min/max chunk" ]
{ "_id" : "test15.test15-field_MinKey", "lastmod" : Timestamp(1, 3), "lastmodEpoch" : ObjectId("53db1670c333c70bae888468"), "ns" : "test15.test15", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1.0000000000000002 }, "shard" : "shard0000" }
{ "_id" : "test15.test15-field_1.0", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db1670c333c70bae888468"), "ns" : "test15.test15", "min" : { "field" : 1.0000000000000002 }, "max" : { "field" : 1.0000000000000004 }, "shard" : "shard0000" }
test16: passed: 1.0000000000000007 then 1.0000000000000002
test17: passed: 1.0000000000000009 then 1.0000000000000002
test18: passed: 1.0000000000000007 then 1.0000000000000004
test19: passed: 1.0000000000000009 then 1.0000000000000004
test20: *** FAILED ***: 1.0000000000000009 then 1.0000000000000007: [ "(second) wrong chunk count", "(second) bad min/max chunk" ]
{ "_id" : "test20.test20-field_MinKey", "lastmod" : Timestamp(1, 3), "lastmodEpoch" : ObjectId("53db1671c333c70bae888481"), "ns" : "test20.test20", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : 1.0000000000000007 }, "shard" : "shard0000" }
{ "_id" : "test20.test20-field_1.000000000000001", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db1671c333c70bae888481"), "ns" : "test20.test20", "min" : { "field" : 1.0000000000000007 }, "max" : { "field" : 1.0000000000000009 }, "shard" : "shard0000" }
test21: *** FAILED ***: -4204176258327475000 then -4204176258327474700: [ "(second) split not ok", "(second) wrong chunk count", "(second) gaps" ]
{ "_id" : "test21.test21-field_MinKey", "lastmod" : Timestamp(1, 1), "lastmodEpoch" : ObjectId("53db1671c333c70bae888486"), "ns" : "test21.test21", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : -4204176258327475000 }, "shard" : "shard0000" }
{ "_id" : "test21.test21-field_-4.204176258327475e+18", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db1671c333c70bae888486"), "ns" : "test21.test21", "min" : { "field" : -4204176258327474700 }, "max" : { "field" : { "$maxKey" : 1 } }, "shard" : "shard0000" }
test22: *** FAILED ***: -4204176258327474700 then -4204176258327475000: [ "(second) wrong chunk count", "(second) bad min/max chunk" ]
{ "_id" : "test22.test22-field_MinKey", "lastmod" : Timestamp(1, 3), "lastmodEpoch" : ObjectId("53db1671c333c70bae88848b"), "ns" : "test22.test22", "min" : { "field" : { "$minKey" : 1 } }, "max" : { "field" : -4204176258327475000 }, "shard" : "shard0000" }
{ "_id" : "test22.test22-field_-4.204176258327475e+18", "lastmod" : Timestamp(1, 4), "lastmodEpoch" : ObjectId("53db1671c333c70bae88848b"), "ns" : "test22.test22", "min" : { "field" : -4204176258327475000 }, "max" : { "field" : -4204176258327474700 }, "shard" : "shard0000" }
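One pattern in the results is worth making explicit: in every failing test, the surviving second chunk carries an `_id` suffix (for example `field_1.0` or `field_1.000000000000001`) that both split points would map to. The Python sketch below is not the server's code; it assumes the `_id` suffix behaves like a 16-significant-digit rendering of the double (approximated here with `"%.16g"`, an inference from the `_id` values in the output, with `id_string` and `colliding_pairs` being hypothetical helper names). Under that assumption, the pairs whose rendered strings collide are exactly the pairs that fail.

```python
from itertools import combinations

# The five adjacent doubles used by test1..test20.
VALUES = [
    1.0,
    1.0000000000000002,
    1.0000000000000004,
    1.0000000000000007,
    1.0000000000000009,
]


def id_string(x: float) -> str:
    """Render a double with 16 significant digits.

    Assumption: this approximates how the chunk _id suffix is derived from
    the chunk's min key. It is inferred from the output, not taken from the
    server source.
    """
    return "%.16g" % x


def colliding_pairs(values):
    """Return all pairs of distinct doubles that render to the same string."""
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if id_string(a) == id_string(b)
    ]


if __name__ == "__main__":
    for a, b in colliding_pairs(VALUES):
        print(f"{a!r} and {b!r} both render as {id_string(a)!r}")
```

With the five values above, the colliding pairs are (1, 1.0000000000000002), (1, 1.0000000000000004), (1.0000000000000002, 1.0000000000000004) and (1.0000000000000007, 1.0000000000000009), which correspond to the failing tests 1, 2, 5 and 10 and their reversed counterparts 11, 12, 15 and 20. The pair used in test21 and test22 collides in the same way.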
The test case values are:
double | hex (IEEE 754 bit pattern) | decimal (bit pattern as integer) |
---|---|---|
1.0000000000000000 | 0x3ff0000000000000 | 4607182418800017408 |
1.0000000000000002 | 0x3ff0000000000001 | 4607182418800017409 |
1.0000000000000004 | 0x3ff0000000000002 | 4607182418800017410 |
1.0000000000000007 | 0x3ff0000000000003 | 4607182418800017411 |
1.0000000000000009 | 0x3ff0000000000004 | 4607182418800017412 |
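The table can be reproduced with a few lines of Python, which also confirms that the five values are consecutive doubles (their bit patterns differ by exactly 1). This is a minimal sketch using only the standard `struct` module; `double_bits` is a hypothetical helper name.

```python
import struct

TEST_VALUES = [
    1.0,
    1.0000000000000002,
    1.0000000000000004,
    1.0000000000000007,
    1.0000000000000009,
]


def double_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    # Pack as a big-endian double, then reinterpret the same 8 bytes
    # as a big-endian unsigned 64-bit integer.
    return struct.unpack(">Q", struct.pack(">d", x))[0]


if __name__ == "__main__":
    for value in TEST_VALUES:
        bits = double_bits(value)
        # Columns: double | hex bit pattern | bit pattern as a decimal integer
        print(f"{value!r} | {bits:#018x} | {bits}")
```

For positive doubles, adjacent representable values have adjacent bit patterns, so a difference of 1 in the last column means there is no double between the two values.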
- depends on
  - SERVER-8829 String representation for chunk id is not unique (Closed)
  - SERVER-42106 Use auto-generated _ids for config.chunks and config.tags (Closed)
- is related to
  - SERVER-14761 split command should only allow NumberLongs for hashed shard keys (Closed)
- related to
  - SERVER-9931 hashed shard keys do not appear to handle decimal values (Closed)