- Type: Bug
- Resolution: Done
- Priority: Major - P3
- None
- Affects Version/s: 4.4.5
- Component/s: None
ALL
I have a sharded cluster and the balancer seems to hang. Several collections are unbalanced:
db.getSiblingDB("config").chunks.aggregate([
   { $match: { ns: { $nin: ["config.system.sessions"] } } },
   { $group: { _id: { shard: "$shard", ns: "$ns" }, chunks: { $sum: 1 } } },
   { $group: { _id: "$_id.ns", data: { $push: { k: "$_id.shard", v: "$chunks" } } } },
   { $replaceRoot: { newRoot: { $mergeObjects: [{ $arrayToObject: "$data" }, { ns: "$_id" }] } } },
   { $sort: { ns: 1 } }
])
{ "shard_03" : 1794, "shard_02" : 1794, "shard_01" : 1794, "shard_04" : 1794, "ns" : "data.sessions.20210606" }
{ "shard_03" : 1509, "shard_04" : 1508, "shard_02" : 1508, "shard_01" : 1508, "ns" : "data.sessions.20210607" }
{ "shard_04" : 1912, "shard_03" : 1911, "shard_02" : 1912, "shard_01" : 1911, "ns" : "data.sessions.20210608" }
{ "shard_03" : 2019, "shard_04" : 2019, "shard_01" : 2019, "shard_02" : 2018, "ns" : "data.sessions.20210609" }
{ "shard_01" : 1977, "shard_03" : 1977, "shard_04" : 1977, "shard_02" : 1977, "ns" : "data.sessions.20210610" }
{ "shard_03" : 1300, "shard_01" : 1300, "shard_04" : 1300, "shard_02" : 1299, "ns" : "data.sessions.20210611" }
{ "shard_02" : 1841, "shard_03" : 1840, "shard_04" : 1841, "shard_01" : 1841, "ns" : "data.sessions.20210612" }
{ "shard_04" : 2030, "shard_01" : 2029, "shard_03" : 2029, "shard_02" : 2030, "ns" : "data.sessions.20210613" }
{ "shard_02" : 1496, "shard_04" : 2273, "shard_01" : 2484, "shard_03" : 1708, "ns" : "data.sessions.20210615" }
{ "shard_03" : 2841, "shard_04" : 1179, "shard_01" : 2366, "shard_02" : 1333, "ns" : "data.sessions.20210616" }
{ "shard_01" : 8156, "ns" : "data.sessions.20210617" }
{ "shard_01" : 2967, "ns" : "data.sessions.20210618" }
{ "shard_01" : 10, "ns" : "data.sessions.20210619" }
{ "shard_01" : 10, "ns" : "data.sessions.20210620" }
{ "shard_01" : 10, "ns" : "data.sessions.20210621" }
{ "shard_01" : 224, "shard_04" : 199, "shard_02" : 1170, "shard_03" : 332, "ns" : "ignored.sessions.20210615" }
{ "shard_02" : 1148, "shard_04" : 315, "shard_01" : 218, "shard_03" : 237, "ns" : "ignored.sessions.20210616" }
{ "shard_02" : 1950, "ns" : "ignored.sessions.20210617" }
{ "shard_04" : 1, "shard_02" : 845, "ns" : "ignored.sessions.20210618" }
{ "shard_02" : 10, "ns" : "ignored.sessions.20210619" }
{ "shard_02" : 10, "ns" : "ignored.sessions.20210620" }
{ "shard_02" : 10, "ns" : "ignored.sessions.20210621" }
{ "shard_02" : 139, "shard_01" : 134, "shard_04" : 127, "shard_03" : 128, "ns" : "mip.statistics" }
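The per-shard counts above can be checked for imbalance mechanically. The following is a minimal sketch, not part of the original report: `findUnbalanced` and its parameters are hypothetical names, the sample documents are copied from the output above, and the threshold of 8 is an assumption based on the migration threshold the MongoDB 4.4 documentation lists for collections with 80 or more chunks (smaller collections use lower thresholds, so verify the value for your case). Shards that do not appear in a document are counted as holding 0 chunks.

```javascript
// Hypothetical helper, not part of the mongo shell.
// docs:      documents shaped like the aggregation output (shard name -> chunk count, plus ns)
// shards:    full list of shard names; shards absent from a document count as 0 chunks
// threshold: assumed migration threshold (difference between fullest and emptiest shard)
function findUnbalanced(docs, shards, threshold) {
  const result = [];
  for (const doc of docs) {
    const counts = shards.map((s) => doc[s] || 0);
    const spread = Math.max(...counts) - Math.min(...counts);
    if (spread >= threshold) {
      result.push({ ns: doc.ns, spread: spread });
    }
  }
  return result;
}

const shards = ["shard_01", "shard_02", "shard_03", "shard_04"];

// Three documents copied from the output above.
const sample = [
  { shard_03: 1794, shard_02: 1794, shard_01: 1794, shard_04: 1794, ns: "data.sessions.20210606" },
  { shard_01: 8156, ns: "data.sessions.20210617" },
  { shard_02: 139, shard_01: 134, shard_04: 127, shard_03: 128, ns: "mip.statistics" },
];

// Assumed threshold of 8; see the note above.
const unbalanced = findUnbalanced(sample, shards, 8);
// unbalanced now holds data.sessions.20210617 (spread 8156) and mip.statistics (spread 12)
```

With these inputs even mip.statistics, which looks almost even at first glance, has a spread of 12 between shard_02 and shard_04, so under the assumed threshold the balancer would still consider it eligible for migration.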
The sharding status is shown below. Apparently MongoDB hangs while balancing the collection "mip.statistics":
sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("608864f0e8dcb6218857ab2d")
  }
  shards:
        { "_id" : "shard_01", "host" : "shard_01/d-mipmdb-sh1-01.swi.srse.net:27018,d-mipmdb-sh2-01.swi.srse.net:27018", "state" : 1, "tags" : [ ] }
        { "_id" : "shard_02", "host" : "shard_02/d-mipmdb-sh1-02.swi.srse.net:27018,d-mipmdb-sh2-02.swi.srse.net:27018", "state" : 1, "tags" : [ ] }
        { "_id" : "shard_03", "host" : "shard_03/d-mipmdb-sh1-03.swi.srse.net:27018,d-mipmdb-sh2-03.swi.srse.net:27018", "state" : 1, "tags" : [ ] }
        { "_id" : "shard_04", "host" : "shard_04/d-mipmdb-sh1-04.swi.srse.net:27018,d-mipmdb-sh2-04.swi.srse.net:27018", "state" : 1, "tags" : [ ] }
  active mongoses:
        "4.4.3" : 16
        "4.4.5" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Balancer active window is set between 02:10 and 01:50 server local time
        Collections with active migrations:
                mip.statistics started at Fri Jun 18 2021 11:13:38 GMT+0200 (W. Europe Daylight Time)
        Failed balancer rounds in last 5 attempts: 5
        Last reported error: Could not find host matching read preference { mode: "primary" } for set shard_01
        Time of Reported error: Fri Jun 18 2021 10:24:32 GMT+0200 (W. Europe Daylight Time)
        Migration Results for the last 24 hours:
                3 : Success
                1 : Failed with error 'aborted', from shard_02 to shard_04
  databases:
        { "_id" : "mip", "primary" : "shard_01", "partitioned" : true, "version" : { "uuid" : UUID("4c4d4777-1a9e-4fd8-9b73-f579e1b5a83a"), "lastMod" : 1 } }
                mip.statistics
                        shard key: { "ts" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                shard_01 134
                                shard_02 139
                                shard_03 128
                                shard_04 127
                        too many chunks to print, use verbose if you want to force print
As a quick workaround I tried to drop the offending collection, without success:
db.statistics.drop()
Error: drop failed: {
        "ok" : 0,
        "errmsg" : "timed out waiting for mip.statistics",
        "code" : 46,
        "codeName" : "LockBusy",
        "operationTime" : Timestamp(1624017860, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1624017860, 1),
                "signature" : {
                        "hash" : BinData(0,"lkY1zw1m1Zv/rqUaVsAWCkGrzjI="),
                        "keyId" : NumberLong("6955920606428659733")
                }
        }
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCollection.prototype.drop@src/mongo/shell/collection.js:713:15
@(shell):1:1
I can insert and delete data in this collection, and I can drop and create indexes, but dropping the collection itself is not possible.
I also stopped and restarted the balancer, without success.
I even restarted the entire sharded cluster, also without success.
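The LockBusy error and the "Collections with active migrations" entry above both point at a lock on mip.statistics that may still be held. The following is a minimal sketch for inspecting it, not a confirmed diagnosis or fix: `heldLocks` is a hypothetical helper, and it assumes the 4.4 layout of the `config.locks` collection, in which a held distributed lock has `state: 2` and carries `who`, `when` and `why` fields. The `configDb` argument is expected to behave like `db.getSiblingDB("config")` in the mongo shell.

```javascript
// Hypothetical helper, not part of the mongo shell.
// Lists distributed locks that are currently held (state 2, an assumption about
// the 4.4 config.locks layout), optionally restricted to one namespace.
function heldLocks(configDb, ns) {
  const query = { state: 2 };
  if (ns) {
    query._id = ns; // in config.locks the _id of a collection lock is the namespace
  }
  return configDb.locks
    .find(query)
    .toArray()
    .map((lock) => ({ ns: lock._id, who: lock.who, when: lock.when, why: lock.why }));
}

// Usage in the mongo shell, connected to a mongos:
//   heldLocks(db.getSiblingDB("config"), "mip.statistics")
```

The helper only reads from the config database. Modifying `config.locks` by hand is a different matter and is deliberately not shown here.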