Core Server / SERVER-45844

UUID shard key values cause failed chunk migrations

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding
    • Attachments: sh-status
    • Sprint: Sharding 2020-05-04

      UUID shard key values result in chunk _ids (in the config.chunks collection) that aren't correctly inferred when moving chunks. That is, the chunk _id represents the shard key value as BinData, while the migration lookup uses the UUID(...) form, so the chunk cannot be found.
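
      For reference, the mismatch can be inspected directly in the routing metadata. The read-only sketch below assumes the keychain.keyring namespace from the logs further down, and assumes (as the "found no chunks" error implies) that the stored chunk _id string renders the shard key value in its BinData form rather than as UUID(...):

      // Read-only: inspect chunk metadata for the collection on a mongos (or the
      // config server primary). Compare how the stored _id string renders the shard
      // key value with the UUID("...") form used in the failing lookup; if the _id
      // uses BinData(...) while the lookup string uses UUID(...), the lookup can
      // never match.
      db.getSiblingDB("config").chunks.find(
          { ns: "keychain.keyring" },
          { _id: 1, min: 1, max: 1 }
      ).limit(3).pretty()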

      Original description:

      Hi, we are using a MongoDB sharded cluster running 4.0.5 (this cluster was upgraded from 3.4.9 -> 3.6.9 -> 4.0.5 a couple of months back).

      Architecture:
            3 mongos
            config server running as a replica set (1 primary + 2 secondaries)
            1 shard with 3 nodes running as a replica set (1 primary + 2 secondaries)

      Since shard1 is running out of disk space, we added shard2. After adding shard2, we see the balancer is not moving chunks and it throws the following message:

      2020-01-29T17:35:21.256+0000 I SHARDING [Balancer] Balancer move keychain.keyring: [{ keyringId: MinKey }, { keyringId: UUID("00000000-2da7-4f75-826f-fb8939b25f3f") }), from prod-mongodb-digitalplatform-02-shard1, to prod-mongodb-digitalplatform-02-shard2 failed :: caused by :: IncompatibleShardingMetadata: Chunk move was not successful :: caused by :: Tried to find the chunk for 'keychain.keyring-keyringId_UUID("06597831-055b-425f-8f83-63d6935bc55b"), but found no chunks
      2020-01-29T17:35:21.256+0000 I SHARDING [Balancer] about to log metadata event into actionlog: { _id: "ip-10-120-122-158-2020-01-29T17:35:21.256+0000-5e31c259018fac481bd1ee62", server: "ip-10-120-122-158", clientAddr: "", time: new Date(1580319321256), what: "balancer.round", ns: "", details: { executionTimeMillis: 559, errorOccured: false, candidateChunks: 1, chunksMoved: 0 } }
      

      We even tried moving some chunks manually, and they also failed for the same reason.

      The sh.status() output is attached.

      We issued the following command, using the chunk bounds from the sh.status() output above, to move one chunk.

      Command:

      db.adminCommand( { moveChunk : "keychain.keyring" ,
                       bounds : [{ "keyringId" : UUID("fff68145-c9f1-4915-a2da-5f66d02820ad") }, { "keyringId" : UUID("fffb99c8-3726-47cb-94b5-6637a36788c0") }] ,
                       to : "prod-mongodb-digitalplatform-02-shard2"
                        } )
      

      Output:

      mongos> db.adminCommand( { moveChunk : "keychain.keyring" ,
      ...                  bounds : [{ "keyringId" : UUID("fff68145-c9f1-4915-a2da-5f66d02820ad") }, { "keyringId" : UUID("fffb99c8-3726-47cb-94b5-6637a36788c0") }] ,
      ...                  to : "prod-mongodb-digitalplatform-02-shard2"
      ...                   } )
      {
      	"ok" : 0,
      	"errmsg" : "Chunk move was not successful :: caused by :: Tried to find the chunk for 'keychain.keyring-keyringId_UUID(\"fff68145-c9f1-4915-a2da-5f66d02820ad\"), but found no chunks",
      	"code" : 105,
      	"codeName" : "IncompatibleShardingMetadata",
      	"operationTime" : Timestamp(1580320429, 23),
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1580320429, 39),
      		"signature" : {
      			"hash" : BinData(0,"3e0AlrK695L04KiXhy+axkBMaNk="),
      			"keyId" : NumberLong("6762645768143634453")
      		}
      	}
      }
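
      As a cross-check (a sketch only, assuming the 4.0-era config.chunks layout where min/max hold the raw shard key values), the chunk covering these bounds can be looked up by its min bound rather than by the string _id the error message refers to:

      // If this returns a chunk document while the moveChunk above still reports
      // "found no chunks", the failure is in how the _id lookup string is built,
      // not in the chunk metadata itself.
      db.getSiblingDB("config").chunks.find({
          ns: "keychain.keyring",
          "min.keyringId": UUID("fff68145-c9f1-4915-a2da-5f66d02820ad")
      }).pretty()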
      

      Apart from this, we also issued flushRouterConfig multiple times, restarted all mongos instances, and even replaced all config servers with new servers, but the same issue persists.
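
      (For reference, a minimal sketch of the standard flushRouterConfig invocation used on each mongos:)

      // Run on every mongos; marks the router's cached routing metadata as stale
      // so it is reloaded from the config servers on the next operation.
      db.adminCommand({ flushRouterConfig: 1 })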

      Note: featureCompatibilityVersion is set to "4.0" on all shards and the config servers.
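
      (The FCV was checked with the standard getParameter command, run against the primary of each shard replica set and of the config server replica set; shown here as a sketch:)

      // In 4.0 this returns { featureCompatibilityVersion: { version: "4.0" }, ok: 1 }
      db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })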

      Please let me know if there is a known bug around this or any configuration that we need to tweak on our side.

            Assignee: Cheahuychou Mao (cheahuychou.mao@mongodb.com)
            Reporter: Dilip Kolasani (haidilip83@gmail.com)
            Votes: 0
            Watchers: 9
