  1. Core Server
  2. SERVER-10653

Unable to shard collection: "collection x.y already sharded with 1 chunks" error

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor - P4
    • Affects Version/s: 2.4.5
    • Component/s: Sharding
    • Environment:
      3 shards with replica set per shard, 3 config servers, 3 mongos
    • ALL

      Although both coll.stats() and sh.status() report a collection as not sharded, sharding that collection fails.

      1)

      db.ws9_User.stats()
      {
        "sharded": false,
        "primary": "rs_shard2"
       ...
      }
      

      2)

      sh.shardCollection('test3.ws9_User', { _id: 1})
      {
        "code": 13449,
        "ok": 0,
        "errmsg": "exception: collection test3.ws9_User already sharded with 1 chunks"
      }
      

      Dropping the collection did not help; a subsequent sharding attempt failed with the same error.

      In contrast, sharding an already-sharded collection normally produces an "already sharded" message without "exception" in it:

      sh.shardCollection('test3.ws9_Account', { _id: 1})
      {
        "ok": 0,
        "errmsg": "already sharded"
      }
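
The two failure responses quoted above can be told apart programmatically. The following is a minimal sketch, assuming only the result shapes shown in this report; `classifyShardResult` and its return labels are hypothetical names, not part of the mongo shell.

```javascript
// Hypothetical helper: classifies the document returned by sh.shardCollection().
// It only knows the response shapes quoted in this report.
function classifyShardResult(res) {
    if (res.ok === 1) {
        return "ok";
    }
    if (res.code === 13449) {
        // "exception: collection <ns> already sharded with N chunks"
        return "exception-13449";
    }
    if (res.errmsg === "already sharded") {
        // The normal response for a collection that is already sharded.
        return "already-sharded";
    }
    return "other-error";
}
```

A setup script could treat "already-sharded" as success and surface "exception-13449" for manual investigation.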
      

      Ended up working around the issue by doing "use config; db.chunks.remove({"ns":"test3.ws9_User"})" and restarting all mongos. However, I'm not sure if this is safe to do in a production dataset where we don't want to lose data (this is a test dataset that was OK to drop).
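
Before removing anything from the config database, the metadata can be inspected read-only. The sketch below assumes the 2.4-era layout in which config.collections documents are keyed by namespace in `_id` and carry a `dropped` flag, and config.chunks documents carry an `ns` field (the latter as used in the workaround above). The helper name `describeShardingMetadata` is hypothetical, and whether leftover chunk documents are the actual cause of error 13449 is suggested by this report but not confirmed.

```javascript
// Hypothetical read-only helper. configDb is assumed to be the config database
// handle obtained on a mongos, e.g. db.getSiblingDB("config").
function describeShardingMetadata(configDb, ns) {
    // Assumption: config.collections uses the namespace as _id.
    var entry = configDb.getCollection("collections").findOne({ _id: ns });
    // config.chunks documents are matched by their "ns" field.
    var chunkCount = configDb.getCollection("chunks").count({ ns: ns });
    // Treat a missing entry or one flagged as dropped as "not sharded".
    var markedSharded = !!entry && entry.dropped !== true;
    return {
        ns: ns,
        markedSharded: markedSharded,
        chunkCount: chunkCount,
        // Chunks without a live collections entry match the symptom described here.
        orphanedChunks: !markedSharded && chunkCount > 0
    };
}
```

In the mongo shell this could be called as printjson(describeShardingMetadata(db.getSiblingDB("config"), "test3.ws9_User")). It performs no writes, so it does not answer whether the removal itself is safe.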

      This happened on one of several collections that were created in the same way.

      A possible trigger is that we had several machines talking to several mongos servers, inserting data into all these collections. Sharding is set up on first access, when the collection does not exist yet:

      if (collection doesn't exist) { enableSharding(); }
      

      So there were insertions taking place while sharding was being enabled. It's also possible that two parallel sharding requests may have been taking place.
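
The check-then-act pattern described here can be illustrated with a toy in-memory model. This is only a sketch of the suspected interleaving, not of MongoDB internals; every name in it (`makeCluster`, `collectionExists`, `setUpSharding`, `simulateRace`) is hypothetical.

```javascript
// Toy model: a "cluster" that records known collections and how many
// sharding setup requests it received. None of these are MongoDB APIs.
function makeCluster() {
    return { collections: {}, shardRequests: 0 };
}

// The "check" step: does the collection exist yet?
function collectionExists(cluster, ns) {
    return Object.prototype.hasOwnProperty.call(cluster.collections, ns);
}

// The "act" step: stands in for the sharding setup call.
function setUpSharding(cluster, ns) {
    cluster.shardRequests += 1;
    cluster.collections[ns] = true;
}

function simulateRace() {
    var cluster = makeCluster();
    var ns = "test3.ws9_User";
    // Both clients perform the check before either one acts.
    var clientASawMissing = !collectionExists(cluster, ns);
    var clientBSawMissing = !collectionExists(cluster, ns);
    if (clientASawMissing) { setUpSharding(cluster, ns); }
    if (clientBSawMissing) { setUpSharding(cluster, ns); }
    // Two setup requests were issued for the same namespace.
    return cluster.shardRequests;
}
```

With this interleaving simulateRace() returns 2, which shows how two parallel sharding requests for the same collection can arise from a check that is not atomic with the action.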

      I looked at https://github.com/mongodb/mongo/blob/v2.4/src/mongo/s/chunk.cpp#L1000 and there's a comment a bit above the place where the error 13449 is thrown:

              // TODO: Race condition if we shard the collection and insert data while we split across
              // the non-primary shard.
      

      Could this issue be a manifestation of that race condition?

            Assignee:
            david.hows David Hows
            Reporter:
            oleg@evergage.com Oleg Rekutin
            Votes:
            0
            Watchers:
            7
