Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-46679

Transaction test fails with Stale Shard Version error on 4.5

    • Fully Compatible
    • ALL
    • v4.4
    • Sharding 2020-04-06
    • 15

      A change in SERVER-45779 (server version 4.5.0-34-g45341f9) is causing one of Python's transactions documentation test to fail on sharded clusters with the following error (PYTHON-2154):
      "Transaction 58ec29bc-f257-43ce-81ed-2db10a7da770:3 was aborted on statement 0 due to: an error from cluster data placement change :: caused by :: Encountered error from localhost:27217 during a transaction :: caused by :: sharding status of collection hr.employees is not currently available for filtering and needs to be recovered from the config server"

      Here's a link to source for the failing "update_employee_info" transaction test: https://github.com/mongodb/mongo-python-driver/blob/1fdfb9864b3c0614f0f46add0bb7a25ebb8d2e77/test/test_examples.py#L902

      The failure happens on the first command in the transaction on line 875.

      The test does not shard the collection or enable sharding on the database so hr.employees is an unsharded collection.

      Here are the mongos logs for the failing test:

      2020-03-05T21:58:17.855+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:42732 #685 (19 connections now open)
      2020-03-05T21:58:17.857+0000 I  NETWORK  [conn685] received client metadata from 127.0.0.1:42732 conn685: {"driver":{"name":"PyMongo","version":"3.11.0.dev1"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server 6.2 Santiago","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 2.7.17.final.0"}
      2020-03-05T21:58:17.860+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:42733 #686 (20 connections now open)
      2020-03-05T21:58:17.861+0000 I  NETWORK  [conn686] received client metadata from 127.0.0.1:42733 conn686: {"driver":{"name":"PyMongo","version":"3.11.0.dev1"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server 6.2 Santiago","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 2.7.17.final.0"}
      2020-03-05T21:58:17.862+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.863+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:17.864+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database hr took 1 ms and failed :: caused by :: NamespaceNotFound: database hr not found
      2020-03-05T21:58:17.864+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:17.875+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.876+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database hr from version {} to version {"uuid":{"$uuid":"4f3f02c9-58eb-41fa-978c-8c8f78cb99c8"},"lastMod":1} took 0 ms
      2020-03-05T21:58:17.876+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.890+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.891+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:17.892+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database reporting took 1 ms and failed :: caused by :: NamespaceNotFound: database reporting not found
      2020-03-05T21:58:17.892+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:17.904+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.905+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database reporting from version {} to version {"uuid":{"$uuid":"89d001c7-f571-47cf-885e-5e7e4d682360"},"lastMod":1} took 0 ms
      2020-03-05T21:58:17.905+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "nearest" }
      2020-03-05T21:58:17.933+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:18.039+0000 I  -        [conn686] slow query{"type":"command","ns":"reporting.$cmd","command":{"dropDatabase":1,"writeConcern":{"w":"majority"},"lsid":{"id":{"$uuid":"05f6af3a-f73b-472e-8924-1517ee7094f4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1583445497,"i":167}},"signature":{"keyId":0,"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}}}},"$db":"reporting","$readPreference":{"mode":"primary"}},"numYields":0,"reslen":186,"protocol":"op_msg","durationMillis":106}
      2020-03-05T21:58:18.041+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
      2020-03-05T21:58:18.142+0000 I  -        [conn686] slow query{"type":"command","ns":"hr.$cmd","command":{"dropDatabase":1,"writeConcern":{"w":"majority"},"lsid":{"id":{"$uuid":"05f6af3a-f73b-472e-8924-1517ee7094f4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1583445498,"i":5}},"signature":{"keyId":0,"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}}}},"$db":"hr","$readPreference":{"mode":"primary"}},"numYields":0,"reslen":179,"protocol":"op_msg","durationMillis":100}
      

      The mongod logs for this time are large so I've attached them: db.zip. I do see this error:

      2020-03-05T21:58:17.919+0000 D1 -        [conn44] User Assertion: StaleConfig{ ns: "hr.employees", vReceived: Timestamp(0, 0), vReceivedEpoch: ObjectId('000000000000000000000000'), shardId: "sh01" }: sharding status of collection hr.employees is not currently available for filtering and needs to be recovered from the config server src/mongo/db/s/collection_sharding_runtime.cpp 140
      

            Assignee:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Reporter:
            shane.harvey@mongodb.com Shane Harvey
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: