-
Type: Bug
-
Resolution: Fixed
-
Priority: Critical - P2
-
Affects Version/s: 4.4.0, 4.4.1, 4.4.2, 4.4.3
-
Component/s: None
-
Fully Compatible
-
ALL
-
v5.0
-
Sharding 2021-05-17, Sharding 2021-05-31
-
(copied to CRM)
-
49
4.4 mongos does not attach RetryableWriteError label to shutdown errors. This a bug because it means that drivers will not retry writes when mongos shuts down.
To reproduce:
- start a sharded cluster with two mongoses on ports 27017 and 27018.
- run reproMongosShutdownNoErrorLabelBug.py
The repro script runs 4 threads which each run a retryable findAndModify which takes 5 seconds to complete (artificially delayed with $where). While the 4 retryable writes are ongoing, another thread causes one of the mongoses to shutdown via shutdown: 1, force: true. The expected behavior should be that mongos returns a Shutdown error with the RetryableWriteError error label which indicates that the client can retry the operation. Instead the mongos returns a Shutdown error without any labels so the driver does not retry:
2021-01-06 13:43:31,536 [INFO] Thread-5:37 - Command findAndModify with request id 114807987 on server ('localhost', 27017) failed in 2000641 microseconds. Failure: {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}}
The full script output is shown below:
$ python3.9 reproMongosShutdownNoErrorLabelBug.py 2021-01-06 13:43:29,215 [INFO] MainThread:68 - Pymongo version: 4.0.dev0 2021-01-06 13:43:29,218 [INFO] MainThread:23 - Command buildinfo with request id 1457850878 started on server ('localhost', 27017). Full command: {"buildinfo": {"$numberInt": "1"}, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "admin", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:29,219 [INFO] MainThread:29 - Command buildinfo with request id 1457850878 on server ('localhost', 27017) succeeded in 229 microseconds. Reply: {"version": "4.4.3", "gitVersion": "913d6b62acfbb344dde1b116f4161360acd8fd13", "modules": ["enterprise"], "allocator": "system", "javascriptEngine": "mozjs", "sysInfo": "deprecated", "versionArray": [{"$numberInt": "4"}, {"$numberInt": "4"}, {"$numberInt": "3"}, {"$numberInt": "0"}], "openssl": {"running": "Apple Secure Transport"}, "buildEnvironment": {"distmod": "", "distarch": "x86_64", "cc": "/Applications/Xcode10.2.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang: Apple LLVM version 10.0.1 (clang-1001.0.46.3)", "ccflags": "-isysroot /Applications/Xcode10.2.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.13 -target darwin17.0.0 -arch x86_64 -fno-omit-frame-pointer -fno-strict-aliasing -fasynchronous-unwind-tables -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-unused-private-field -Wno-deprecated-declarations -Wno-tautological-constant-out-of-range-compare -Wno-tautological-constant-compare -Wno-tautological-unsigned-zero-compare -Wno-tautological-unsigned-enum-zero-compare -Wno-unused-const-variable -Wno-missing-braces -Wno-inconsistent-missing-override -Wno-potentially-evaluated-expression -Wno-unused-lambda-capture -Wno-exceptions -Wunguarded-availability -fstack-protector-strong -fno-builtin-memcmp", "cxx": "/Applications/Xcode10.2.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++: Apple LLVM version 10.0.1 (clang-1001.0.46.3)", "cxxflags": "-Woverloaded-virtual -Werror=unused-result -Wpessimizing-move -Wredundant-move -Wno-undefined-var-template -Wno-instantiation-after-specialization -fsized-deallocation -Wunused-exception-parameter -stdlib=libc++ -std=c++17", "linkflags": "-Wl,-syslibroot,/Applications/Xcode10.2.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -mmacosx-version-min=10.13 -target darwin17.0.0 -arch x86_64 -Wl,-bind_at_load -Wl,-fatal_warnings -fstack-protector-strong -stdlib=libc++ -Wl,-rpath,@loader_path/../lib -Wl,-rpath,@loader_path/../Frameworks", "target_arch": "x86_64", "target_os": "macOS", "cppdefines": "SAFEINT_USE_INTRINSICS 0 PCRE_STATIC NDEBUG BOOST_THREAD_VERSION 5 BOOST_THREAD_USES_DATETIME BOOST_SYSTEM_NO_DEPRECATED BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS BOOST_ENABLE_ASSERT_DEBUG_HANDLER BOOST_LOG_NO_SHORTHAND_NAMES BOOST_LOG_USE_NATIVE_SYSLOG BOOST_LOG_WITHOUT_THREAD_ATTR ABSL_FORCE_ALIGNED_ACCESS"}, "bits": {"$numberInt": "64"}, "debug": false, "maxBsonObjectSize": {"$numberInt": "16777216"}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:29,219 [INFO] MainThread:69 - Mongos version: 4.4.3 2021-01-06 13:43:29,220 [INFO] MainThread:23 - Command insert with request id 1458777923 started on server ('localhost', 27017). Full command: {"insert": "test", "ordered": true, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969408, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "documents": [{"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}}, {"_id": {"$oid": "5ff62f01c381ca475a8f06ec"}}]} 2021-01-06 13:43:29,530 [INFO] MainThread:29 - Command insert with request id 1458777923 on server ('localhost', 27017) succeeded in 310134 microseconds. Reply: {"n": {"$numberInt": "2"}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:29,530 [INFO] MainThread:71 - Topology: <TopologyDescription id: 5ff62f01c381ca475a8f06ea, topology_type: Sharded, servers: [<ServerDescription ('localhost', 27017) server_type: Mongos, rtt: 0.000423346000000005>, <ServerDescription ('localhost', 27018) server_type: Mongos, rtt: 0.0007042669999999945>]> 2021-01-06 13:43:29,532 [INFO] Thread-3:23 - Command findAndModify with request id 2007237709 started on server ('localhost', 27017). Full command: {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.3098804251543492"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "czsq48x7SGmRM5JCcCUXgA==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:29,534 [INFO] Thread-2:23 - Command findAndModify with request id 74243042 started on server ('localhost', 27018). Full command: {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.078846960311998"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "6ITD9ukmTEKPXHDb4SUuuw==", "subType": "04"}}}, "txnNumber": {"$numberLong": "2"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:29,535 [INFO] Thread-5:23 - Command findAndModify with request id 114807987 started on server ('localhost', 27017). Full command: {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.32786975581722355"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "AhqvyEnDS8qXeRpUm69JaQ==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:29,535 [INFO] Thread-4:23 - Command findAndModify with request id 1137522503 started on server ('localhost', 27018). Full command: {"findAndModify": "test", "query": {"$where": "function() { sleep(5.000000 * 1000); return true; }"}, "new": false, "update": {"a": {"$numberDouble": "0.5628606621243788"}}, "upsert": false, "lsid": {"id": {"$binary": {"base64": "axIME8ZQQj+WtUrWovKREQ==", "subType": "04"}}}, "txnNumber": {"$numberLong": "1"}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "test", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:31,531 [INFO] Thread-1:59 - Causing mongos shutdown via {shutdown:1, force:1} 2021-01-06 13:43:31,534 [INFO] Thread-1:23 - Command shutdown with request id 16531729 started on server ('localhost', 27017). Full command: {"shutdown": {"$numberInt": "1"}, "force": true, "lsid": {"id": {"$binary": {"base64": "UzOkg3E+RjGR8Y0omOjFZw==", "subType": "04"}}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}, "$db": "admin", "$readPreference": {"mode": "primary"}} 2021-01-06 13:43:31,536 [INFO] Thread-3:37 - Command findAndModify with request id 2007237709 on server ('localhost', 27017) failed in 2003442 microseconds. Failure: {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:31,536 [INFO] Thread-5:37 - Command findAndModify with request id 114807987 on server ('localhost', 27017) failed in 2000641 microseconds. Failure: {"ok": {"$numberDouble": "0.0"}, "errmsg": "interrupted at shutdown", "code": {"$numberInt": "11600"}, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969409, "i": 8}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:31,536 [ERROR] Thread-5:54 - Thread failed Traceback (most recent call last): File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 50, in find_and_modify client.test.test.find_one_and_replace({'$where': delay(5)}, {'a': random.random()}) File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 3124, in find_one_and_replace return self.__find_and_modify(filter, projection, File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2974, in __find_and_modify return self.__database.client._retryable_write( File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1498, in _retryable_write return self._retry_with_session(retryable, func, s, None) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1384, in _retry_with_session return self._retry_internal(retryable, func, session, bulk) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1416, in _retry_internal return func(session, sock_info, retryable) File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2964, in _find_and_modify out = self._command(sock_info, cmd, File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 238, in _command return sock_info.command( File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command self._raise_connection_failure(error) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command return command(self, dbname, spec, slave_ok, File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 159, in command helpers._check_command_response( File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 151, in _check_command_response raise NotMasterError(errmsg, response) pymongo.errors.NotMasterError: interrupted at shutdown, full error: {'ok': 0.0, 'errmsg': 'interrupted at shutdown', 'code': 11600, 'codeName': 'InterruptedAtShutdown', 'operationTime': Timestamp(1609969409, 8), '$clusterTime': {'clusterTime': Timestamp(1609969409, 8), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}} 2021-01-06 13:43:31,537 [ERROR] Thread-3:54 - Thread failed Traceback (most recent call last): File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 50, in find_and_modify client.test.test.find_one_and_replace({'$where': delay(5)}, {'a': random.random()}) File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 3124, in find_one_and_replace return self.__find_and_modify(filter, projection, File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2974, in __find_and_modify return self.__database.client._retryable_write( File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1498, in _retryable_write return self._retry_with_session(retryable, func, s, None) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1384, in _retry_with_session return self._retry_internal(retryable, func, session, bulk) File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1416, in _retry_internal return func(session, sock_info, retryable) File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 2964, in _find_and_modify out = self._command(sock_info, cmd, File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 238, in _command return sock_info.command( File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command self._raise_connection_failure(error) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command return command(self, dbname, spec, slave_ok, File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 159, in command helpers._check_command_response( File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 151, in _check_command_response raise NotMasterError(errmsg, response) pymongo.errors.NotMasterError: interrupted at shutdown, full error: {'ok': 0.0, 'errmsg': 'interrupted at shutdown', 'code': 11600, 'codeName': 'InterruptedAtShutdown', 'operationTime': Timestamp(1609969409, 8), '$clusterTime': {'clusterTime': Timestamp(1609969409, 8), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}} 2021-01-06 13:43:31,539 [INFO] Thread-1:37 - Command shutdown with request id 16531729 on server ('localhost', 27017) failed in 5153 microseconds. Failure: {"errmsg": "connection closed", "errtype": "AutoReconnect"} Exception in thread Thread-1: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 950, in _bootstrap_inner self.run() File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 888, in run self._target(*self._args, **self._kwargs) File "/Users/shane/git/mongo-python-driver/reproMongosShutdownNoErrorLabelBug.py", line 60, in shutdown_mongos_after client.admin.command('shutdown', 1, force=True) File "/Users/shane/git/mongo-python-driver/pymongo/database.py", line 741, in command return self._command(sock_info, command, slave_ok, value, File "/Users/shane/git/mongo-python-driver/pymongo/database.py", line 629, in _command return sock_info.command( File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 724, in command self._raise_connection_failure(error) File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 708, in command return command(self, dbname, spec, slave_ok, File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 150, in command reply = receive_message(sock_info, request_id) File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 195, in receive_message _receive_data_on_socket(sock_info, 16, deadline)) File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 292, in _receive_data_on_socket raise AutoReconnect("connection closed") pymongo.errors.AutoReconnect: connection closed 2021-01-06 13:43:44,609 [INFO] Thread-2:29 - Command findAndModify with request id 74243042 on server ('localhost', 27018) succeeded in 15074637 microseconds. Reply: {"lastErrorObject": {"n": {"$numberInt": "1"}, "updatedExisting": true}, "value": {"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}, "a": {"$numberDouble": "0.32786975581722355"}}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969424, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969424, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:44,609 [INFO] Thread-2:52 - FIND_AND_MODIFY took: 15.078490972518921 2021-01-06 13:43:54,616 [INFO] Thread-4:29 - Command findAndModify with request id 1137522503 on server ('localhost', 27018) succeeded in 25079911 microseconds. Reply: {"lastErrorObject": {"n": {"$numberInt": "1"}, "updatedExisting": true}, "value": {"_id": {"$oid": "5ff62f01c381ca475a8f06eb"}, "a": {"$numberDouble": "0.078846960311998"}}, "ok": {"$numberDouble": "1.0"}, "operationTime": {"$timestamp": {"t": 1609969434, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1609969434, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": {"$numberLong": "0"}}}} 2021-01-06 13:43:54,616 [INFO] Thread-4:52 - FIND_AND_MODIFY took: 25.083528995513916
I can reproduce this bug on all 4.4 versions and 4.9.
- is related to
-
SERVER-64046 Mongos does not return errorlabels if it is interrupted while parsing a request
- Closed
-
PYTHON-2462 Implement and test connection pool paused state
- Closed
-
SERVER-41245 Add RetryableWriteError Error Label
- Closed
- related to
-
SERVER-55648 Mongos doesn't return top-level batch-write error in case of shutdown
- Closed
-
SERVER-59474 Return a shutdown error as top-level error for batch write command that fails with CallbackCanceled when mongos shuts down
- Closed
-
SERVER-62175 Mongos fails to attach RetryableWrite Error Label For Command Interrupted In _parseCommand
- Closed
-
SERVER-64642 Fix error where mongos returns CallbackCanceled and not InterruptedAtShutdown on shutdown
- Closed