- Type: Bug
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: 3.6, 3.6.1, 3.7, 3.7.1
- Component/s: None
While testing replica set failover with retryWrites, I found that the next request after killing the primary does not always work correctly.
MongoDB 3.6 - replica set with 3 nodes
Connection string options: replicaSet=repl_set_name&w=majority&journal=true&retryWrites=true
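For reference, the options above could be assembled into a full connection string roughly as follows. This is only a sketch: the host names are hypothetical placeholders (the report does not list them), and `build_uri` is an illustrative helper, not part of pymongo.

```python
from urllib.parse import urlencode

# Hypothetical hosts for a 3-node replica set; not taken from the report.
HOSTS = ["host1:27017", "host2:27017", "host3:27017"]

# Options exactly as listed in the report.
OPTIONS = {
    "replicaSet": "repl_set_name",
    "w": "majority",
    "journal": "true",
    "retryWrites": "true",
}


def build_uri(hosts, options):
    """Join the seed list and the query-string options into a mongodb:// URI."""
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


uri = build_uri(HOSTS, OPTIONS)
print(uri)
```

The resulting string can be passed to `pymongo.MongoClient(uri)`.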
Reference code that uses update_one to demonstrate the failure is attached, along with the command logs. This isn't specific to updates; other commands that use retryable writes have the same issue.
When the primary is killed (with ctrl-c, so not a graceful step down) and the update is attempted immediately (within 3-5 seconds), the request waits for the failover to complete as expected, but the update does not work correctly. The txnNumber in the request is incorrect: it is decremented, so a previously used txnNumber is sent. The server response indicates a successful upsert, even though upsert = False in the request, and the collection is not modified.
The same test works correctly if the primary is stepped down, or if you wait 3-5 seconds after killing the primary before sending the update command (still before the replica set election takes place). In those cases the correctly incremented txnNumber is sent and the document is modified.
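One way to spot the txnNumber problem in a command log is to scan it for any txnNumber that is lower than one already sent in the same session. The sketch below assumes the logged commands are available as plain dicts with `lsid` and `txnNumber` keys (for example, collected from the `command` attribute of the started events of a `pymongo.monitoring.CommandListener`). The helper name and the sample data are made up for illustration and are not taken from the attached logs.

```python
def find_txn_regressions(commands):
    """Return (index, txnNumber, highest_seen) for every command whose
    txnNumber is lower than one already sent in the same session.

    A repeated (equal) txnNumber is not flagged, because a legitimate
    retry resends the same number.
    """
    highest = {}  # session key -> highest txnNumber seen so far
    regressions = []
    for index, cmd in enumerate(commands):
        if "lsid" not in cmd or "txnNumber" not in cmd:
            continue  # not a retryable write, nothing to check
        session_key = repr(cmd["lsid"])  # lsid documents are not hashable
        txn = cmd["txnNumber"]
        seen = highest.get(session_key)
        if seen is not None and txn < seen:
            regressions.append((index, txn, seen))
        else:
            highest[session_key] = txn
    return regressions


# Illustrative data only; the session id and numbers are invented.
sample_log = [
    {"update": "test", "lsid": {"id": "session-a"}, "txnNumber": 1},
    {"update": "test", "lsid": {"id": "session-a"}, "txnNumber": 2},
    {"update": "test", "lsid": {"id": "session-a"}, "txnNumber": 2},  # retry
    {"update": "test", "lsid": {"id": "session-a"}, "txnNumber": 1},  # reused
    {"find": "test"},  # no txnNumber, skipped
]

for index, txn, seen in find_txn_regressions(sample_log):
    print("command %d sent txnNumber %d after %d" % (index, txn, seen))
```

With the sample data, only the fourth command (index 3) is reported, since it sends txnNumber 1 after 2 has already been used.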
I was originally using pymongo 3.6. I switched to a new virtual environment with the latest pymongo from pip and confirmed that it behaves the same way. The attached logs are from 3.7.1.
python -c "import sys; print(sys.version)"
3.4.1 (default, Mar 29 2016, 09:28:33)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)]
python -c "import pymongo; print(pymongo.version); print(pymongo.has_c())"
3.7.1
True
Is related to:
- PYTHON-1657 Replica set failover test for retryable writes (Closed)
- DRIVERS-2139 Test retryable writes against real shutdown scenarios (Backlog)