- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 3.0.2, 3.0.3, 3.1.3
- Component/s: WiredTiger
- Fully Compatible
- Linux
- Quint Iteration 4
ISSUE SUMMARY
MongoDB with WiredTiger may produce WriteConflict exceptions under the following conditions:
- when doing bulk or multi updates, and
- these updates have upsert=true set, and
- no documents match the update condition, and
- these operations use a unique index on the updated field (a minimal sketch follows this list)
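For concreteness, here is a minimal sketch in Python of an operation matching these conditions, using the PyMongo 2.x legacy API from the report below; the client address, database, collection, index, and field names are all hypothetical, and the conflict itself only appears under sufficiently concurrent load on affected versions:
from pymongo import ASCENDING, MongoClient

coll = MongoClient()["testdb"]["counters"]  # hypothetical database and collection
# Unique index on fields written by the upsert (per the conditions above).
coll.ensure_index([("u", ASCENDING), ("s", ASCENDING)], unique=True)

# Multi update with upsert=True where no document matches the query: on affected
# versions, the insert phase of the upsert can fail with WriteConflict when
# several clients race on the same key.
coll.update({"u": 1, "s": 2}, {"$inc": {"value": 1.0}}, multi=True, upsert=True)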
USER IMPACT
These exceptions cause the insert part of the write operation to fail and an error is returned to the application.
WORKAROUNDS
Retry failed write operations as inserts at the application level.
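A minimal sketch of this workaround, again assuming PyMongo 2.x and hypothetical collection, query, and field names; the exact exception type surfaced to the application depends on the driver layer (MongoEngine, for example, reports it as OperationError):
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError, OperationFailure

coll = MongoClient()["database"]["collection"]  # hypothetical names

def upsert_or_insert(query, amount):
    """Multi-upsert; if its insert phase fails (e.g. WriteConflict), retry as an insert."""
    try:
        coll.update(query, {"$inc": {"value": amount}}, multi=True, upsert=True)
    except OperationFailure:
        try:
            # Insert the document the upsert would have created.
            coll.insert(dict(query, value=amount))
        except DuplicateKeyError:
            # A concurrent writer created it first; the update can now match it.
            coll.update(query, {"$inc": {"value": amount}}, multi=True, upsert=True)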
AFFECTED VERSIONS
MongoDB 3.0.0 to 3.0.3 with WiredTiger.
FIX VERSION
The fix is included in the 3.0.4 production release.
Original description
Intro
We have a fairly complex Python application running on Apache Storm 0.9.2, MongoEngine 0.8.7, PyMongo 2.8, and MongoDB 3.0.2. Our MongoDB setup is a sharded cluster (3 mongos, 3 config servers, and 5 replica sets), and all replica sets are running WiredTiger.
Under a very high load of concurrent updates, I get many exceptions at the application level:
OperationError: Update failed (exception: WriteConflict)
And looking into the mongod logs:
2015-04-22T16:24:02.100+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:04.907+0200 I WRITE [conn75303] Had WriteConflict during multi update, aborting
2015-04-22T16:24:18.538+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:27.539+0200 I WRITE [conn75526] Had WriteConflict during multi update, aborting
2015-04-22T16:24:34.413+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:34.455+0200 I WRITE [conn77036] Had WriteConflict during multi update, aborting
2015-04-22T16:24:46.693+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
I've mitigated the issue with a try/except around the update operation, retrying it in the except block. This works almost all the time, but sometimes it fails twice in a row.
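For illustration only, the mitigation described above might look roughly like this; the queryset, constant, retry count, and backoff are hypothetical, since the actual application code is not part of this ticket:
import time
from mongoengine.errors import OperationError

CONSTANT = 1.0  # hypothetical constant from the application

def update_with_retry(queryset, ammount, retries=3):
    """Retry the multi-upsert when MongoEngine surfaces the WriteConflict as OperationError."""
    for attempt in range(1, retries + 1):
        try:
            queryset.update(inc__valueX=CONSTANT * ammount, upsert=True)
            return
        except OperationError:
            if attempt == retries:
                raise                   # still failing after the last attempt
            time.sleep(0.05 * attempt)  # brief backoff before retrying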
I've asked about this same issue on Stack Overflow (http://stackoverflow.com/questions/29800823/mongodb-with-wiredtiger-lots-of-writeconflict) and on the mongodb-user mailing list. There, Asya (thanks!) asked me to open this ticket.
The update operation that fails with WriteConflict
With MongoEngine we have created a model with a custom QuerySet that simply does a bunch of:
self.update(inc__valueX=CONSTANT * ammount, upsert=True)
Where ammount is 1.0 about 99% of the time.
What the mongo server actually runs (what this translates to) is:
db.collection.update({ s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, { $inc: { value: VALUE } }, { multi: true, upsert: true })
Where VALUE is the result of (CONSTANT * ammount).
The output of
db.collection.stats()
is attached.
Current setup
Currently we have 3 mongos, 3 config servers (running on the same machines as the mongos), and 5 replica sets, each with 3 members. The config servers run MMAPv1 and the replica sets run WiredTiger. All of them are running this MongoDB version:
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] MongoDB starting : pid=21243 port=27017 dbpath=/opt/mongodb 64-bit host=bb07
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] db version v3.0.2
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] build info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] allocator: tcmalloc
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { http: { enabled: false } }, replication: { replSet: "rs01" }, storage: { dbPath: "/opt/mongodb", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
> mongos --version
MongoS version 3.0.2 starting: pid=19084 port=27017 64-bit host=bb01 (--help for usage)
git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
build sys info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
The Apache Storm machines (the ones performing the update() operations) run on different servers with the same OS version. The whole setup runs on top of VMware 5.5.
Logs
After running a
db.adminCommand({setParameter:1, logLevel:5})
in one of the primaries and waiting for the error to reproduce, this is the log output:
2015-04-27T10:17:03.359+0200 D COMMAND [conn63749] run command database.$cmd { update: "collection", updates: [ { q: { s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, u: { $inc: { value: 0.0328 } }, multi: true, upsert: true } ], ordered: true, metadata: { shardName: "bbrs03", shardVersion: [ Timestamp 856000|4, ObjectId('552f8fbfeaa225ff8d70a7dd') ], session: 0 } }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Tagging the match expression according to cache data:
Filter:
$and
    s == ObjectId('53cd0cdac2f4ed5f76527f9a')
    u == ObjectId('553df07f996d7110b3a2843b')
Cache data:
(index-tagged expression tree: tree=Node
---Leaf { u: 1, s: 1 }, pos: 1
---Leaf { u: 1, s: 1 }, pos: 0
)
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 0: { _id: 1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 1: { u: 1, s: 1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 2: { u: 1, valueX: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 3: { u: 1, valueY: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 4: { u: 1, value: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 5: { u: 1, valueZ: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Tagged tree:
$and
    u == ObjectId('553df07f996d7110b3a2843b')  || Selected Index #1 pos 0
    s == ObjectId('53cd0cdac2f4ed5f76527f9a')  || Selected Index #1 pos 1
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Planner: solution constructed from the cache:
FETCH
---fetched = 1
---sortedByDiskLoc = 1
---getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { u: 1, s: 1 }
---------direction = 1
---------bounds = field #0['u']: [ObjectId('553df07f996d7110b3a2843b'), ObjectId('553df07f996d7110b3a2843b')], field #1['s']: [ObjectId('53cd0cdac2f4ed5f76527f9a'), ObjectId('53cd0cdac2f4ed5f76527f9a')]
---------fetched = 0
---------sortedByDiskLoc = 1
---------getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 2 retrying
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 3 retrying
...
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.360+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.360+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.360+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 4 retrying
...
Final note
I modified some names (databases, collections, fields, and indexes). If you want, I can upload the full logs or configuration (via SCP) with all the original names. If required, we could even plan a full restart of all the processes (that would need to be scheduled).
- backports
  - SERVER-75573 WriteConflict errors during findAndModify when zero records found and modified - Closed
- is related to
  - SERVER-17194 Low Throughput for YCSB 50-50 workload with high client threads - Closed