- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 3.0.2, 3.0.3, 3.1.3
- Component/s: WiredTiger
- Fully Compatible
- Linux
- Quint Iteration 4
ISSUE SUMMARY
MongoDB with WiredTiger may produce WriteConflict exceptions under the following conditions:
- when doing bulk or multi updates, and
- these updates have upsert=true set, and
- no documents match the update condition, and
- these operations use a unique index on the updated field (a minimal sketch follows this list)
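For concreteness, here is a minimal sketch in Python of an operation matching these conditions, using the PyMongo 2.x legacy API from the report below; the client address, database, collection, index, and field names are all hypothetical, and the conflict itself only appears under sufficiently concurrent load on affected versions:
from pymongo import ASCENDING, MongoClient

coll = MongoClient()["testdb"]["counters"]  # hypothetical database and collection
# Unique index on fields written by the upsert (per the conditions above).
coll.ensure_index([("u", ASCENDING), ("s", ASCENDING)], unique=True)

# Multi update with upsert=True where no document matches the query: on affected
# versions, the insert phase of the upsert can fail with WriteConflict when
# several clients race on the same key.
coll.update({"u": 1, "s": 2}, {"$inc": {"value": 1.0}}, multi=True, upsert=True)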
USER IMPACT
These exceptions cause the insert part of the write operation to fail and an error is returned to the application.
WORKAROUNDS
Retry failed write operations as inserts at the application level.
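A minimal sketch of this workaround, again assuming PyMongo 2.x and hypothetical collection, query, and field names; the exact exception type surfaced to the application depends on the driver layer (MongoEngine, for example, reports it as OperationError):
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError, OperationFailure

coll = MongoClient()["database"]["collection"]  # hypothetical names

def upsert_or_insert(query, amount):
    """Multi-upsert; if its insert phase fails (e.g. WriteConflict), retry as an insert."""
    try:
        coll.update(query, {"$inc": {"value": amount}}, multi=True, upsert=True)
    except OperationFailure:
        try:
            # Insert the document the upsert would have created.
            coll.insert(dict(query, value=amount))
        except DuplicateKeyError:
            # A concurrent writer created it first; the update can now match it.
            coll.update(query, {"$inc": {"value": amount}}, multi=True, upsert=True)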
AFFECTED VERSIONS
MongoDB 3.0.0 to 3.0.3 with WiredTiger.
FIX VERSION
The fix is included in the 3.0.4 production release.
Original description
Intro
We have a fairly complex Python application running on Apache Storm 0.9.2, MongoEngine 0.8.7, PyMongo 2.8, and MongoDB 3.0.2. Our MongoDB setup is a sharded cluster (3 mongos, 3 config servers, and 5 replica sets), and all replica sets are running WiredTiger.
Under a very high load of concurrent updates, I get many exceptions at the application level:
OperationError: Update failed (exception: WriteConflict)
And looking into the mongod logs:
2015-04-22T16:24:02.100+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:04.907+0200 I WRITE [conn75303] Had WriteConflict during multi update, aborting
2015-04-22T16:24:18.538+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:27.539+0200 I WRITE [conn75526] Had WriteConflict during multi update, aborting
2015-04-22T16:24:34.413+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
2015-04-22T16:24:34.455+0200 I WRITE [conn77036] Had WriteConflict during multi update, aborting
2015-04-22T16:24:46.693+0200 I WRITE [conn81979] Had WriteConflict during multi update, aborting
I've mitigated the issue with a try/except around the update operation, retrying it in the except block. This works almost all the time, but sometimes it fails twice in a row.
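For illustration only, the mitigation described above might look roughly like this; the queryset, constant, retry count, and backoff are hypothetical, since the actual application code is not part of this ticket:
import time
from mongoengine.errors import OperationError

CONSTANT = 1.0  # hypothetical constant from the application

def update_with_retry(queryset, ammount, retries=3):
    """Retry the multi-upsert when MongoEngine surfaces the WriteConflict as OperationError."""
    for attempt in range(1, retries + 1):
        try:
            queryset.update(inc__valueX=CONSTANT * ammount, upsert=True)
            return
        except OperationError:
            if attempt == retries:
                raise                   # still failing after the last attempt
            time.sleep(0.05 * attempt)  # brief backoff before retrying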
I've asked about this same issue on Stack Overflow (http://stackoverflow.com/questions/29800823/mongodb-with-wiredtiger-lots-of-writeconflict) and on the mongodb-user mailing list. There, Asya (thanks!) asked me to open this ticket.
The update operation that fails with WriteConflict
With MongoEngine we have created a model with a custom QuerySet that simply does a bunch of:
self.update(inc__valueX=CONSTANT * ammount, upsert=True)
Where ammount is 1.0 about 99% of the time.
What the mongo server actually runs (what this translates to) is:
db.collection.update({ s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, { $inc: { value: VALUE } }, { multi: true, upsert: true })
Where VALUE is the result of (CONSTANT * ammount).
The output of
db.collection.stats()
is attached.
Current setup
Currently we have 3 mongos, 3 config servers (running on the same machines as the mongos), and 5 replica sets, each with 3 members. The config servers run MMAPv1 and the replica sets run WiredTiger. All of them are running this MongoDB version:
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] MongoDB starting : pid=21243 port=27017 dbpath=/opt/mongodb 64-bit host=bb07
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] db version v3.0.2
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] build info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] allocator: tcmalloc
2015-04-10T10:34:06.213+0200 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { http: { enabled: false } }, replication: { replSet: "rs01" }, storage: { dbPath: "/opt/mongodb", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
> mongos --version
MongoS version 3.0.2 starting: pid=19084 port=27017 64-bit host=bb01 (--help for usage)
git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
build sys info: Linux ip-10-229-1-2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
The Apache Storm machines (the ones performing the update() operations) run on different servers with the same OS version. The whole setup runs on top of VMware 5.5.
Logs
After running a
db.adminCommand({setParameter:1, logLevel:5})
in one of the primaries and waiting for the error to reproduce, this is the log output:
2015-04-27T10:17:03.359+0200 D COMMAND [conn63749] run command database.$cmd { update: "collection", updates: [ { q: { s: ObjectId('53cd0cdac2f4ed5f76527f9a'), u: ObjectId('553df07f996d7110b3a2843b') }, u: { $inc: { value: 0.0328 } }, multi: true, upsert: true } ], ordered: true, metadata: { shardName: "bbrs03", shardVersion: [ Timestamp 856000|4, ObjectId('552f8fbfeaa225ff8d70a7dd') ], session: 0 } }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Tagging the match expression according to cache data:
Filter:
$and
    s == ObjectId('53cd0cdac2f4ed5f76527f9a')
    u == ObjectId('553df07f996d7110b3a2843b')
Cache data:
(index-tagged expression tree: tree=Node
---Leaf { u: 1, s: 1 }, pos: 1
---Leaf { u: 1, s: 1 }, pos: 0
)
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 0: { _id: 1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 1: { u: 1, s: 1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 2: { u: 1, valueX: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 3: { u: 1, valueY: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 4: { u: 1, value: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Index 5: { u: 1, valueZ: -1 }
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Tagged tree:
$and
    u == ObjectId('553df07f996d7110b3a2843b')  || Selected Index #1 pos 0
    s == ObjectId('53cd0cdac2f4ed5f76527f9a')  || Selected Index #1 pos 1
2015-04-27T10:17:03.359+0200 D QUERY [conn63749] [QLOG] Planner: solution constructed from the cache:
FETCH
---fetched = 1
---sortedByDiskLoc = 1
---getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { u: 1, s: 1 }
---------direction = 1
---------bounds = field #0['u']: [ObjectId('553df07f996d7110b3a2843b'), ObjectId('553df07f996d7110b3a2843b')], field #1['s']: [ObjectId('53cd0cdac2f4ed5f76527f9a'), ObjectId('53cd0cdac2f4ed5f76527f9a')]
---------fetched = 0
---------sortedByDiskLoc = 1
---------getSort = [{}, { s: 1 }, { u: 1 }, { u: 1, s: 1 }, ]
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 2 retrying
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.359+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 3 retrying
...
2015-04-27T10:17:03.359+0200 D STORAGE [conn63749] WT begin_transaction
2015-04-27T10:17:03.360+0200 D WRITE [conn63749] update validate options -- updatedFields: Fields:[ value,] immutableAndSingleValueFields.size:1 fromRepl: 0 validate:1
2015-04-27T10:17:03.360+0200 D STORAGE [conn63749] WT rollback_transaction
2015-04-27T10:17:03.360+0200 D WRITE [conn63749] Caught WriteConflictException doing update on database.collection, attempt: 4 retrying
...
Final note
I modified some names (databases, collections, fields, and indexes). If you want, I can upload the full logs or configuration (via SCP) with all the original names. If required, we could even plan a full restart of all the processes (that would need to be scheduled).
- backports
  - SERVER-75573 WriteConflict errors during findAndModify when zero records found and modified - Closed
- is related to
  - SERVER-17194 Low Throughput for YCSB 50-50 workload with high client threads - Closed