If an application opens multiple statistics cursors in parallel, WiredTiger can deadlock on the DHANDLE and SCHEMA locks.
Example stack traces (full stack traces are available in the SERVER-16738 JIRA ticket):
Thread 1:
#2 0x00007ffff7bc6480 in __GI___pthread_mutex_lock (mutex=0x3420400) at ../nptl/pthread_mutex_lock.c:79
#3 0x0000000001ef5262 in __wt_spin_lock (session=0x3547280, t=0x3420400) at src/third_party/wiredtiger/src/include/mutex.i:175
#4 0x0000000001ef6000 in __wt_session_get_btree (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", checkpoint=0x0, cfg=0x7ffff0bacec0, flags=8) at src/third_party/wiredtiger/src/session/session_dhandle.c:397
#5 0x0000000001ef58c0 in __wt_session_get_btree_ckpt (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", cfg=0x7ffff0bacec0, flags=0) at src/third_party/wiredtiger/src/session/session_dhandle.c:229
#6 0x0000000001e9a2da in __curstat_file_init (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", cfg=0x7ffff0bacec0, cst=0xf441be00) at src/third_party/wiredtiger/src/cursor/cur_stat.c:379
Thread 2:
#3 0x0000000001ef5262 in __wt_spin_lock (session=0x35459c0, t=0x34204c0) at src/third_party/wiredtiger/src/include/mutex.i:175
#4 0x0000000001ef5f93 in __wt_session_get_btree (session=0x35459c0, uri=0x141baf9f0 "file:collection-491658-4306617738107441063.wt", checkpoint=0x0, cfg=0x0, flags=8) at src/third_party/wiredtiger/src/session/session_dhandle.c:397
#5 0x0000000001e7cb70 in __conn_btree_apply_internal (session=0x35459c0, dhandle=0x1041edc00, func=0x1e9a1fb <__curstat_checkpoint>, cfg=0x7ffff13b4b20) at src/third_party/wiredtiger/src/conn/conn_dhandle.c:484
#6 0x0000000001e7cd26 in __wt_conn_btree_apply (session=0x35459c0, apply_checkpoints=1, uri=0x141baf9f0 "file:collection-491658-4306617738107441063.wt", func=0x1e9a1fb <__curstat_checkpoint>, cfg=0x7ffff13b4b20) at src/third_party/wiredtiger/src/conn/conn_dhandle.c:526
#7 0x0000000001e9a4a1 in __curstat_file_init (session=0x35459c0, uri=0x172e0c00b "file:collection-491658-4306617738107441063.wt", cfg=0x7ffff13b4ef0, cst=0xf62ac000) at src/third_party/wiredtiger/src/cursor/cur_stat.c:413
The sequence of events leading up to the deadlock is:
- Thread 2 acquires the DHANDLE lock in __curstat_file_init before calling __wt_conn_btree_apply.
- Thread 1, which does not hold the DHANDLE lock, calls __wt_session_get_btree, which acquires the SCHEMA lock and then waits for the DHANDLE lock held by Thread 2.
- Thread 2, still holding the DHANDLE lock, reaches __wt_session_get_btree (via __wt_conn_btree_apply). __wt_session_get_btree tries to acquire the SCHEMA lock, which is held by Thread 1, so neither thread can proceed.
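The interleaving above is a lock-order inversion: Thread 1 holds SCHEMA and wants DHANDLE, while Thread 2 holds DHANDLE and wants SCHEMA. The sketch below reproduces that cycle with plain POSIX mutexes standing in for WiredTiger's SCHEMA and DHANDLE locks (an assumption; the real locks are WiredTiger spinlocks, and every name here is hypothetical). So that it does not actually hang, each thread takes its second lock with `pthread_mutex_trylock`, and a barrier ensures both threads hold their first lock before either tries the second.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for WiredTiger's SCHEMA and DHANDLE locks. */
static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dhandle_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t step;
static bool t1_blocked, t2_blocked;

/* Lock `first`, wait until the other thread holds its first lock, then
 * try `second` without blocking. Returns true if `second` was busy. */
static bool hold_then_try(pthread_mutex_t *first, pthread_mutex_t *second)
{
    pthread_mutex_lock(first);
    pthread_barrier_wait(&step);       /* both threads now hold one lock */
    int rc = pthread_mutex_trylock(second);
    if (rc == 0)
        pthread_mutex_unlock(second);
    pthread_barrier_wait(&step);       /* both attempts are finished */
    pthread_mutex_unlock(first);
    return rc == EBUSY;
}

/* Thread 1: SCHEMA, then DHANDLE (the order other operations use). */
static void *thread1(void *arg)
{
    (void)arg;
    t1_blocked = hold_then_try(&schema_lock, &dhandle_lock);
    return NULL;
}

/* Thread 2: DHANDLE, then SCHEMA (the inverted statistics-cursor path). */
static void *thread2(void *arg)
{
    (void)arg;
    t2_blocked = hold_then_try(&dhandle_lock, &schema_lock);
    return NULL;
}

/* Returns true if each thread found its second lock held by the other,
 * i.e. blocking pthread_mutex_lock calls would have deadlocked. */
static bool lock_order_inversion_detected(void)
{
    pthread_t a, b;
    int rc = pthread_barrier_init(&step, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, thread1, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, thread2, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&step);
    return t1_blocked && t2_blocked;
}
```

Both trylock calls report EBUSY, which is exactly the wait-for cycle shown in the stack traces.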
Thread 2 is the "problem" thread: every other operation that needs both the SCHEMA and DHANDLE locks acquires the SCHEMA lock first, whereas __curstat_file_init acquires the DHANDLE lock first. It is probably enough to change __curstat_file_init so that it acquires both locks at the start, SCHEMA first and then DHANDLE.
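A minimal sketch of the proposed ordering rule, again assuming POSIX mutexes as hypothetical stand-ins for WiredTiger's locks (this is not WiredTiger's actual fix or API). Every path that needs both locks goes through one helper that takes SCHEMA first and DHANDLE second and releases them in reverse order, so no wait-for cycle can form no matter how many threads open statistics cursors concurrently.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for WiredTiger's SCHEMA and DHANDLE locks. */
static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dhandle_lock = PTHREAD_MUTEX_INITIALIZER;
static long work_done;   /* protected by both locks */

/* The single agreed order: SCHEMA, then DHANDLE; release in reverse. */
static void with_schema_and_dhandle(void (*fn)(void))
{
    pthread_mutex_lock(&schema_lock);
    pthread_mutex_lock(&dhandle_lock);
    fn();
    pthread_mutex_unlock(&dhandle_lock);
    pthread_mutex_unlock(&schema_lock);
}

static void do_work(void) { work_done++; }

/* Models one thread repeatedly opening a statistics cursor. */
static void *stat_cursor_worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++)
        with_schema_and_dhandle(do_work);
    return NULL;
}

/* Runs nthreads workers in parallel and returns the total work done. */
static long run_parallel_stat_opens(int nthreads, int iters)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    work_done = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, stat_cursor_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return work_done;
}
```

For example, `run_parallel_stat_opens(4, 10000)` returns 40000. A single successful run only shows that this run did not deadlock; the guarantee comes from the ordering argument, since a cycle requires some thread to hold DHANDLE while waiting for SCHEMA.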
Related to:
- WT-1576 Fix a deadlock opening statistics cursors. (Closed)