Type: Bug
Resolution: Won't Fix
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 3.0.8
Component/s: WiredTiger
Environment: Mongo 3.0.8 + parallel
Operating System: ALL
Sprint: Integrate+Tuning 15 (06/03/16)
Using Mongo 3.0.8, when replicating a large data set (this didn't happen with a smaller one) with max_threads = 4 and a patch for mongo that parallelizes the cloning process: in our setup we used 17 threads cloning different dbs, each thread holding a db lock (instead of a global lock).
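As a rough illustration of the approach only (this is not the actual patch, which is linked at the end of this description; the function and type names are hypothetical), each database gets its own cloning thread that holds only that database's lock:

#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the work one cloning thread does for a single
// database while holding only that database's lock (not the global lock).
void cloneDatabase(const std::string& dbName) {
    // ... copy collections and build indexes for this db ...
    (void)dbName;
}

void cloneAllDatabases(const std::vector<std::string>& dbNames) {
    std::vector<std::thread> workers;
    workers.reserve(dbNames.size());
    for (const auto& db : dbNames) {
        // One thread per database; in the setup described above this was 17.
        workers.emplace_back(cloneDatabase, db);
    }
    for (auto& t : workers) {
        t.join();
    }
}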
Examining the cache flags with gdb shows that the STUCK bit is lit; here is the cache structure:
{ bytes_inmem = 60122154727, pages_inmem = 5668935, bytes_internal = 243665765, bytes_overflow = 0, bytes_evict = 486975238126, pages_evict = 5648345, bytes_dirty = 59789008836, pages_dirty = 6448, bytes_read = 1482481072, evict_max_page_size = 31232046, read_gen = 1682697, read_gen_oldest = 1682790, evict_cond = 0x39abcd0, evict_lock = { lock = { __data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = '\000' <repeats 39 times>, __align = 0 }, counter = 0, name = 0x17446c2 "cache eviction", id = 0 '\000', initialized = 1 '\001' }, evict_walk_lock = { lock = { __data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = '\000' <repeats 39 times>, __align = 0 }, counter = 0, name = 0x17446d1 "cache walk", id = 0 '\000', initialized = 1 '\001' }, evict_waiter_cond = 0x39abd40, eviction_trigger = 95, eviction_target = 80, eviction_dirty_target = 80, overhead_pct = 8, evict = 0x4bd4000, evict_current = 0x0, evict_candidates = 100, evict_entries = 100, evict_max = 400, evict_slots = 400, evict_file_next = 0x570f9c700, sync_request = 0, sync_complete = 0, cp_saved_read = 0, cp_current_read = 0, cp_skip_count = 0, cp_reserved = 0, cp_session = 0x0, cp_tid = 0, flags = 40 }
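For context, a minimal sketch of how that conclusion is read off the dump: the flags field is a bitmask, and the stuck state is one bit in it. The WT_CACHE_STUCK value below is a placeholder; the real bit value comes from the flag definitions of the WiredTiger build being debugged.

#include <cstdint>
#include <cstdio>

// Placeholder bit value; the actual definition lives in WiredTiger's
// generated flag headers for this build.
constexpr uint32_t WT_CACHE_STUCK = 0x20;

int main() {
    uint32_t flags = 40;  // decimal value read from the cache struct in gdb
    std::printf("flags = 0x%x, stuck bit %s\n", flags,
                (flags & WT_CACHE_STUCK) ? "set" : "clear");
    return 0;
}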
A stack trace is attached; as you can see, all cloning threads (threads 51 through 67) are hung on the eviction condition "0x39abd40", which comes from the __wt_cache_full_check() call.
Thread #8 is also stuck on the same call, via a _deleteExcessDocuments call.
The eviction server (Thread #2) is sleeping, and this happens constantly.
The eviction workers seem to have no work: there are 3 live eviction workers (threads 68 through 70), all of which are waiting on the same condition.
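Roughly, the wait pattern in the trace looks like the following (an approximation for illustration, not the WiredTiger source): application threads block in a cache-full check on a condition variable until the eviction server signals that space was freed, so if the server sleeps without making progress, every waiter hangs.

#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct CacheState {
    std::mutex mtx;
    std::condition_variable evictWaiterCond;  // plays the role of evict_waiter_cond
    uint64_t bytesInUse = 0;
    uint64_t triggerBytes = 0;  // eviction_trigger expressed as bytes
};

// Plays the role of __wt_cache_full_check(): a thread that finds the cache
// over the trigger waits here until eviction makes room.
void cacheFullCheck(CacheState& cache) {
    std::unique_lock<std::mutex> lk(cache.mtx);
    cache.evictWaiterCond.wait(lk, [&] { return cache.bytesInUse < cache.triggerBytes; });
}

// Expected to run after the eviction server frees pages; if it never runs,
// or never frees enough, the waiters above never wake up.
void evictionMadeProgress(CacheState& cache, uint64_t bytesFreed) {
    {
        std::lock_guard<std::mutex> lk(cache.mtx);
        cache.bytesInUse -= std::min(cache.bytesInUse, bytesFreed);
    }
    cache.evictWaiterCond.notify_all();
}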
This situation reproduced itself over and over at some point during the initial clone; any idea as to why this happens would be great.
The small patch for the parallelization is available here:
https://github.com/liranms/mongo/commit/a216bb0d8159f8030b5d666ffa8869c57f28fcc0
is duplicated by: SERVER-21616 WiredTiger hangs when mongorestoring 2.8TB data (Closed)
is related to: SERVER-18844 Reacquire the snapshot after commit/abort (Closed)