Type: Bug
Resolution: Won't Fix
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 3.0.8
Component/s: WiredTiger
Environment: Mongo 3.0.8 + parallel
Operating System: ALL
Sprint: Integrate+Tuning 15 (06/03/16)
Using Mongo 3.0.8, when replicating a large data set (this didn't happen with a smaller one) with max_threads = 4 and a patch for mongo that parallelizes the cloning process: in our setup we used 17 threads cloning different dbs, each thread holding a db lock (instead of a global lock).
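As a rough illustration of the approach only (this is not the actual patch, which is linked at the end of this description; the function and type names are hypothetical), each database gets its own cloning thread that holds only that database's lock:

#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the work one cloning thread does for a single
// database while holding only that database's lock (not the global lock).
void cloneDatabase(const std::string& dbName) {
    // ... copy collections and build indexes for this db ...
    (void)dbName;
}

void cloneAllDatabases(const std::vector<std::string>& dbNames) {
    std::vector<std::thread> workers;
    workers.reserve(dbNames.size());
    for (const auto& db : dbNames) {
        // One thread per database; in the setup described above this was 17.
        workers.emplace_back(cloneDatabase, db);
    }
    for (auto& t : workers) {
        t.join();
    }
}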
Examining the cache flags with gdb shows that the STUCK bit is lit; here is the cache structure:
{ bytes_inmem = 60122154727, pages_inmem = 5668935, bytes_internal = 243665765, bytes_overflow = 0, bytes_evict = 486975238126, pages_evict = 5648345, bytes_dirty = 59789008836, pages_dirty = 6448, bytes_read = 1482481072, evict_max_page_size = 31232046, read_gen = 1682697, read_gen_oldest = 1682790, evict_cond = 0x39abcd0, evict_lock = { lock = { __data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = '\000' <repeats 39 times>, __align = 0 }, counter = 0, name = 0x17446c2 "cache eviction", id = 0 '\000', initialized = 1 '\001' }, evict_walk_lock = { lock = { __data = { __lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = { __prev = 0x0, __next = 0x0 } }, __size = '\000' <repeats 39 times>, __align = 0 }, counter = 0, name = 0x17446d1 "cache walk", id = 0 '\000', initialized = 1 '\001' }, evict_waiter_cond = 0x39abd40, eviction_trigger = 95, eviction_target = 80, eviction_dirty_target = 80, overhead_pct = 8, evict = 0x4bd4000, evict_current = 0x0, evict_candidates = 100, evict_entries = 100, evict_max = 400, evict_slots = 400, evict_file_next = 0x570f9c700, sync_request = 0, sync_complete = 0, cp_saved_read = 0, cp_current_read = 0, cp_skip_count = 0, cp_reserved = 0, cp_session = 0x0, cp_tid = 0, flags = 40 }
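For context, a minimal sketch of how that conclusion is read off the dump: the flags field is a bitmask, and the stuck state is one bit in it. The WT_CACHE_STUCK value below is a placeholder; the real bit value comes from the flag definitions of the WiredTiger build being debugged.

#include <cstdint>
#include <cstdio>

// Placeholder bit value; the actual definition lives in WiredTiger's
// generated flag headers for this build.
constexpr uint32_t WT_CACHE_STUCK = 0x20;

int main() {
    uint32_t flags = 40;  // decimal value read from the cache struct in gdb
    std::printf("flags = 0x%x, stuck bit %s\n", flags,
                (flags & WT_CACHE_STUCK) ? "set" : "clear");
    return 0;
}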
A stack trace is attached; as you can see, all cloning threads (threads 51 through 67) are hung on the eviction condition "0x39abd40", which comes from the __wt_cache_full_check() call.
Thread #8 is also stuck on the same call, via a _deleteExcessDocuments call.
The eviction server (Thread #2) is sleeping, and this happens constantly.
The eviction workers seem to have no work: there are 3 live eviction workers (threads 68 through 70), all of which are waiting on the same condition.
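Roughly, the wait pattern in the trace looks like the following (an approximation for illustration, not the WiredTiger source): application threads block in a cache-full check on a condition variable until the eviction server signals that space was freed, so if the server sleeps without making progress, every waiter hangs.

#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct CacheState {
    std::mutex mtx;
    std::condition_variable evictWaiterCond;  // plays the role of evict_waiter_cond
    uint64_t bytesInUse = 0;
    uint64_t triggerBytes = 0;  // eviction_trigger expressed as bytes
};

// Plays the role of __wt_cache_full_check(): a thread that finds the cache
// over the trigger waits here until eviction makes room.
void cacheFullCheck(CacheState& cache) {
    std::unique_lock<std::mutex> lk(cache.mtx);
    cache.evictWaiterCond.wait(lk, [&] { return cache.bytesInUse < cache.triggerBytes; });
}

// Expected to run after the eviction server frees pages; if it never runs,
// or never frees enough, the waiters above never wake up.
void evictionMadeProgress(CacheState& cache, uint64_t bytesFreed) {
    {
        std::lock_guard<std::mutex> lk(cache.mtx);
        cache.bytesInUse -= std::min(cache.bytesInUse, bytesFreed);
    }
    cache.evictWaiterCond.notify_all();
}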
This situation reproduced itself over and over at some point during the initial clone; any idea as to why this happens would be great.
The small patch for the parallelization is available here:
https://github.com/liranms/mongo/commit/a216bb0d8159f8030b5d666ffa8869c57f28fcc0
is duplicated by: SERVER-21616 WiredTiger hangs when mongorestoring 2.8TB data (Closed)
is related to: SERVER-18844 Reacquire the snapshot after commit/abort (Closed)