Core Server / SERVER-21616

WiredTiger hangs when mongorestoring 2.8TB data

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Admin, WiredTiger
    • Operating System: ALL

      When mongorestoring about 2.8 TB of data split across four MongoDB databases of almost equal size, mongorestore gets stuck once two of the four database restores move into the index restore phase. Around the same time RAM was fully utilized, but there was no OOM kill.
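
      For illustration only, a parallel restore of the kind described above could be launched roughly as follows; the host, port, database names and dump paths are placeholders, not values taken from this report:

      # hypothetical invocation: four mongorestore processes, one per database, run concurrently
      mongorestore --host localhost --port 27017 --db db1 /backup/dump/db1 &
      mongorestore --host localhost --port 27017 --db db2 /backup/dump/db2 &
      mongorestore --host localhost --port 27017 --db db3 /backup/dump/db3 &
      mongorestore --host localhost --port 27017 --db db4 /backup/dump/db4 &
      wait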

      Configuration:
      MongoDB version 3.0.7 with the WiredTiger storage engine
      Server configuration: 4 cores, 32 GB RAM, Linux xxxhost 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
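
      For reference, a mongod running with the WiredTiger storage engine as described above would be started with something like the following; the dbpath, logpath and port are assumptions, not values from this report:

      mongod --storageEngine wiredTiger --dbpath /data/db --port 27017 \
             --logpath /var/log/mongodb/mongod.log --fork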

      Stack trace of the mongod process that appears to be spinning at 100% CPU:

      #0  longest_match (s=s@entry=0xb3441000, cur_match=49810)
          at src/third_party/zlib-1.2.8/deflate.c:1159
      #1  0x00000000013f7043 in deflate_slow (s=0xb3441000, flush=2)
          at src/third_party/zlib-1.2.8/deflate.c:1771
      #2  0x00000000013f8f92 in deflate (strm=strm@entry=0x7fb4fc0b4240, flush=flush@entry=2)
          at src/third_party/zlib-1.2.8/deflate.c:903
      #3  0x00000000012d9b84 in zlib_compress_raw (compressor=0x29d5100, session=0x2d1f8c0,
          page_max=<optimized out>, split_pct=<optimized out>, extra=<optimized out>,
          src=<optimized out>, offsets=0x1d5c2000, slots=59,
          dst=0x1b546040 "x\234\354[[o\033E\024\236\070m\323\004h\242B\313\v\bh\201\212JF\276\304I\fH\220u\n\255\224ʦNh\373\200\242\265=I\226ػ\226\275\216H\244B\021\022\002\t(\027\tqS\371\a\224\313\003\022\277", dst_len=324707, final=0, result_lenp=0x7fb4fc0b43c0, result_slotsp=0x7fb4fc0b43a0)
          at src/third_party/wiredtiger/ext/compressors/zlib/zlib_compress.c:284
      #4  0x0000000001372580 in __rec_split_raw_worker (session=session@entry=0x2d1f8c0,
          r=r@entry=0x231cbe00, next_len=3864, no_more_rows=no_more_rows@entry=false)
          at src/third_party/wiredtiger/src/reconcile/rec_write.c:2353
      #5  0x0000000001374695 in __rec_split_raw (next_len=<optimized out>, r=0x231cbe00,
          session=0x2d1f8c0) at src/third_party/wiredtiger/src/reconcile/rec_write.c:2617
      #6  __rec_row_leaf_insert (session=session@entry=0x2d1f8c0, r=r@entry=0x231cbe00,
          ins=<optimized out>) at src/third_party/wiredtiger/src/reconcile/rec_write.c:4744
      #7  0x00000000013769b4 in __rec_row_leaf (session=session@entry=0x2d1f8c0, r=r@entry=0x231cbe00,
          page=page@entry=0x455187f20, salvage=salvage@entry=0x0)
          at src/third_party/wiredtiger/src/reconcile/rec_write.c:4366
      #8  0x000000000137860d in __wt_reconcile (session=session@entry=0x2d1f8c0, ref=0x3c9fa2f60,
          salvage=salvage@entry=0x0, flags=flags@entry=0)
          at src/third_party/wiredtiger/src/reconcile/rec_write.c:441
      #9  0x000000000130cfb5 in __sync_file (syncop=16, session=0x2d1f8c0)
          at src/third_party/wiredtiger/src/btree/bt_sync.c:77
      #10 __wt_cache_op (session=session@entry=0x2d1f8c0, ckptbase=ckptbase@entry=0x0, op=op@entry=16)
          at src/third_party/wiredtiger/src/btree/bt_sync.c:269
      #11 0x0000000001399d4a in __checkpoint_write_leaves (cfg=0x7fb4fc0b4a30, session=0x2d1f8c0)
          at src/third_party/wiredtiger/src/txn/txn_ckpt.c:277
      #12 __checkpoint_apply (op=0x13984a0 <__checkpoint_write_leaves>, cfg=0x7fb4fc0b4a30,
          session=0x2d1f8c0) at src/third_party/wiredtiger/src/txn/txn_ckpt.c:184
      #13 __wt_txn_checkpoint (session=session@entry=0x2d1f8c0, cfg=cfg@entry=0x7fb4fc0b4a30)
          at src/third_party/wiredtiger/src/txn/txn_ckpt.c:407
      #14 0x000000000138d2f6 in __session_checkpoint (wt_session=0x2d1f8c0, config=<optimized out>)
          at src/third_party/wiredtiger/src/session/session_api.c:955
      #15 0x0000000001325a5a in __ckpt_server (arg=0x2d1f8c0)
          at src/third_party/wiredtiger/src/conn/conn_ckpt.c:95
      #16 0x00007fb5006ef182 in start_thread (arg=0x7fb4fc0b5700) at pthread_create.c:312
      #17 0x00007fb4ff1b647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
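
      For completeness, a backtrace like the one above can be captured by attaching gdb to the running mongod process; the PID below is a placeholder:

      gdb -p <mongod-pid>          # attach to the spinning mongod
      (gdb) thread apply all bt    # dump a backtrace for every thread
      (gdb) detach
      (gdb) quit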
      

      From the mongod.log file, the following log line seems pertinent:

      Caught WriteConflictException doing insert on dbx.collectiony, attempt: 1 retrying
      

      Marking this as P2 since a workaround was observed: mongorestore the databases serially, each database being approximately 700 GB in size.
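
      A sketch of that serial workaround, using hypothetical database names and dump paths:

      # hypothetical workaround: restore one database at a time instead of in parallel
      mongorestore --host localhost --port 27017 --db db1 /backup/dump/db1
      mongorestore --host localhost --port 27017 --db db2 /backup/dump/db2
      mongorestore --host localhost --port 27017 --db db3 /backup/dump/db3
      mongorestore --host localhost --port 27017 --db db4 /backup/dump/db4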

            Assignee: Ramon Fernandez Marina (ramon.fernandez@mongodb.com)
            Reporter: guruditta golani
            Votes: 0
            Watchers: 7
