-
Type: Bug
-
Resolution: Duplicate
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.8.0-rc2
-
Component/s: Performance, Storage
-
ALL
-
- long running bulk insert
during long insert session, saw a few long pause for around 100sec each
here is capture of one from sysbench output (look at the "int ips", which is the last column)
Thread[Thread-2,5,main]430,914,000 inserts : 4,500 seconds : cum ips=95,757.09 : int ips=799.20 Thread[Thread-2,5,main]430,918,000 inserts : 4,505 seconds : cum ips=95,651.60 : int ips=799.20 Thread[Thread-2,5,main]430,925,000 inserts : 4,510 seconds : cum ips=95,547.00 : int ips=1,398.60 Thread[Thread-2,5,main]430,927,000 inserts : 4,515 seconds : cum ips=95,441.51 : int ips=399.52 Thread[Thread-2,5,main]430,928,000 inserts : 4,520 seconds : cum ips=95,336.05 : int ips=199.80 Thread[Thread-2,5,main]430,928,000 inserts : 4,525 seconds : cum ips=95,230.60 : int ips=0.00 Thread[Thread-2,5,main]430,928,000 inserts : 4,530 seconds : cum ips=95,127.49 : int ips=0.00 Thread[Thread-2,5,main]430,929,000 inserts : 4,535 seconds : cum ips=95,022.70 : int ips=199.76 Thread[Thread-2,5,main]430,931,000 inserts : 4,540 seconds : cum ips=94,918.37 : int ips=399.52 Thread[Thread-2,5,main]430,931,000 inserts : 4,545 seconds : cum ips=94,813.82 : int ips=0.00 Thread[Thread-2,5,main]430,931,000 inserts : 4,550 seconds : cum ips=94,709.53 : int ips=0.00 Thread[Thread-2,5,main]430,931,000 inserts : 4,555 seconds : cum ips=94,605.44 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,560 seconds : cum ips=94,502.24 : int ips=599.28 Thread[Thread-2,5,main]430,934,000 inserts : 4,565 seconds : cum ips=94,398.63 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,570 seconds : cum ips=94,295.23 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,575 seconds : cum ips=94,192.05 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,580 seconds : cum ips=94,089.10 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,585 seconds : cum ips=93,986.37 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,590 seconds : cum ips=93,883.89 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,595 seconds : cum ips=93,781.61 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,600 seconds : cum ips=93,679.57 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,605 seconds : cum ips=93,577.76 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,610 seconds : cum ips=93,476.16 : int ips=0.00 Thread[Thread-2,5,main]430,934,000 inserts : 4,615 seconds : cum ips=93,376.79 : int ips=0.00 Thread[Thread-2,5,main]431,061,000 inserts : 4,620 seconds : cum ips=93,302.52 : int ips=25,223.44 Thread[Thread-2,5,main]431,061,000 inserts : 4,625 seconds : cum ips=93,201.55 : int ips=0.00 Thread[Thread-2,5,main]431,186,000 inserts : 4,630 seconds : cum ips=93,127.80 : int ips=24,975.02 Thread[Thread-2,5,main]431,536,000 inserts : 4,635 seconds : cum ips=93,102.75 : int ips=69,930.07 Thread[Thread-2,5,main]432,127,000 inserts : 4,640 seconds : cum ips=93,129.11 : int ips=117,401.67 Thread[Thread-2,5,main]432,648,000 inserts : 4,645 seconds : cum ips=93,140.91 : int ips=104,075.11 Thread[Thread-2,5,main]433,422,000 inserts : 4,650 seconds : cum ips=93,208.77 : int ips=157,253.15 Thread[Thread-2,5,main]433,977,000 inserts : 4,655 seconds : cum ips=93,227.76 : int ips=110,866.96
observation
- check the CPU flame graph, majority time spend in __ckpt_server & __evict_server, hot functions are (from perf.data.pause_capture1)
+ 40.56% 12239 mongod mongod [.] __wt_tree_walk + 13.59% 4097 mongod mongod [.] __evict_server + 6.09% 1859 mongod mongod [.] __wt_hazard_clear + 3.36% 1952 mongod mongod [.] __wt_row_search + 2.70% 817 mongod mongod [.] __rec_row_leaf + 2.62% 803 mongod mongod [.] __rec_row_leaf_insert + 2.06% 659 mongod mongod [.] __wt_hazard_set + 1.69% 564 mongod mongod [.] snappy::internal::CompressFragment(char const*, unsigned long, char*, unsigned short*, int) + 0.92% 352 mongod mongod [.] __wt_page_in_func + 0.88% 367 mongod mongod [.] __wt_row_modify + 0.84% 508 mongod libc-2.12.so [.] msort_with_tmp + 0.63% 297 mongod libc-2.12.so [.] memcpy + 0.61% 279 mongod mongod [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**) + 0.60% 189 mongod mongod [.] __wt_page_out + 0.59% 344 mongod [kernel.kallsyms] [k] update_curr + 0.51% 665 mongod [kernel.kallsyms] [k] native_write_msr_safe + 0.50% 154 mongod mongod [.] __rec_cell_build_leaf_key
- mongo::PortMessageServer::handleIncomingMsg was pushed to only 0.9%, on normal run, it could take more than 75% of the time.
attachment and details.
attachments:
flame graph
- perf.data.pause_capture1.svg : 10sec perf count during pause
- perf.data.pause_capture2.svg : 20sec perf count during pause
perf.data
- perf.data.pause_capture1.gz
- perf.data.pause_capture2.gz
- perf.report.pause_capture2.txt : report in plain text with expanded stack trace.
- duplicates
-
SERVER-16269 WiredTiger blocks queries and updates for many seconds.
- Closed