Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17317

Deadlock or starvation in WiredTiger from benchmark with 10 insert threads, 1 query thread

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.0.0-rc9
    • Affects Version/s: 3.0.0-rc8
    • Component/s: WiredTiger
    • Fully Compatible
    • ALL

      I have been using iibench-mongodb with 10 insert threads and 0 query threads. Today I tried it with 10 insert threads and 1 query thread, WiredTiger with zlib and no oplog (because turning on the oplog gets https://jira.mongodb.org/browse/SERVER-17141).

      The test was OK for the first 50M inserts (~1800 seconds). Then it goes into a state where over the next 1800 seconds there are only inserts and queries for two 10-second intervals over the next 1800 seconds.

      Thread[Thread-2,5,main]50,394,364 inserts : 1,750 seconds : cum ips=28,795.55 : int ips=10,256.89 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=326.81 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,516,553 inserts : 1,760 seconds : cum ips=28,701.23 : int ips=12,209.13 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=324.94 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,608,966 inserts : 1,770 seconds : cum ips=28,591.15 : int ips=9,232.99 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=323.10 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,725,511 inserts : 1,780 seconds : cum ips=28,495.89 : int ips=11,646.35 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=321.27 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,801,222 inserts : 1,790 seconds : cum ips=28,380.46 : int ips=7,641.40 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=319.49 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,900,884 inserts : 1,800 seconds : cum ips=28,278.05 : int ips=9,959.23 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=317.70 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,313 inserts : 1,810 seconds : cum ips=28,124.16 : int ips=442.59 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=315.94 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,313 inserts : 1,820 seconds : cum ips=27,969.52 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=314.20 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,314 inserts : 1,830 seconds : cum ips=27,816.58 : int ips=0.10 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=312.47 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,314 inserts : 1,840 seconds : cum ips=27,665.28 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=310.77 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,314 inserts : 1,850 seconds : cum ips=27,515.64 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=309.08 : int avg qps=0.00 : exceptions=0
      Thread[Thread-2,5,main]50,905,314 inserts : 1,860 seconds : cum ips=27,367.61 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=307.41 : int avg qps=0.00 : exceptions=0
      ... long period with no inserts, no queries
      Thread[Thread-2,5,main]50,905,314 inserts : 2,800 seconds : cum ips=18,179.83 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=203.95 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]50,905,314 inserts : 2,810 seconds : cum ips=18,115.72 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=203.23 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]50,905,314 inserts : 2,820 seconds : cum ips=18,051.41 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=202.50 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]50,905,314 inserts : 2,830 seconds : cum ips=17,987.57 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=201.79 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]50,905,314 inserts : 2,840 seconds : cum ips=17,924.18 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=201.07 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,014,662 inserts : 2,850 seconds : cum ips=17,899.59 : int ips=10,922.78 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=200.37 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 2,860 seconds : cum ips=17,837.70 : int ips=214.93 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=199.66 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 2,870 seconds : cum ips=17,775.50 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=198.96 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 2,880 seconds : cum ips=17,713.74 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=198.27 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 2,890 seconds : cum ips=17,652.40 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=197.58 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 2,900 seconds : cum ips=17,591.48 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=196.90 : int avg qps=0.00 : exceptions=10
      ... another long period with no queries, no inserts
      Thread[Thread-2,5,main]51,016,813 inserts : 3,250 seconds : cum ips=15,697.20 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=175.65 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 3,260 seconds : cum ips=15,649.01 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=175.11 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 3,270 seconds : cum ips=15,601.11 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=174.57 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 3,280 seconds : cum ips=15,553.51 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=174.04 : int avg qps=0.00 : exceptions=10
      Thread[Thread-2,5,main]51,016,813 inserts : 3,290 seconds : cum ips=15,506.21 : int ips=0.00 : cum avg qry=2.11 : int avg qry=0.00 : cum avg qps=173.51 : int avg qps=0.00 : exceptions=10
      ... and still no queries, no inserts
      

      Prior to the two 10-second intervals when there was a brief amount of throughput, the common thread stacks are:

           14 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerRecordStore::insertRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            4 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerIndexStandard::_insert,mongo::WiredTigerIndex::insert,mongo::BtreeBasedAccessMethod::insert,mongo::IndexCatalog::_indexRecord,mongo::IndexCatalog::indexRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            2 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerIndexUnique::_insert,mongo::WiredTigerIndex::insert,mongo::BtreeBasedAccessMethod::insert,mongo::IndexCatalog::_indexRecord,mongo::IndexCatalog::indexRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 sigwait,mongo::(anonymous,boost::(anonymous,start_thread,clone
            1 select,__wt_sleep,__evict_pass,__evict_server,start_thread,clone
            1 select,mongo::Listener::initAndListen,_initAndListen,mongo::initAndListen,mongoDbMain,main
            1 recv,mongo::Socket::_recv,mongo::Socket::unsafe_recv,mongo::Socket::recv,mongo::MessagingPort::recv,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 pthread_cond_wait@@GLIBC_2.3.2,wait<boost::unique_lock<boost::timed_mutex>,mongo::DeadlineMonitor<mongo::V8Scope>::deadlineMonitorThread,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__wt_page_in_func,__wt_page_swap_func,__wt_row_search,__cursor_row_search,__wt_btcur_search,__curfile_search,mongo::WiredTigerRecordStore::findRecord,mongo::KVCatalog::_findEntry,mongo::KVCatalog::getMetaData,mongo::KVCollectionCatalogEntry::_getMetaData,mongo::BSONCollectionCatalogEntry::getAllIndexes,mongo::TTLMonitor::getTTLIndexesForDB,mongo::TTLMonitor::doTTLPass,mongo::TTLMonitor::run,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerSizeStorer::storeInto,mongo::WiredTigerKVEngine::syncSizeInfo,mongo::WiredTigerKVEngine::haveDropsQueued,mongo::WiredTigerSessionCache::releaseSession,mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit,mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_close_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait<boost::unique_lock<boost::timed_mutex>,timed_wait<boost::unique_lock<boost::timed_mutex>,,mongo::RangeDeleter::doWork,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,boost::condition_variable_any::timed_wait<boost::unique_lock<boost::timed_mutex>,timed_wait<boost::unique_lock<boost::timed_mutex>,,timed_wait<boost::unique_lock<boost::timed_mutex>,,mongo::(anonymous,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 nanosleep,mongo::sleepsecs,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 free,__wt_free_int,__free_skip_list,__free_skip_array,__free_page_row_leaf,__wt_page_out,__wt_ref_out,__wt_split_rewrite,__evict_page_dirty_update,__wt_evict,__wt_evict_page,__wt_page_release_evict,__wt_page_in_func,__wt_page_swap_func,__wt_row_search,__cursor_row_search,__wt_btcur_search,__curfile_search,mongo::WiredTigerRecordStore::dataFor,mongo::Collection::docFor,mongo::FetchStage::work,mongo::ProjectionStage::work,mongo::MultiPlanStage::workAllPlans,mongo::MultiPlanStage::pickBestPlan,mongo::SubplanStage::choosePlanForSubqueries,mongo::SubplanStage::pickBestPlan,mongo::PlanExecutor::pickBestPlan,mongo::PlanExecutor::make,mongo::PlanExecutor::make,mongo::getExecutor,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 __evict_lru_pages,__evict_worker,start_thread,clone
            1 __evict_list_clear,__evict_get_ref,__wt_evict_lru_page,__evict_lru_pages,__evict_worker,start_thread,clone
            1 __evict_get_ref,__wt_evict_lru_page,__evict_lru_pages,__evict_worker,start_thread,clone
            1 deflate_slow,deflate,zlib_compress_raw,__rec_split_raw_worker,__rec_split_raw,__rec_row_leaf_insert,__rec_row_leaf,__wt_reconcile,__sync_file,__wt_cache_op,__checkpoint_write_leaves,__checkpoint_apply,__wt_txn_checkpoint,__session_checkpoint,__ckpt_server,start_thread,clone
      

      Now, after the 2 10-second intervals with throughput the thread stacks are listed here. I also ran "db.serverStatus()" and that hung so one of the threads below is from it. But it hung after producing output that I will upload.

            5 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerRecordStore::insertRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            3 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerIndexUnique::_insert,mongo::WiredTigerIndex::insert,mongo::BtreeBasedAccessMethod::insert,mongo::IndexCatalog::_indexRecord,mongo::IndexCatalog::indexRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            2 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerIndexStandard::_insert,mongo::WiredTigerIndex::insert,mongo::BtreeBasedAccessMethod::insert,mongo::IndexCatalog::_indexRecord,mongo::IndexCatalog::indexRecord,mongo::Collection::_insertDocument,mongo::Collection::insertDocument,singleInsert,insertOne,mongo::WriteBatchExecutor::execOneInsert,mongo::WriteBatchExecutor::execInserts,mongo::WriteBatchExecutor::bulkExecute,mongo::WriteBatchExecutor::executeBatch,mongo::WriteCmd::run,mongo::_execCommand,mongo::Command::execCommand,mongo::_runCommands,runCommands,mongo::runQuery,receivedQuery,mongo::assembleResponse,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 __wt_evict_lru_page,__evict_lru_pages,__evict_worker,start_thread,clone
            1 sigwait,mongo::(anonymous,boost::(anonymous,start_thread,clone
            1 select,mongo::Listener::initAndListen,_initAndListen,mongo::initAndListen,mongoDbMain,main
            1 recv,mongo::Socket::_recv,mongo::Socket::unsafe_recv,mongo::Socket::recv,mongo::MessagingPort::recv,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 pthread_cond_wait@@GLIBC_2.3.2,wait<boost::unique_lock<boost::timed_mutex>,mongo::DeadlineMonitor<mongo::V8Scope>::deadlineMonitorThread,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_search,__curfile_search,mongo::WiredTigerRecordStore::findRecord,mongo::KVCatalog::_findEntry,mongo::KVCatalog::getMetaData,mongo::KVCollectionCatalogEntry::_getMetaData,mongo::BSONCollectionCatalogEntry::getAllIndexes,mongo::TTLMonitor::getTTLIndexesForDB,mongo::TTLMonitor::doTTLPass,mongo::TTLMonitor::run,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__wt_cache_wait,__wt_cache_full_check,__cursor_enter,__curfile_enter,__cursor_func_init,__wt_btcur_insert,__curfile_insert,mongo::WiredTigerSizeStorer::storeInto,mongo::WiredTigerKVEngine::syncSizeInfo,mongo::WiredTigerKVEngine::haveDropsQueued,mongo::WiredTigerSessionCache::releaseSession,mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit,mongo::WiredTigerRecoveryUnit::~WiredTigerRecoveryUnit,mongo::MyMessageHandler::process,mongo::PortMessageServer::handleIncomingMsg,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__sweep_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__log_close_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,__wt_cond_wait,__evict_server,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait<boost::unique_lock<boost::timed_mutex>,timed_wait<boost::unique_lock<boost::timed_mutex>,,mongo::RangeDeleter::doWork,boost::(anonymous,start_thread,clone
            1 pthread_cond_timedwait@@GLIBC_2.3.2,boost::condition_variable_any::timed_wait<boost::unique_lock<boost::timed_mutex>,timed_wait<boost::unique_lock<boost::timed_mutex>,,timed_wait<boost::unique_lock<boost::timed_mutex>,,mongo::(anonymous,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 nanosleep,mongo::sleepsecs,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::jobBody,boost::(anonymous,start_thread,clone
            1 __lll_lock_wait,pthread_mutex_lock,??,??,??
            1 __evict_worker,start_thread,clone
            1 deflate_slow,deflate,zlib_compress_raw,__rec_split_raw_worker,__rec_split_raw,__rec_row_leaf_insert,__rec_row_leaf,__wt_reconcile,__sync_file,__wt_cache_op,__checkpoint_worker,__wt_checkpoint,__checkpoint_apply,__wt_txn_checkpoint,__session_checkpoint,__ckpt_server,start_thread,clone
      

        1. jmongoiibench.java
          35 kB
        2. jmongoiibench.java
          35 kB
        3. mongo.conf
          0.7 kB
        4. o.s.1
          13 kB
        5. run.simple.bash
          4 kB

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            mdcallag Mark Callaghan
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: