SERVER-37849

Poor replication performance and cache-full hang on secondary due to pinned content

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Storage
    • Storage Engines
    • ALL

      This is a follow-on to SERVER-33191. The repro in that ticket updated large documents by incrementing a field; this repro is similar, except that it updates the document by pushing a value onto an array. The symptoms of the increment use case disappeared in 3.6.7, but the push use case still shows similar symptoms in 3.6.8: the secondary takes much longer than the primary to apply the updates, and eventually hangs with the cache full.
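
For illustration only, the difference between the two repros can be sketched as the update documents each one sends. The field names (`n`, `a`), the `_id` value, and the pushed value are assumptions made for this sketch and are not taken from the attached repro scripts. The code only builds the filter and update specifications; it does not connect to a server.

```python
# Hypothetical sketch: field names, _id, and values are illustrative
# assumptions, not copied from the attached repro-*.sh scripts.

def increment_update(doc_id, field="n"):
    """Update shape of the SERVER-33191-style repro: increment a field."""
    return {"_id": doc_id}, {"$inc": {field: 1}}


def push_update(doc_id, value, field="a"):
    """Update shape of this repro: push a value onto an array."""
    return {"_id": doc_id}, {"$push": {field: value}}


filter_inc, update_inc = increment_update(0)
filter_push, update_push = push_update(0, 42)

print(update_inc)
print(update_push)
```

With a driver such as PyMongo, each filter/update pair would be passed to `Collection.update_one(filter, update)` against a collection holding the large documents.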

      • A-B: secondary member 2 is stopped while some updates are applied
      • B-C: secondary member 2 is restarted and works to catch up but, due to SERVER-34938, applies the updates much more slowly than the primary and the other secondary did.
      • C: secondary is now hung waiting for cache
        #1  0x000055bb44ea3f39 in __wt_cond_wait_signal ()
        #2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()
        #3  0x000055bb44ee553c in __wt_txn_commit ()
        #4  0x000055bb44ecf28b in __session_commit_transaction ()
        #5  0x000055bb44e5ff88 in mongo::WiredTigerRecoveryUnit::_txnClose(bool) ()
        #6  0x000055bb44e6045a in mongo::WiredTigerRecoveryUnit::_commit() ()
        #7  0x000055bb44e1178a in mongo::WriteUnitOfWork::commit() ()
        #8  0x000055bb452dfd76 in mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool, mongo::repl::OplogApplication::Mode, std::function<void ()>)::{lambda()#15}::operator()() const ()
        

        including the checkpoint thread

        #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
        #1  0x000055bb44ea3f39 in __wt_cond_wait_signal ()
        #2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()
        #3  0x000055bb44ecfa68 in __session_begin_transaction ()
        #4  0x000055bb44e65e96 in mongo::WiredTigerSizeStorer::syncCache(bool) ()
        #5  0x000055bb44e450b9 in mongo::WiredTigerKVEngine::syncSizeInfo(bool) const ()
        #6  0x000055bb44e488d5 in mongo::WiredTigerKVEngine::haveDropsQueued() const ()
        #7  0x000055bb44e627f6 in mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*) ()
        #8  0x000055bb44e4eb54 in mongo::WiredTigerKVEngine::WiredTigerCheckpointThread::run() ()
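
When many threads have to be checked in a gdb dump, the same pattern can be looked for mechanically. The following is a minimal sketch, not part of the original report: it treats the presence of a `__wt_cache_eviction_worker` frame as the sign that a thread is waiting for cache, which is an assumption drawn from the two stacks above. The helper names `frame_functions` and `waiting_for_cache` are hypothetical.

```python
import re

# Matches gdb frame lines such as
#   "#2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()"
#   "#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ..."
# The address and "in" are optional because frame #0 may omit them.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def frame_functions(stack_text):
    """Return the first token of each frame's function name, innermost first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def waiting_for_cache(stack_text):
    """Assumption: a __wt_cache_eviction_worker frame means the thread waits on cache."""
    return "__wt_cache_eviction_worker" in frame_functions(stack_text)


# First frames of the checkpoint thread stack shown above.
checkpoint_stack = """\
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x000055bb44ea3f39 in __wt_cond_wait_signal ()
#2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()
#3  0x000055bb44ecfa68 in __session_begin_transaction ()
"""

print(waiting_for_cache(checkpoint_stack))
```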
        

        1. hang.png
          268 kB
        2. hang8.png
          216 kB
        3. hang8.tar
          544 kB
        4. repro-10MBx2-push.sh
          1 kB
        5. repro-10MBx8-push.sh
          2 kB
        6. repro-10MBx8-push-fast.sh
          2 kB

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            0
            Watchers:
            16
