Add counter metrics for global lock queue operations to serverStatus.


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • 3

      The current gauge metrics (serverstatus_globallock_currentqueue_readers/writers) report point-in-time values, so they miss short-lived queue spikes that occur between monitoring samples (5-minute pings).
      We need counter metrics that track the cumulative number of operations added to the queues since server start, so that rates can be calculated over arbitrary time periods.

      For pre-7.0 versions, queue size is estimated as (serverStatus.globalLock.currentQueue.readers + serverStatus.globalLock.activeClients.readers - serverStatus.wiredTiger.concurrentTransactions.read.out), but this still provides only a point-in-time value.
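
      As a rough illustration of the pre-7.0 calculation above, the sketch below applies it to a serverStatus document that has already been fetched and parsed into a dictionary. The function name and the sample numbers are hypothetical, and the queued-readers key is assumed to be currentQueue.readers as it appears in serverStatus output.

```python
def estimated_read_queue(status: dict) -> int:
    """Estimate the read queue size from one serverStatus document (pre-7.0 formula)."""
    global_lock = status["globalLock"]
    queued_readers = global_lock["currentQueue"]["readers"]
    active_readers = global_lock["activeClients"]["readers"]
    # Read tickets currently checked out of the WiredTiger ticket pool.
    read_tickets_out = status["wiredTiger"]["concurrentTransactions"]["read"]["out"]
    return queued_readers + active_readers - read_tickets_out


# Hypothetical sample values, for illustration only.
sample_status = {
    "globalLock": {
        "currentQueue": {"readers": 12, "writers": 0},
        "activeClients": {"readers": 128, "writers": 3},
    },
    "wiredTiger": {
        "concurrentTransactions": {"read": {"out": 128, "available": 0}},
    },
}

print(estimated_read_queue(sample_status))
```

      The result describes only the instant at which serverStatus was sampled, which is the limitation this ticket is about.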

      The proposed solution is to add two new counters:

      1. serverStatus.globalLock.totalQueuedReaders
      2. serverStatus.globalLock.totalQueuedWriters

      These would increment when operations join respective queues and reset only on node restart, allowing rate calculations between arbitrary timestamps.
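
      A minimal sketch of how a monitoring agent might turn the proposed counter into a rate between two samples. totalQueuedReaders is the counter proposed in this ticket and does not exist yet; the Sample type and queued_rate function are hypothetical names. A restart is assumed to be detectable by the counter or serverStatus.uptime going backwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    timestamp: float           # wall-clock time of the sample, in seconds
    uptime: float              # serverStatus.uptime, in seconds
    total_queued_readers: int  # proposed globalLock.totalQueuedReaders (hypothetical)


def queued_rate(earlier: Sample, later: Sample) -> Optional[float]:
    """Return operations queued per second between two samples.

    Returns None when the node appears to have restarted in between,
    because the counter was reset and the delta is meaningless.
    """
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    restarted = (
        later.uptime < earlier.uptime
        or later.total_queued_readers < earlier.total_queued_readers
    )
    if restarted:
        return None
    return (later.total_queued_readers - earlier.total_queued_readers) / elapsed


# Two samples taken 300 seconds apart (hypothetical numbers).
first = Sample(timestamp=0.0, uptime=1000.0, total_queued_readers=1000)
second = Sample(timestamp=300.0, uptime=1300.0, total_queued_readers=1600)
print(queued_rate(first, second))
```

      Because the counter is cumulative, a spike that begins and ends between the two samples still contributes to the delta, unlike the gauge.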

      Use Case
      Detect system overload conditions by tracking queue growth rates, especially short-lived but significant spikes that 5-minute monitoring intervals can miss.

            Assignee:
            Matt Panton
            Reporter:
            Garaudy Etienne
            Votes:
            0
            Watchers:
            8

              Created:
              Updated: