Core Server / SERVER-5714

Nicer behavior of dbstats call on database with large nssize

    • Type: New Feature
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 2.0.4, 2.1.0
    • Component/s: MMAPv1, Performance, Storage
    • Storage Execution

      If one has a database with thousands of collections (each with a few indexes), it has a very large nssize. The docs indicate that this might be a problem when they say:

      Command takes some time to run, typically a few seconds unless the .ns file is very large (via use of --nssize). While running other operations may be blocked.

      It's not obvious from this statement, but if you run dbstats on a database with a very large nssize, it can literally take your database offline for minutes or hours, as it did in our environment:

      {
      "opid" : 1711160082,
      "active" : true,
      "lockType" : "read",
      "waitingForLock" : false,
      "secs_running" : 882,
      "op" : "query",
      "ns" : "gryphon",
      "query" : { "dbstats" : 1 },
      "client" : "10.1.45.2:54395",
      "desc" : "conn",
      "threadId" : "0x7df7b4a04710",
      "connectionId" : 3450907,
      "numYields" : 0
      },
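To spot an operation like the one in the entry above before it runs for minutes, one could scan currentOp-style output for long-running dbstats calls. The following is only a minimal sketch: the function name `find_long_dbstats`, the 60-second threshold, the second sample entry, and the value `1` in the `dbstats` query are assumptions for illustration, not taken from the report.

```python
# Illustrative sketch: flag long-running dbstats operations in currentOp-style output.
# Field names follow the log entry in the report; the threshold is an assumption.

def find_long_dbstats(inprog, max_secs=60):
    """Return opids of active dbstats operations running longer than max_secs."""
    slow = []
    for op in inprog:
        query = op.get("query")
        if not isinstance(query, dict):
            continue  # skip entries without a structured query document
        is_dbstats = any(key.lower() == "dbstats" for key in query)
        if op.get("active") and is_dbstats and op.get("secs_running", 0) > max_secs:
            slow.append(op["opid"])
    return slow


sample = [
    # Mirrors the entry from the report (query value 1 is assumed).
    {"opid": 1711160082, "active": True, "lockType": "read",
     "secs_running": 882, "op": "query", "ns": "gryphon",
     "query": {"dbstats": 1}, "numYields": 0},
    # Hypothetical unrelated operation, included only for contrast.
    {"opid": 42, "active": True, "secs_running": 1, "op": "query",
     "ns": "gryphon.users", "query": {"name": "x"}},
]

print(find_long_dbstats(sample))
```

This only detects the problem after the fact; it does not release the lock the operation is holding.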

      One of our devs added this query to part of our database browser not realizing the impact it would have on this particular database. When he browsed to our production database, it took services down for 15 minutes (882 seconds at the time we read that log entry).

      We ran into this problem when we launched MMS against our servers. We've been banned from using MMS as a result.

      There should be a way to avoid these situations at the database level (rather than by patching MMS, patching our client apps, and patching each and every developer to remember not to invoke this operation).

      A few options:

      • Add a configuration parameter or flag to disable dbstats on a particular database or an entire MongoDB instance.
      • Rewrite dbstats to not hold the lock for so long (and to fail if it takes more than a configurable amount of time).
      • Rewrite dbstats to require an explicit parameter (something like dangerous=True) before it will run on a database whose nssize exceeds a certain size, or before it is allowed to run for too long.

      If we could protect our database from this at the database level, we could prevent other inadvertent or intentional DoS of this kind in the future, regardless of where the request comes from.
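The behavior proposed in the second and third options might look roughly like the sketch below. It is a model of the requested semantics, not MongoDB code: `guarded_dbstats`, `DbStatsRefused`, `max_ns_file_mb`, `max_seconds`, and the default limits are all hypothetical names and values chosen for illustration. Only `dangerous=True` comes from the request itself.

```python
# Hypothetical model of the proposed guard; none of these names exist in MongoDB.
import time


class DbStatsRefused(Exception):
    """Raised when the guard declines to start, or aborts, the stats scan."""


def guarded_dbstats(namespaces, ns_file_mb, *, max_ns_file_mb=16,
                    max_seconds=5.0, dangerous=False, clock=time.monotonic):
    """Count namespaces, refusing large .ns files and enforcing a time budget."""
    # Option 3: refuse up front on a large namespace file unless explicitly allowed.
    if ns_file_mb > max_ns_file_mb and not dangerous:
        raise DbStatsRefused(
            f"nssize {ns_file_mb} MB exceeds {max_ns_file_mb} MB; pass dangerous=True")
    # Option 2: fail instead of holding the lock past a configurable time limit.
    deadline = clock() + max_seconds
    count = 0
    for _ in namespaces:
        if clock() > deadline:
            raise DbStatsRefused(f"exceeded {max_seconds}s after {count} namespaces")
        count += 1
    return {"collections": count}


print(guarded_dbstats(range(3), ns_file_mb=16))
```

In this sketch `dangerous=True` bypasses only the size check; the time budget still applies, so a caller who opts in cannot hold the lock indefinitely. Whether the two limits should be independent is a design choice the request leaves open.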

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            jason.coombs@yougov.com Jason R. Coombs
            Votes:
            3
            Watchers:
            9
