  WiredTiger / WT-12416

Statistics cursor can be used to return size of arbitrary files

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines
    • StorEng - Refinement Pipeline

      Opening a data source statistics cursor with the statistics=(size) config option returns the size of a WiredTiger file without opening the corresponding dhandle. The call goes through the block manager and file system layers and retrieves the file size via the stat() system call.
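
As a rough analogue only (not WiredTiger's actual code path), the size lookup boils down to a stat() on whatever path it is handed. The helper name `file_size_via_stat` is hypothetical and exists only for illustration:

```python
import os


def file_size_via_stat(path):
    """Return the size in bytes that stat() reports for `path`.

    Hypothetical helper: it mirrors the idea of a size-only lookup that
    never opens the file, and it performs no check on who owns the path.
    """
    return os.stat(path).st_size
```

Because nothing in this kind of lookup depends on the path belonging to a database, any file the process can stat will yield a size.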

      A potential bug here is that this code does not validate that the data source name is, in fact, a file managed by WiredTiger. This Python example illustrates the issue by opening a statistics cursor that returns the size of /bin/ls.

      >>> stat=s.open_cursor('statistics:file:/bin/ls', None, "statistics=(size)")
      >>> stat[wiredtiger.stat.dsrc.block_size]
      ['block-manager: file size in bytes', '133792', 133792]

      It is not clear to me if this is a bug, or "works as intended". I also don't see a major downside to this behavior.

      The fix would be to validate the URI argument when opening the cursor. We can do this by looking it up in the metadata file. This adds to the overhead of the cursor, but is considerably less expensive than opening the URI, which is what we are trying to avoid (see SERVER-17018).
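
The metadata lookup described above can be sketched at the Python API level. This is an illustration under assumptions, not the proposed patch: the real check would presumably live in the C code that opens the statistics cursor. The function name `is_managed_data_source` is hypothetical, and the sketch assumes the Python binding's `cursor.search()` returns 0 when the key is found and a non-zero code (WT_NOTFOUND) otherwise.

```python
STAT_PREFIX = "statistics:"


def is_managed_data_source(session, stat_uri):
    """Return True if the object named by a statistics URI is in the metadata.

    Hypothetical helper. `session` is expected to behave like a WiredTiger
    session: open_cursor("metadata:", None, None) returns a cursor that
    supports set_key(), search() and close().
    """
    if not stat_uri.startswith(STAT_PREFIX):
        raise ValueError("not a statistics URI: %r" % (stat_uri,))

    # 'statistics:file:/bin/ls' -> 'file:/bin/ls'
    target = stat_uri[len(STAT_PREFIX):]

    meta = session.open_cursor("metadata:", None, None)
    try:
        meta.set_key(target)
        # Assumption: 0 means the key exists; anything else means not found.
        return meta.search() == 0
    finally:
        meta.close()
```

With the session `s` from the example in this ticket, `is_managed_data_source(s, 'statistics:file:/bin/ls')` would be expected to return False as long as the metadata has no entry for that name, so a caller could refuse to open the size cursor.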

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            keith.smith@mongodb.com Keith Smith