- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: 1.6.0, 1.6.1
- Component/s: None
- Environment: Ubuntu 18 / x64, Mongo server 4.0.14
We noticed that the oplogs on some of our clusters were growing by orders of magnitude more GB/hour despite similar traffic and lower scale. All clusters run the same mongod version on the same host OS and comparable hardware, but every affected cluster was running PHP extension 1.6.0, while the one unaffected cluster was still on 1.5.5.
As an experiment we upgraded the known-good cluster to 1.6.1, and the change to the oplog was dramatic: it went from under 1 GB/hr to close to 100 GB/hr before we hastily downgraded. See the attached screen capture from our monitoring software.
Convinced we had found the difference, we downgraded the 1.6.0 clusters as well and saw the average GB/hr slowly start to come down; it will take quite some time for that average to settle, so a screenshot isn't very illuminating just yet.
Preliminary investigation suggested that the oplog was being flooded with "no-op" ("op": "n") records containing the entire updated document:
{ "ts" : Timestamp(1578321176, 5387), "t" : NumberLong(7), "h" : NumberLong("6027878811455725704"), "v" : 2, "op" : "n", "ns" : "lotus.accounts", "ui" : UUID("30e2da0f-6bbc-49e0-8578-f681ce6a6bfa"), "wall" : ISODate("2020-01-06T14:32:56.689Z"), "lsid" : { "id" : UUID("53985818-7a52-4ba3-8545-dc00bb6d240b"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "txnNumber" : NumberLong(312101), "stmtId" : 0, "prevOpTime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) }, "o" : { ..... gigantic json block ..... } }
I can't send you the oplog itself because it contains client data, but since I have 300 GB of backlog that would normally be written at a few hundred MB/hr, I might be able to fish for clues before it expires.
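Given the related server ticket linked below (findAndModify with retryWrites=true), the suspected trigger on the driver side would look roughly like the sketch that follows: a findOneAndUpdate issued over a retryWrites-enabled connection, for which the server records the document image as a no-op oplog entry so the write can be retried safely. This is a minimal sketch assuming the MongoDB PHP library; the filter, update, and returnDocument option are illustrative and not taken from our application code.

<?php
require 'vendor/autoload.php';

use MongoDB\Client;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Operation\FindOneAndUpdate;

// retryWrites is spelled out here for clarity; newer driver versions are
// understood to enable it by default, which would explain why the oplog
// behaviour changed after the extension upgrade without any application change.
$client   = new Client('mongodb://localhost:27017/?replicaSet=rs0&retryWrites=true');
$accounts = $client->selectCollection('lotus', 'accounts');

// Sent as a retryable findAndModify command; per SERVER-45442 the server
// records the document image as an "op": "n" oplog entry for the retry path.
$updated = $accounts->findOneAndUpdate(
    ['_id' => 'some-account-id'],                      // illustrative filter
    ['$set' => ['lastSeen' => new UTCDateTime()]],     // illustrative update
    ['returnDocument' => FindOneAndUpdate::RETURN_DOCUMENT_AFTER]
);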
Related to: SERVER-45442 Mitigate oplog impact for findAndModify commands executed with retryWrites=true (Closed)