Core Server / SERVER-10630

Speed of cleanupOldData while chunk balancing

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.4.3
    • Component/s: Sharding
    • None
    • Environment:
      Ubuntu 12.04.1 LTS
      3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
      MongoDB 2.4.3
    • ALL

      We have started sharding another of the large collections in our database. The database has 26 collections, some of which are already sharded.
      The balancer now runs every night (UTC):

      { "_id" : "balancer", "activeWindow" : { "start" : "18:00", "stop" : "7:00" }, "stopped" : false }
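To make the "outside the balancer window" observation further down concrete, here is a minimal sketch of how a window that wraps past midnight can be evaluated. The function names are hypothetical, and the boundary handling (start inclusive, stop exclusive) is an assumption, not a statement about how the balancer itself compares times.

```python
from datetime import time


def parse_hhmm(text: str) -> time:
    """Parse an "H:MM" or "HH:MM" string such as "7:00" or "18:00"."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_active_window(now: time, start: str, stop: str) -> bool:
    """Return True if `now` falls inside the window [start, stop)."""
    begin, end = parse_hhmm(start), parse_hhmm(stop)
    if begin <= end:
        # Window lies within a single day.
        return begin <= now < end
    # Window wraps past midnight, as with 18:00 -> 7:00.
    return now >= begin or now < end


window = {"start": "18:00", "stop": "7:00"}
print(in_active_window(time(15, 8), **window))   # False: 15:08 is outside the window
print(in_active_window(time(23, 30), **window))  # True: 23:30 is inside the window
```

With the settings above, a cleanup that starts at 15:08 is well outside the configured window.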
      

      The newly added collection has around 140 million documents:
      "avgObjSize" : 378.40800250149164,
      "size" : 52424250472,

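As a quick sanity check on the figures above, the document count can be estimated from the two reported stats, assuming "avgObjSize" is simply "size" divided by the number of documents. This is a rough sketch, not output from the server.

```python
# Values copied from the collection stats above.
avg_obj_size = 378.40800250149164  # bytes per document ("avgObjSize")
size = 52424250472                 # total data size in bytes ("size")

# Assumption: avgObjSize == size / count, so count ~= size / avgObjSize.
estimated_count = round(size / avg_obj_size)

print(f"{estimated_count:,} documents, {size / 1e9:.1f} GB")
```

This gives about 138.5 million documents and roughly 52 GB of data, consistent with the "around 140 million" figure.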
      What we now see is that, outside the balancer window, the home shard is still doing its cleanup rounds.

      As a result, mongotop shows a lot of reads and writes on this collection.
      We profiled the access patterns and estimate that more than 80% of the writes come from the cleanup job.

      Some dbtop output (web interface) for this collection:

      total         Reads         Writes        Queries       GetMores      Inserts       Updates       Removes
      2259 84.9%    1987 49.9%    272  34.9%    682  37.9%    5    2.7%     0    0%       40   8.3%     0    0%
      2320 84.1%    1479 47.9%    841  36.3%    530  28.9%    3    11.3%    0    0%       6    0.2%     0    0%
      

      In the logfile of the server process (primary) we find the following entry:

      Tue Aug 27 15:08:25.610 [cleanupOldData-5219670bedeed3fdea9d337b] moveChunk starting delete for: database.CollectionToshard from { targetUid: -5232965359423252304 } -> { targetUid: -5219148617130848963 }
      ....
      Tue Aug 27 15:32:58.264 [cleanupOldData-5219670bedeed3fdea9d337b] Helpers::removeRangeUnlocked time spent waiting for replication: 526999ms
      Tue Aug 27 15:32:58.264 [cleanupOldData-5219670bedeed3fdea9d337b] moveChunk deleted 92419 documents for database.CollectionToshard from { targetUid: -5232965359423252304 } -> { targetUid: -5219148617130848963 }
      

      Each cleanup deletes around 90k documents in ~24 minutes. This is very slow, and we suffer from periodic bursts of heavy write IO. During these bursts the mongod service is slow, and reads and some writes queue up (monitored via mongostat).
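The numbers behind this estimate can be worked out from the log excerpt. The sketch below only uses values that appear in the log lines (start and end timestamps, deleted document count, replication wait); the helper name is hypothetical, and it assumes both entries are from the same day.

```python
from datetime import datetime


def time_of_day(stamp: str) -> datetime:
    """Parse only the clock part of a log timestamp such as
    "Tue Aug 27 15:08:25.610" (both entries are from the same day)."""
    return datetime.strptime(stamp.split()[-1], "%H:%M:%S.%f")


started = time_of_day("Tue Aug 27 15:08:25.610")   # "moveChunk starting delete"
finished = time_of_day("Tue Aug 27 15:32:58.264")  # "moveChunk deleted ..."
deleted_docs = 92419                                # from the final log line
replication_wait_s = 526999 / 1000                  # "waiting for replication: 526999ms"

elapsed_s = (finished - started).total_seconds()
docs_per_second = deleted_docs / elapsed_s
wait_share = replication_wait_s / elapsed_s

print(f"elapsed: {elapsed_s / 60:.1f} min")
print(f"rate: {docs_per_second:.1f} docs/s")
print(f"waiting for replication: {wait_share:.0%}")
```

For this one cleanup run that works out to about 24.5 minutes, roughly 63 deleted documents per second, and a little over a third of the elapsed time spent waiting for replication. It says nothing about how much IO each delete causes.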

      Is the cleanup job really this IO-intensive?
      Why is this cleanup not done while the balancer runs?
      Is there a way to check the status of this cleanup job?
      Is there a way to throttle the cleanup job?

      Thanks in advance,
      Steffen

            Assignee: Unassigned
            Reporter: Steffen
            Votes: 0
            Watchers: 6
