- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.4.3
- Component/s: Sharding
- Environment: Ubuntu 12.04.1 LTS, 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux, MongoDB 2.4.3
- Operating System: ALL
We started to shard one more of the big collections in our database. The database has 26 collections, some of which are already sharded.
Now every night (UTC) we let the balancer run:
{ "_id" : "balancer", "activeWindow" : { "start" : "18:00", "stop" : "7:00" }, "stopped" : false }
The collection we have now added has around 140 million documents.
"avgObjSize" : 378.40800250149164,
"size" : 52424250472,
What we now see is that the home shard does its cleanup rounds outside the balancer window.
As a result we see a lot of reads and writes on this collection via mongotop.
We profiled the access patterns and think that >80% of the writes are coming from the cleanup job.
Some output from dbtop (web interface) for this collection:
total        Reads        Writes       Queries      GetMores   Inserts   Updates   Removes
2259 84.9%   1987 49.9%   272 34.9%    682 37.9%    5 2.7%     0 0%      40 8.3%   0 0%
2320 84.1%   1479 47.9%   841 36.3%    530 28.9%    3 11.3%    0 0%      6 0.2%    0 0%
In the log file of the server process (primary) we find the following entries:
Tue Aug 27 15:08:25.610 [cleanupOldData-5219670bedeed3fdea9d337b] moveChunk starting delete for: database.CollectionToshard from { targetUid: -5232965359423252304 } -> { targetUid: -5219148617130848963 }
....
Tue Aug 27 15:32:58.264 [cleanupOldData-5219670bedeed3fdea9d337b] Helpers::removeRangeUnlocked time spent waiting for replication: 526999ms
Tue Aug 27 15:32:58.264 [cleanupOldData-5219670bedeed3fdea9d337b] moveChunk deleted 92419 documents for database.CollectionToshard from { targetUid: -5232965359423252304 } -> { targetUid: -5219148617130848963 }
Every cleanup pass deletes around 90k documents in ~24 minutes (per the log above, 92419 documents in roughly 24.5 minutes, i.e. about 60 documents per second, with ~527 seconds of that time spent waiting for replication). This is very slow, and we suffer from periodic bursts of high write IO. During these bursts the mongod service is slow, and reads and some writes queue up (monitored via mongostat).
Is this cleanup job expected to be so aggressive on IO?
Why is this cleanup not done while the balancer runs?
Is there a way to check the status of this cleanup job? (see the sketch below for what we have in mind)
Is there a way to limit the performance impact of the cleanup job?
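For the third question, this is roughly what we have been trying, assuming the cleanupOldData threads are reported by db.currentOp(true) with their thread name in the desc field (we are not certain this holds on 2.4.3):

    // Sketch: scan currentOp (including idle/system operations) for
    // background cleanup threads. Assumes their thread name starts with
    // "cleanupOldData", as seen in the log lines above.
    db.currentOp(true).inprog.forEach(function (op) {
        if (op.desc && op.desc.indexOf("cleanupOldData") === 0) {
            printjson(op);
        }
    });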
Thanks in advance,
Steffen