-
Type: Bug
-
Resolution: Cannot Reproduce
-
Priority: Major - P3
-
None
-
Affects Version/s: 1.8.1
-
Component/s: Index Maintenance, Replication
-
Environment:Ubuntu server 10.04.2 LTS
12 GB ram
2 x raid 1 volumes, data on one volume, journal on another.
2 x 6 core 2.9 GHz cpu with hyperthreading (24 logical cpus)
-
Linux
When restoring a large collection on a mongodb server with a replicated set configuration, index creation becomes incredibly slow. In our case, the collection index creation could take anywhere from between 20 minutes to over half an hour
The exact same collection restored to a test mongodb server running inside a virtual machine with much smaller specs, but with no replication sets, the same indexes are created in 5 to 6 seconds.
Examing the log files, we can see the following:
// log file from big server, replication sets, secondary server on
Wed Jun 1 03:37:31 [conn16] building new index onUnknown macro: { apps.aid}for cassia.snapshots_2011053100_01
239000/326021 73%
259900/326021 79%
268900/326021 82%
277600/326021 85%
287400/326021 88%
289600/326021 88%
300000/326021 92%
310000/326021 95%
311900/326021 95%
314200/326021 96%
316500/326021 97%
318400/326021 97%
320500/326021 98%
322600/326021 98%
324900/326021 99%
Wed Jun 1 03:40:28 [conn16] done building bottom layer, going to commit
Wed Jun 1 03:40:37 [conn16] done for 1824 records 185.837secs
Wed Jun 1 03:40:37 [conn16] insert cassia.system.indexes 185837ms
Wed Jun 1 03:40:37 [dur] lsn set 79905899
Wed Jun 1 03:40:37 [conn16] building new index onUnknown macro: { cnt}for cassia.snapshots_2011053100_02
Wed Jun 1 03:40:39 [conn16] done for 1823 records 2.376secs
Wed Jun 1 03:40:39 [conn16] insert cassia.system.indexes 2376ms
Wed Jun 1 03:40:39 [conn16] building new index onUnknown macro: { apps.aid}for cassia.snapshots_2011053100_02
100/1823 5%
200/1823 10%
300/1823 16%
1000/1823 54%
1100/1823 60%
1200/1823 65%
1300/1823 71%
1800/1823 98%
Wed Jun 1 03:54:09 [conn16] external sort used : 170 files in 809 secs
100/325822 0%
126600/325822 38%
128200/325822 39%
129500/325822 39%
130200/325822 39%
131300/325822 40%
131900/325822 40%
132900/325822 40%
133600/325822 41%
134100/325822 41%
134900/325822 41%
135900/325822 41%
136800/325822 41%
137900/325822 42%
138800/325822 42%
139900/325822 42%
140700/325822 43%
141400/325822 43%
141900/325822 43%
142900/325822 43%
143800/325822 44%
144900/325822 44%
145800/325822 44%
146700/325822 45%
148300/325822 45%
149800/325822 45%
150200/325822 46%
305200/325822 93%
Wed Jun 1 03:59:40 [conn16] done building bottom layer, going to commit
Wed Jun 1 03:59:40 [conn16] done for 1823 records 1140.97secs
Wed Jun 1 03:59:40 [conn16] insert cassia.system.indexes 1140969ms
Wed Jun 1 03:59:40 [conn113] getmore cassia.system.namespaces cid:6678415019539336828 getMore: {} bytes:77862 nreturned:1396 1355345ms
Wed Jun 1 03:59:40 [conn99] query cassia.snapshots_2011060101_02 ntoreturn:1 reslen:44018 nscanned:1Unknown macro: { cnt}nreturned:1 1143466ms
Wed Jun 1 03:59:40 [dur] lsn set 81033909
The log files from the small server, single mongo, no replicated sets shows the following:
Wed Jun 1 03:53:35 [conn2] building new index on
Unknown macro: { apps.aid}for cassia.snapshots_2011053100_02
Wed Jun 1 03:53:41 [conn2] done for 1823 records 5.612secs
As a test, I shutdown the secondary server while the restore was happening and the index creation time on the primary server when down to a much more acceptable 5 to 6 seconds.
More datapoints:
- using the db.currentOp() command shows the index creation as "external sort used" on the primary server with the secondary server active
- that also appears as "external sort used : 170 files in 809 secs" in the log file
- after shutting down the secondary server, the "external sort used" goes away
- "external sorts" never appear in the log files on the same data set in the test database.
- the slow performance occurs whether I use the mongorestore command or the db.copyDatabase("old collection", "new collection", "old server") command