-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: mongoimport
-
None
-
2
-
Needed
-
Options documentation would need to be updated
-
(copied to CRM)
Revised
As we convert the tools to the new Go driver, the original PR will not apply. Instead, we'll implement higher-performance bulk insert/update built on the new Go driver bulk API, including the higher batch size limit.
The request to add a "Remove" mode has been pulled out to TOOLS-2268 for separate triage.
Original
The changes below were implemented after consulting with our Mongo rep, Anant Srivastava, to meet internal implementation needs. I will open a pull request shortly with our changes for review, in case some or all of them are worth rolling into the product.
Bulk upserts
Enable bulk upsert operations. In the current release of mongoimport, running in upsert mode limits the import to a single insertion worker and an effective batch size of 1. At that speed mongoimport was not viable for our volumes. With bulk, multi-worker upserts, we are seeing a 400-700x performance boost, which makes mongoimport a viable tool for our update process.
--bulkUpdate command line option added. When enabled, upserts can be executed in bulk and by multiple workers. The option was added to limit the impact on existing processes that use mongoimport. There is some debate on whether this flag is necessary, or whether 'bulkUpdate' mode should be on by default and turned off via the --maintainInsertionOrder option.
The 'bulkUpdate' upsert mode was implemented by disabling maintainInsertionOrder, removing the restriction to 1 insertion worker, and adding a new method to BufferedBulkInserter to support bulk upsert operations.
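For illustration only, the batching and multi-worker fan-out described above can be sketched roughly as follows. This is not the patched mongoimport code: Doc, FlushFunc, and BulkUpsert are hypothetical names, and FlushFunc stands in for whatever unordered bulk upsert call the driver provides.

```go
package main

import "sync"

// Doc is a hypothetical stand-in for one parsed input record.
type Doc map[string]interface{}

// FlushFunc is a hypothetical stand-in for a single unordered bulk
// upsert call against the server.
type FlushFunc func(batch []Doc) error

// BulkUpsert splits docs into batches of at most batchSize and hands
// them to numWorkers goroutines. Batches may complete in any order, so
// insertion order is not maintained. The first error seen is returned;
// remaining batches are still attempted.
func BulkUpsert(docs []Doc, batchSize, numWorkers int, flush FlushFunc) error {
	if batchSize < 1 {
		batchSize = 1
	}
	if numWorkers < 1 {
		numWorkers = 1
	}

	batches := make(chan []Doc)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)

	// Start the workers; each drains the channel until it is closed.
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range batches {
				if err := flush(b); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
				}
			}
		}()
	}

	// Produce the batches.
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		batches <- docs[start:end]
	}
	close(batches)
	wg.Wait()
	return firstErr
}
```

With 2500 records, a batch size of 1000, and 4 workers, the sketch issues three flush calls (1000, 1000, and 500 records) that can run concurrently.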
Remove mode
--mode remove option added. It constructs BSON selectors from the records in the input file and the --upsertFields, and removes matching documents. Each selector removes only a single matching document. Implemented by adding a new method to BufferedBulkInserter to support bulk remove operations.
--upsertFields is required when specifying this option.
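As a rough sketch of the selector construction described above, assuming records are already parsed into maps: Doc and BuildSelector are hypothetical names, not the functions in the patch. Treating a missing field as an error is an assumption of this sketch, and dotted (nested) field names are not handled.

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for one parsed input record.
type Doc map[string]interface{}

// BuildSelector copies the fields named in upsertFields from record
// into a new selector document. The caller would then issue a remove
// limited to a single matching document per selector.
//
// Assumption of this sketch: an empty field list or a record that lacks
// one of the fields is reported as an error instead of being skipped.
func BuildSelector(record Doc, upsertFields []string) (Doc, error) {
	if len(upsertFields) == 0 {
		return nil, fmt.Errorf("remove mode requires at least one upsert field")
	}
	sel := make(Doc, len(upsertFields))
	for _, f := range upsertFields {
		v, ok := record[f]
		if !ok {
			return nil, fmt.Errorf("record is missing upsert field %q", f)
		}
		sel[f] = v
	}
	return sel, nil
}
```

For a record with fields _id, name, and qty and upsert fields _id and name, the selector contains only _id and name, so qty plays no part in matching.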
batchSize limit increased from 1k to 100k
Following the MongoDB 3.6 batch size limit changes, the maximum for the --batchSize option was raised to 100k documents. mongoimport and the mongo driver code (gopkg.in/mgo.v2) were patched to support this. Specifying a batch size larger than 1000 when targeting MongoDB <3.6 results in operations being batched driver-side in chunks of 1000. The driver was also patched to split write operations >16MB into separate writeOpCommand calls for the *insertOp, bulkUpdateOp, and bulkDeleteOp operation types.
https://docs.mongodb.com/manual/reference/limits/#Write-Command-Batch-Limit-Size
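The two splitting rules above (a document-count cap and a 16MB size cap) could be combined along these lines. This is a simplified sketch, not the patched mgo code: SplitBatch and the constant names are hypothetical, documents are represented as already-encoded byte slices, and the per-command overhead of a real write command is ignored.

```go
package main

// Limits taken from the description above; the names are hypothetical.
const (
	LegacyMaxDocs = 1000             // per-batch cap for MongoDB <3.6
	NewMaxDocs    = 100000           // per-batch cap for MongoDB 3.6+
	MaxBatchBytes = 16 * 1024 * 1024 // size at which an operation is split
)

// SplitBatch groups encoded documents into chunks so that each chunk
// holds at most maxDocs documents and at most maxBytes bytes in total.
// A single document larger than maxBytes is placed in a chunk of its
// own; rejecting it is left to the server in this sketch.
func SplitBatch(docs [][]byte, maxDocs, maxBytes int) [][][]byte {
	var (
		chunks   [][][]byte
		cur      [][]byte
		curBytes int
	)
	for _, d := range docs {
		// Close the current chunk if adding d would exceed either limit.
		if len(cur) > 0 && (len(cur) >= maxDocs || curBytes+len(d) > maxBytes) {
			chunks = append(chunks, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, d)
		curBytes += len(d)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}
```

Calling SplitBatch(docs, LegacyMaxDocs, MaxBatchBytes) with 2500 small documents yields chunks of 1000, 1000, and 500, which matches the driver-side chunking described for servers older than 3.6.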
- depends on
-
GODRIVER-975 bulk write doesn't report write concern error
- Closed
- is duplicated by
-
TOOLS-1963 mongoimport --mode=upsert doesn't do bulk upsert
- Closed
-
TOOLS-1677 Mongoimport --mode merge dramatically slower than --mode insert
- Closed
- is related to
-
TOOLS-2465 mongoimport mode upsert failed process bulk write error
- Closed
-
TOOLS-2268 Add remove mode to mongoimport
- Closed
- related to
-
TOOLS-2875 Limit the BufferedBulkInserter's batch size by bytes
- Closed