Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.0-rc6
Affects Version/s: 3.2.0-rc4
Component/s: Text Search
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:
See the description below
Sprint:
QuInt D (12/14/15)

MongoDB 3.2.0 RC4 appears to have a substantial performance regression in full-text search.

Test Data
3000 books obtained from Project Gutenberg (http://web.eecs.umich.edu/~lahiri/gutenberg_dataset.html) stored in MongoDB as follows:

    {
      author : "Abraham Lincoln",
      title : "Letters",
      body : "<full text of book>"
    }

This data was then indexed using an "all fields" index:

db.books.createIndex( { "$**" : "text" } );

This produces a test dataset of around 1.1GB with a text index of 155MB (measured with WiredTiger, abbreviated WT below).
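The sizes quoted above can be read from the collection statistics. The following is a minimal sketch for the mongo shell, assuming the collection is named `books` as in the examples; `toMB` and `summarizeStats` are hypothetical helper names, and the figures will vary with storage engine and compression. Note that `totalIndexSize` covers every index on the collection, including the `_id` index.

```javascript
// Convert a byte count to megabytes, rounded to one decimal place.
function toMB(bytes) {
    return Math.round(bytes / (1024 * 1024) * 10) / 10;
}

// Pick the size fields of interest out of a collection stats document.
// `size` is the uncompressed data size; `totalIndexSize` covers all indexes.
function summarizeStats(stats) {
    return {
        documents: stats.count,
        dataMB: toMB(stats.size),
        indexMB: toMB(stats.totalIndexSize)
    };
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
    printjson(summarizeStats(db.books.stats()));
}
```

The per-index breakdown is available in the `indexSizes` field of the same stats document if only the text index size is wanted.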

Test process
This data was loaded into different versions of MongoDB, and various simple searches were run using words and phrases of different occurrence frequencies in the dataset. Each search used the following simple query shape in an aggregation pipeline, with the ultimate goal of reporting the number of books per author containing the search term:

db.books.aggregate([
    {
        $match : {
            $text : {
                $search : "house"
            }
        }
    },
    {
        $group : {
            _id : {
                author : "$author"
            },
            count : {
                $sum : 1
            }
        }
    },
    {
        $sort : {
            count : -1
        }
    }
]);

The search terms used are as follows:

slaveholder
hound
"gigantic hound"
cheese
house

A simple test script ("testQuery_all.js") is attached to automate this process.
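The attached script is not reproduced on this page. As a rough illustration of the process described above (and not the contents of testQuery_all.js), a timing loop over the search terms might look like the following sketch. The names `terms`, `buildPipeline` and `timeMs` are hypothetical; the phrase is wrapped in escaped double quotes so that `$text` treats it as an exact phrase.

```javascript
// Search terms from the list above; the phrase keeps its double quotes.
var terms = ["slaveholder", "hound", "\"gigantic hound\"", "cheese", "house"];

// Build the aggregation pipeline used in this report for one search term.
function buildPipeline(term) {
    return [
        { $match : { $text : { $search : term } } },
        { $group : { _id : { author : "$author" }, count : { $sum : 1 } } },
        { $sort : { count : -1 } }
    ];
}

// Run `fn` once and return the elapsed wall-clock time in milliseconds.
function timeMs(fn) {
    var start = Date.now();
    fn();
    return Date.now() - start;
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
    var total = 0;
    terms.forEach(function (term) {
        var ms = timeMs(function () {
            // toArray() exhausts the cursor so the full result is fetched.
            db.books.aggregate(buildPipeline(term)).toArray();
        });
        total += ms;
        print(term + ": " + ms + " ms");
    });
    print("Total query duration: " + total + " ms");
}
```

Running the loop several times and keeping a later run reduces the effect of a cold cache on the timings.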

Test results
All of these results were taken from the third run of each test, so that the data was as warm as possible. In the case of the 3.2 results, mongod kept one core fully busy for the entire query duration.

Version	Engine	Total Query Duration (ms)
2.6.11	MMAPv1	5308
3.0.7	MMAPv1	5306
3.0.7	WT Snappy	6625
3.2.0 RC4	MMAPv1	26157
3.2.0 RC4	WT Snappy	639862
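To put the regression in relative terms, the durations in the table above can be compared per storage engine. This is a small sketch using only the figures reported here; `results`, `durationFor` and `slowdown` are hypothetical names introduced for illustration.

```javascript
// Total query durations in milliseconds, copied from the table above.
var results = [
    { version: "2.6.11",    engine: "MMAPv1",    ms: 5308 },
    { version: "3.0.7",     engine: "MMAPv1",    ms: 5306 },
    { version: "3.0.7",     engine: "WT Snappy", ms: 6625 },
    { version: "3.2.0 RC4", engine: "MMAPv1",    ms: 26157 },
    { version: "3.2.0 RC4", engine: "WT Snappy", ms: 639862 }
];

// Look up the duration recorded for one version/engine combination.
function durationFor(version, engine) {
    var match = results.filter(function (r) {
        return r.version === version && r.engine === engine;
    })[0];
    return match.ms;
}

// Ratio of the newer version's duration to the older one, same engine.
function slowdown(engine, fromVersion, toVersion) {
    return durationFor(toVersion, engine) / durationFor(fromVersion, engine);
}

// Use print() in the mongo shell and console.log elsewhere.
var log = (typeof print === "function") ? print : console.log;

log("MMAPv1:    " + slowdown("MMAPv1", "3.0.7", "3.2.0 RC4").toFixed(1) + "x");    // 4.9x
log("WT Snappy: " + slowdown("WT Snappy", "3.0.7", "3.2.0 RC4").toFixed(1) + "x"); // 96.6x
```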

Full results are available here:
https://goo.gl/s4pU9j

Source data is here:
https://dl.dropboxusercontent.com/u/6076108/books.bson.gz
Note: the text index is not included in this dump and must be created manually:

db.books.createIndex( { "$**" : "text" } );


Attachments:
testQuery_all.js (1.0 kB, Nov 30 2015 10:09:07 AM UTC)
createIndex.js (0.0 kB, Nov 30 2015 10:09:07 AM UTC)

Issue Links:
is related to SERVER-19936 Performance pass on unicode-aware text processing logic (text index v3) (Closed)

Assignee: J Rassi (Inactive)
Reporter: Stuart Hall
Participants: Githook User, J Rassi, Martin Bligh, Ramon Fernandez Marina, Stuart Hall
Votes: 0
Watchers: 12
Created: Nov 30 2015 10:09:06 AM UTC
Updated: Dec 03 2015 10:29:31 PM UTC
Resolved: Dec 01 2015 10:41:46 PM UTC

