Core Server / SERVER-21690

Text Search - Performance Regression in 3.2.0 RC4

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.2.0-rc6
    • Affects Version/s: 3.2.0-rc4
    • Component/s: Text Search
    • None
    • Fully Compatible
    • ALL
    • QuInt D (12/14/15)

      MongoDB 3.2.0 RC4 appears to have a substantial performance regression in full-text search.

      Test Data
      3000 books obtained from Project Gutenberg (http://web.eecs.umich.edu/~lahiri/gutenberg_dataset.html) stored in MongoDB as follows:

          {
            author : "Abraham Lincoln",
            title : "Letters",
            body : "<full text of book>"
          }
      

      This data was then indexed using an "all fields" index:

      db.books.createIndex( { "$**" : "text" } );
      

      This produces a test dataset of around 1.1 GB with a text index of 155 MB (measured with WiredTiger).
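
Sizes like the ones quoted above can be read from the collection statistics. The following is a minimal sketch, assuming the collection is named `books` as in this report and that the text index kept its default name `$**_text`. `summarizeSizes` is a hypothetical helper, not part of the attached scripts, and the report does not say whether the 1.1 GB figure is the uncompressed `size` or the on-disk `storageSize`.

```javascript
// Hypothetical helper: convert the byte counts returned by
// db.books.stats() into whole megabytes.
// `stats.size` is the uncompressed data size; the report does not state
// which statistic was used, so this choice is an assumption.
function summarizeSizes(stats, indexName) {
    var MB = 1024 * 1024;
    // indexSizes maps index name -> size in bytes; fall back to 0 if absent.
    var indexBytes = (stats.indexSizes || {})[indexName] || 0;
    return {
        dataMB: Math.round(stats.size / MB),
        textIndexMB: Math.round(indexBytes / MB)
    };
}

// In the mongo shell (assumes the default index name "$**_text"):
//   printjson(summarizeSizes(db.books.stats(), "$**_text"));
```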

      Test process
      This data was loaded into different versions of MongoDB, and several simple searches were run using words and phrases that occur with different frequencies in the dataset. Each search used the following simple query shape in an aggregation pipeline, with the ultimate goal of reporting the number of books per author that contain the search term:

      db.books.aggregate([
      { 
      	$match : { 
      		$text : { 
      			$search : "house" 
      		} 
      	} 
      },
      { 
      	$group : { 
      		_id : { 
      			author : "$author" 
      		}, 
      		count : { 
      			$sum : 1 
      		} 
      	} 
      }, 
      { 
      	$sort : { 
      		count : -1 
      	} 
      } ]);
      

      The words used are as follows:

      • slaveholder
      • hound
      • "gigantic hound"
      • cheese
      • house

      A simple test script ("testQuery_all.js") is attached to automate this process.
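
The attached script is not reproduced here. As a rough illustration only, a timing harness for the process described above might look like the sketch below. The names `SEARCH_TERMS`, `buildPipeline`, and `timeSearches` are hypothetical, and the sketch assumes that the reported figure is the total time for all five terms and that the quoted entry is meant as a phrase search.

```javascript
// Hypothetical timing harness; the attached testQuery_all.js may differ.
// The quoted entry is passed with escaped quotes so that $text treats it
// as a phrase (an assumption about how the reporter ran it).
var SEARCH_TERMS = ["slaveholder", "hound", "\"gigantic hound\"", "cheese", "house"];

// Build the match/group/sort pipeline from the report for one search term.
function buildPipeline(term) {
    return [
        { $match: { $text: { $search: term } } },
        { $group: { _id: { author: "$author" }, count: { $sum: 1 } } },
        { $sort: { count: -1 } }
    ];
}

// Run every term `runs` times and return the total duration (ms) of the
// final pass, so that earlier passes serve as warm-up.
function timeSearches(coll, terms, runs) {
    var totalMs = 0;
    for (var run = 1; run <= runs; run++) {
        totalMs = 0; // keep only the most recent pass
        terms.forEach(function (term) {
            var start = Date.now();
            coll.aggregate(buildPipeline(term)).toArray(); // exhaust the cursor
            totalMs += Date.now() - start;
        });
    }
    return totalMs;
}

// In the mongo shell:
//   print(timeSearches(db.books, SEARCH_TERMS, 3) + " ms");
```

Passing the collection in as a parameter keeps the harness independent of the shell's global `db`, so the same function can be pointed at each server version in turn.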

      Test results
      All of these results were taken on the third run, to ensure that the data was as warm as possible. In the case of the 3.2 results, mongod kept one CPU core fully utilized for the entire query duration.

      Version      Engine       Total Query Duration (ms)
      2.6.11       MMAPv1       5308
      3.0.7        MMAPv1       5306
      3.0.7        WT Snappy    6625
      3.2.0 RC4    MMAPv1       26157
      3.2.0 RC4    WT Snappy    639862
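
To put the regression in relative terms, the slowdown of 3.2.0 RC4 against 3.0.7 on the same storage engine can be computed directly from the durations above. This is a small arithmetic sketch; the variable and function names are illustrative.

```javascript
// Durations in ms, copied from the results above.
var durations = {
    "3.0.7 MMAPv1": 5306,
    "3.0.7 WT Snappy": 6625,
    "3.2.0 RC4 MMAPv1": 26157,
    "3.2.0 RC4 WT Snappy": 639862
};

// Ratio of the candidate duration to the baseline duration.
function slowdown(baselineMs, candidateMs) {
    return candidateMs / baselineMs;
}

var mmapFactor = slowdown(durations["3.0.7 MMAPv1"], durations["3.2.0 RC4 MMAPv1"]);
var wtFactor = slowdown(durations["3.0.7 WT Snappy"], durations["3.2.0 RC4 WT Snappy"]);

// mmapFactor.toFixed(1) → "4.9"
// wtFactor.toFixed(1)   → "96.6"
```

On these numbers, 3.2.0 RC4 is roughly 5 times slower than 3.0.7 on MMAPv1 and roughly 97 times slower on WiredTiger with Snappy.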

      Full results are available here:
      https://goo.gl/s4pU9j

      Source data is here:
      https://dl.dropboxusercontent.com/u/6076108/books.bson.gz
      Note: the text index must be created manually on this data:

      db.books.createIndex( { "$**" : "text" } );
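
Because the index has to be created by hand, it can help to confirm that it exists before timing anything; a `$text` query fails outright without a text index. The sketch below relies on text indexes being listed by `getIndexes()` with a key containing `_fts: "text"`. `hasTextIndex` is a hypothetical helper, not part of the attached scripts.

```javascript
// Hypothetical helper: true if any index specification in the array
// describes a text index (its key document contains _fts: "text").
function hasTextIndex(indexSpecs) {
    return indexSpecs.some(function (spec) {
        return Boolean(spec.key) && spec.key._fts === "text";
    });
}

// In the mongo shell:
//   if (!hasTextIndex(db.books.getIndexes())) {
//       db.books.createIndex({ "$**": "text" });
//   }
```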
      

        1. testQuery_all.js
          1.0 kB
        2. createIndex.js
          0.0 kB

            Assignee: J Rassi (rassi)
            Reporter: Stuart Hall (stuart.hall@masternaut.com)
            Votes: 0
            Watchers: 12
