Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Text Search
Labels:
- qi-text-search
- query-44-grooming

Assigned Teams: Query Integration
Operating System: ALL
Steps To Reproduce:

sputnik-rs:PRIMARY> db.ftstest.insert( { data1 : "abcd", data2 : "efgh", data3 : "ijkl" } )
WriteResult({ "nInserted" : 1 })
sputnik-rs:PRIMARY> db.ftstest.createIndex( { data1 : "text", data2 : "text", data3 : "text" } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "efgh" "ijkl"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abc" "efg" "ijk"' } } )
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "efg" "ijk"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"bcd" "fgh" "jkl"' } } )
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "fgh" "jkl"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY>

The following behavior with FTS seems inconsistent:

sputnik-rs:PRIMARY> db.ftstest.insert( { data1 : "abcd", data2 : "efgh", data3 : "ijkl" } )
WriteResult({ "nInserted" : 1 })
sputnik-rs:PRIMARY> db.ftstest.createIndex( { data1 : "text", data2 : "text", data3 : "text" } )
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "efgh" "ijkl"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abc" "efg" "ijk"' } } )
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "efg" "ijk"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"bcd" "fgh" "jkl"' } } )
sputnik-rs:PRIMARY> db.ftstest.find( { $text : { $search : '"abcd" "fgh" "jkl"' } } )
{ "_id" : ObjectId("54fff32bb43016d00d95734a"), "data1" : "abcd", "data2" : "efgh", "data3" : "ijkl" }
sputnik-rs:PRIMARY>

What happens above:

In the first query, all search words match and the document is returned. This is expected.
In the second query, the last letter of each word is removed. The search words no longer match the full words, so nothing is returned. This is expected.
In the third query, the first word is again the full 4-letter word, but the other two are only parts of the words in the document. Since this is an AND search, it should return an empty set. Instead it returns the document, because the second and third search words match part of the words in the document.
The fourth and fifth queries demonstrate the same thing when the first letter is removed from all of the words or from only some of them, respectively.

It seems to me that when scanning the index, MongoDB matches full words (after stemming). This is expected. However, the documents found by the index scan then go through a filtering step, and that step appears to match parts of words.
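To make the hypothesis concrete, here is a toy model in plain JavaScript that reproduces all five results from the session. It is only a sketch of the suspected behavior, not MongoDB's code: the function names (tokenize, indexScan, phraseFilter, textSearch) are made up for illustration, stemming is omitted, and it assumes that the index scan accepts a document when any one search term matches a whole indexed word.

```javascript
// Toy model of the suspected behavior; NOT MongoDB's actual implementation.
// All function names below are hypothetical.
const doc = { data1: "abcd", data2: "efgh", data3: "ijkl" };

// Split a string into lowercase whole-word tokens (stemming is omitted).
function tokenize(s) {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Phase 1 (assumed): the index scan matches whole words only,
// and a single matching term is enough to select the document.
function indexScan(d, terms) {
  const indexed = new Set(Object.values(d).flatMap(tokenize));
  return terms.some((t) => indexed.has(t));
}

// Phase 2 (assumed): every quoted phrase must occur in some field,
// but the check is a plain substring test, so parts of words match.
function phraseFilter(d, phrases) {
  const fields = Object.values(d).map((v) => v.toLowerCase());
  return phrases.every((p) => fields.some((f) => f.includes(p)));
}

function textSearch(d, search) {
  const phrases = [...search.matchAll(/"([^"]*)"/g)].map((m) => m[1].toLowerCase());
  return indexScan(d, tokenize(search)) && phraseFilter(d, phrases);
}

const queries = [
  '"abcd" "efgh" "ijkl"',
  '"abc" "efg" "ijk"',
  '"abcd" "efg" "ijk"',
  '"bcd" "fgh" "jkl"',
  '"abcd" "fgh" "jkl"',
];
for (const q of queries) {
  console.log(q, "=>", textSearch(doc, q) ? "document returned" : "nothing returned");
}
```

In this model the third and fifth queries return the document only because "abcd" gets it past the index scan, after which the substring filter accepts "efg", "ijk", "fgh" and "jkl".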

I have not looked at the code, but this looks like a common error when using regular expression libraries. To match a full word, the pattern ^abcd$ should be used, but a developer may easily forget that and just search for abcd, which matches any string that includes abcd as a part of itself.
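The difference between the two patterns can be shown with plain JavaScript regular expressions. This is an illustration of the general pitfall only, and makes no claim about how the server's filtering step is written. The helper names are hypothetical, and the word-boundary variant (\b) is an addition of this sketch for fields that contain more than one word.

```javascript
// Illustration of anchored vs. unanchored matching; helper names are hypothetical.

// Escape regex metacharacters so the search word is treated literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Unanchored: true if `word` occurs anywhere in `text`, even inside a longer word.
function matchesAnywhere(text, word) {
  return new RegExp(escapeRegExp(word)).test(text);
}

// Anchored with ^ and $: true only if the whole string equals `word`.
function matchesWholeString(text, word) {
  return new RegExp("^" + escapeRegExp(word) + "$").test(text);
}

// Word boundaries: true only if `word` occurs as a complete word within `text`.
function matchesWholeWord(text, word) {
  return new RegExp("\\b" + escapeRegExp(word) + "\\b").test(text);
}

console.log(matchesAnywhere("efgh", "efg"));          // true: partial match slips through
console.log(matchesWholeString("efgh", "efg"));       // false
console.log(matchesWholeString("efgh", "efgh"));      // true
console.log(matchesWholeWord("abcd efgh", "efg"));    // false
console.log(matchesWholeWord("abcd efgh", "efgh"));   // true
```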

Assignee: [DO NOT USE] Backlog - Query Integration

Reporter: Henrik Ingo (Inactive)

Participants: [DO NOT USE] Backlog - Query Integration, David Storch, Henrik Ingo

Votes: 1

Watchers: 8

Created: Mar 11 2015 08:04:23 AM UTC

Updated: Dec 28 2023 06:34:20 PM UTC
