- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Text Search
- Query Integration
- (copied to CRM)
Add Korean to languages supported in MongoDB FTS.
Original description:
First of all, MongoDB supports stemming for major languages such as English.
However, there is no stemming for CJK languages (I am focusing on Korean in particular), so MongoDB text search is useless for Korean unless the application code stems Korean itself.
I am not sure whether you are interested in Korean.
Anyway, Korean attaches only suffixes (postpositional particles) after the stem (base word), for example:
Stem: 한글. With suffixes: 한글은, 한글이, 한글을, 한글과, 한글도, 한글처럼, ...
In the current implementation, however, MongoDB text search matches the search term exactly, so a Korean word is not matched because of its suffix ("은", "는", "이", "가", "처럼", ...).
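To illustrate the mismatch, here is a minimal Python sketch. It assumes a simplified model in which indexed terms are stored unstemmed and compared for exact equality; it is not MongoDB's actual tokenizer or matcher, and the variable names are made up for illustration.

```python
# Simplified model (assumption): indexed terms are kept as-is, with no Korean stemming.
indexed_terms = ["한글은", "한글이", "한글을", "한글과", "한글도", "한글처럼"]
search_term = "한글"

# Exact matching: the bare stem never equals a suffixed form.
exact_matches = [t for t in indexed_terms if t == search_term]

# Every indexed form nevertheless begins with the stem.
prefix_matches = [t for t in indexed_terms if t.startswith(search_term)]

print(exact_matches)        # no suffixed form equals the stem
print(len(prefix_matches))  # all six forms share the stem as a prefix
```

Under this model, an exact comparison finds nothing, while a prefix comparison finds every suffixed form of the stem.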
So if MongoDB supported range search for text search, as in the example below, we (Koreans) could use text search for Korean.
Text: "한글은 뛰어난 언어입니다."
Search term: "한글"
Range search in text search: "한글" <= range < "한긁" (where "한긁" is generated by simply incrementing the last character of the search term, [like this|https://github.com/mongodb/mongo/pull/1151/commits/641c3041282746aff280b685424d55926bab93b2#diff-bc6db30f2a5f9618496534d03aeabf54R108])
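The range idea above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code from the pull request: the index is modeled as a sorted list of whitespace-separated terms, and `prefix_upper_bound` and `range_search` are hypothetical helper names. Python compares strings by code point, which is what makes the incremented last character a valid exclusive upper bound here.

```python
import bisect

def prefix_upper_bound(term: str) -> str:
    """Return the exclusive upper bound: the term with its last code point incremented.

    Simplification: does not handle a last character that is already the
    maximum code point (U+10FFFF).
    """
    if not term:
        raise ValueError("term must be non-empty")
    return term[:-1] + chr(ord(term[-1]) + 1)

def range_search(sorted_terms: list[str], term: str) -> list[str]:
    """Return all indexed terms t with term <= t < prefix_upper_bound(term)."""
    lo = bisect.bisect_left(sorted_terms, term)
    hi = bisect.bisect_left(sorted_terms, prefix_upper_bound(term))
    return sorted_terms[lo:hi]

text = "한글은 뛰어난 언어입니다."
# Assumption: terms are produced by splitting on whitespace and dropping the final period.
index_terms = sorted(set(text.rstrip(".").split()))

upper = prefix_upper_bound("한글")
matches = range_search(index_terms, "한글")
print(upper)    # 한긁
print(matches)  # ['한글은']
```

Because the lookup is a pair of binary searches over sorted terms, it maps naturally onto an ordered index scan between the two bounds.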
Of course, this feature is not needed for languages that have stemming.
So I would like you to add a knob to enable or disable this range search for text search (defaulting to false). Then we can use text search with this knob set to true for Korean.
I pushed a pull request for this simple idea to the MongoDB GitHub repository.
This feature will help a lot of Korean users. Please seriously consider adding it.
(I am not sure whether this feature is useful for Japanese or Chinese, which do not put spaces between words.)
Thanks.
- related to: SERVER-45859 Text Indexes with partial word match or CJK match (Closed)