Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Text Search
Labels: qi-text-search
Assigned Teams: Query Integration
Case:
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name: None
Goal Link: None

Add Korean to languages supported in MongoDB FTS.

Original description:
First of all, MongoDB supports stemming for major languages such as English.
However, there is no stemming for CJK languages (I am focusing on Korean in particular), so MongoDB text search is useless for Korean unless the application stems Korean words itself.

I am not sure whether you are interested in Korean, but Korean only attaches suffixes (postpositional particles) after the stem (base word), for example:

Stem : 한글
With suffix : 한글은, 한글이, 한글을, 한글과, 한글도, 한글처럼, ...

But in the current implementation, MongoDB text search requires an exact match with the search term, so Korean words do not match because of their suffixes ("은", "는", "이", "가", "처럼", ...).
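
To make the mismatch concrete, here is a minimal Python sketch of exact token matching and of the application-side stemming workaround mentioned above. It is not MongoDB code. The helper name `strip_particle` is hypothetical, and the particle list is only the handful of examples from this ticket; naive stripping like this can also mis-stem words that merely happen to end in one of these syllables.

```python
# Particles taken from the examples in this ticket (an assumption for
# illustration; a real list of Korean particles is much longer).
PARTICLES = ("처럼", "은", "는", "이", "가", "을", "과", "도")


def strip_particle(token: str) -> str:
    """Remove one trailing particle, trying longer particles first."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        # Keep the token unchanged if it consists of the particle alone.
        if token.endswith(particle) and len(token) > len(particle):
            return token[: -len(particle)]
    return token


tokens = "한글은 뛰어난 언어입니다.".split()

# Exact token matching misses the document: "한글은" != "한글".
exact_hit = "한글" in tokens

# After stripping particles in application code, the stem matches.
stemmed_hit = "한글" in [strip_particle(t) for t in tokens]
```
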

So if MongoDB supported a range search within text search, as in the example below, we (Korean users) could use text search for Korean.

Text : "한글은 뛰어난 언어입니다."
Search term : "한글"
Range search in text search : "한글" <= range < "한긁"
  (where "한긁" is generated by simply incrementing the last character of the search term, [like this|https://github.com/mongodb/mongo/pull/1151/commits/641c3041282746aff280b685424d55926bab93b2#diff-bc6db30f2a5f9618496534d03aeabf54R108])
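
The range idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not the code from the pull request: the function names `prefix_range`, `in_range`, and `text_matches` are hypothetical, tokens are split on whitespace only, and strings are compared by Unicode code point (which is how Python orders `str` values).

```python
def prefix_range(term: str) -> tuple[str, str]:
    """Return (lower, upper) so that lower <= s < upper holds exactly
    for the strings s that start with term."""
    if not term:
        raise ValueError("search term must not be empty")
    last = ord(term[-1])
    if last >= 0x10FFFF:
        raise ValueError("last character cannot be incremented")
    # Increment the last character: "한글" becomes "한긁".
    return term, term[:-1] + chr(last + 1)


def in_range(token: str, term: str) -> bool:
    """Check whether a single token falls inside the term's range."""
    lower, upper = prefix_range(term)
    return lower <= token < upper


def text_matches(text: str, term: str) -> bool:
    """Check whether any whitespace-separated token falls in the range."""
    return any(in_range(token, term) for token in text.split())


found = text_matches("한글은 뛰어난 언어입니다.", "한글")
```

Because the upper bound differs from the search term only in its last character, the range is equivalent to a prefix match on the token, which is why suffixed forms such as "한글은" or "한글처럼" fall inside it while an unrelated word such as "한국" does not.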

Of course, this feature is not needed for languages that have stemming.
So I would like you to add a knob to enable or disable this range search for text search (defaulting to false). Then we can use text search with this knob set to true for Korean.

I have pushed a pull request for this simple idea to the MongoDB GitHub repository.

This feature would help a lot of Korean users. Please seriously consider adding it.
(I am not sure whether this feature is useful for Japanese or Chinese, which do not put spaces between words.)

Thanks.

related to: SERVER-45859 Text Indexes with partial word match or CJK match (Closed)
links to: Pull Request

Assignee: [DO NOT USE] Backlog - Query Integration
Reporter: 아나 하리
Participants: [DO NOT USE] Backlog - Query Integration, Asya Kamsky, Mark Agarunov, 아나 하리
Votes: 6
Watchers: 13

Created: Jun 13 2017 08:44:26 AM UTC
Updated: Dec 27 2023 04:46:08 PM UTC
