Core Server / SERVER-29598

Support Korean language in full text search

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Text Search
    • Query Integration

      Add Korean to languages supported in MongoDB FTS.

      Original description:
      MongoDB supports stemming for major languages such as English.
      However, there is no stemming for CJK languages (I am focusing on Korean in particular), so MongoDB text search is effectively unusable for Korean unless the application stems Korean text itself.

      I am not sure whether you are interested in Korean, but in any case, Korean attaches only suffixes (postpositional particles) after the stem (base word), for example:

      Stem : 한글
      With suffix : 한글은, 한글이, 한글을, 한글과, 한글도, 한글처럼, ...
      

      In the current implementation, however, MongoDB text search matches the search term exactly, so a Korean word does not match when it carries a suffix ("은", "는", "이", "가", "처럼", ...).

      If MongoDB supported a range search within text search, as in the example below, we (Korean users) could use text search for Korean.

      Text : "한글은 뛰어난 언어입니다."
      Search term : "한글"
      Range search in text search : "한글" <= range < "한긁"
        (where "한긁" is generated by incrementing the last character of the search term by one code point, like this: https://github.com/mongodb/mongo/pull/1151/commits/641c3041282746aff280b685424d55926bab93b2#diff-bc6db30f2a5f9618496534d03aeabf54R108)
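      A minimal sketch of the bound computation described above, in Python. The names `prefix_upper_bound` and `matches_prefix` are hypothetical illustrations, not part of MongoDB or the linked pull request. It assumes strings compare by Unicode code point (true for Python `str`, and equivalent to UTF-8 byte order), and it simply rejects the rare case where the last character is already U+10FFFF instead of carrying into the previous character.

```python
def prefix_upper_bound(prefix: str) -> str:
    """Exclusive upper bound for a prefix range: prefix <= s < bound.

    Hypothetical helper: increments the last character by one code point.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    last = ord(prefix[-1])
    if last == 0x10FFFF:
        # A full implementation would drop this character and carry into
        # the previous one; this sketch just refuses.
        raise ValueError("last character is already the maximum code point")
    return prefix[:-1] + chr(last + 1)


def matches_prefix(token: str, prefix: str) -> bool:
    """True if token lies in the half-open range [prefix, upper bound)."""
    return prefix <= token < prefix_upper_bound(prefix)


if __name__ == "__main__":
    print(prefix_upper_bound("한글"))        # 한긁 (U+AE00 -> U+AE01)
    print(matches_prefix("한글은", "한글"))   # True
    print(matches_prefix("한국", "한글"))     # False
```

      Because the range is half-open, tokens such as 한글은 and 한글처럼 fall inside it, while 한국 (which sorts before 한글) does not.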
      

      Of course, this feature is not needed for languages that have stemming.
      So I would like you to add a knob to enable or disable this range search for text search (disabled by default). Then we could use text search with the knob set to true for Korean.
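      To illustrate the proposed knob, here is a hedged sketch of per-document matching with a hypothetical `prefix_match` flag that defaults to `False`, as requested. The regex tokenizer is an assumption standing in for MongoDB's real tokenizer, and no stemming is applied. For a single token, `token.startswith(term)` selects exactly the same tokens as the half-open range `term <= token < (term with its last character incremented)`; the range form matters only when scanning a sorted index.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: split on anything that is not a Unicode word character.
    return re.findall(r"\w+", text)


def text_match(text: str, term: str, prefix_match: bool = False) -> bool:
    """Return True if `term` matches a token of `text`.

    prefix_match=False: exact token match (the current behavior described above).
    prefix_match=True: any token starting with `term` matches, so Korean words
    with attached particles (e.g. 한글은) match the bare stem 한글.
    """
    tokens = tokenize(text)
    if prefix_match:
        return any(tok.startswith(term) for tok in tokens)
    return term in tokens


if __name__ == "__main__":
    doc = "한글은 뛰어난 언어입니다."
    print(text_match(doc, "한글"))                     # False
    print(text_match(doc, "한글", prefix_match=True))  # True
```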

      I have submitted a pull request for this simple idea to the MongoDB GitHub repository.

      This feature would help a lot of Korean users. Please consider adding it seriously.
      (I am not sure whether this feature would be useful for Japanese or Chinese, which do not use spaces between words.)

      Thanks.

            Assignee:
            backlog-query-integration [DO NOT USE] Backlog - Query Integration
            Reporter:
            sunguck.lee@gmail.com 아나 하리
            Votes:
            6
            Watchers:
            13
