  1. Core Server
  2. SERVER-45859

Text Indexes with partial word match or CJK match

    • Type: Improvement
    • Resolution: Done
    • Priority: Minor - P4
    • Affects Version/s: 4.0.6
    • Component/s: Text Search

      I need to build a partial-word search utility that works with CJK text (mostly Korean).

      MongoDB text search doesn't support CJK characters, so I came up with a workaround.

       

      I keep the original data and the search data in separate fields, such as 'title' and 'rawTitle'.

      Whenever I insert or update a title, I convert the title String value in Spring (Java) and store the result in rawTitle.

       

      Like this.

      // convert
      String rawTitle = URLEncoder.encode(title, "UTF-8");
      
      

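      However, URL encoding leaves letters and digits unchanged, so for those characters rawTitle would be the same as title. I convert each of them separately: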

      // convert alphabet and numbers
      String rawTitle = "$" + Integer.toHexString(alphabetOrNumberCharacter);
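
      The two conversion rules above can be combined into one helper. The following is only a minimal sketch under assumptions: the class name RawTitleConverter and the methods toRawTitle and toSearchPhrase are hypothetical names, not part of the original code, and "alphabet or number" is assumed to mean ASCII letters and digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; class and method names are illustrative assumptions.
final class RawTitleConverter {

    // Converts a title into the form stored in rawTitle:
    // ASCII letters/digits become "$" + hex code, everything else is URL-encoded.
    static String toRawTitle(String title) {
        StringBuilder raw = new StringBuilder();
        title.codePoints().forEach(cp -> {
            if (isAsciiLetterOrDigit(cp)) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                raw.append(urlEncode(new String(Character.toChars(cp))));
            }
        });
        return raw.toString();
    }

    // Assumption: only ASCII letters and digits get the "$" + hex treatment.
    static boolean isAsciiLetterOrDigit(int cp) {
        return (cp >= '0' && cp <= '9')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 'a' && cp <= 'z');
    }

    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Wraps the converted text in double quotes for use as a $text phrase.
    static String toSearchPhrase(String text) {
        return "\"" + toRawTitle(text) + "\"";
    }

    public static void main(String[] args) {
        // U+B098 U+B2E4 is the Korean search term used later in this description.
        System.out.println(toRawTitle("\uB098\uB2E4")); // prints %EB%82%98%EB%8B%A4
        System.out.println(toSearchPhrase("eq"));       // prints "$65$71"
    }
}
```

      Note that URLEncoder turns a space into "+" and leaves ".", "-", "*" and "_" unchanged, so those characters appear as-is in rawTitle.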

       

      To search, I wrap the converted text in double quotes so that it is matched as a phrase:

      // search
      String searchText = "\"" + convertedText + "\"";

      A text index is also needed:

      // textIndexes
      db.collection.createIndex({
          rawTitle: 'text'
      }, {
          default_language: 'none'
      });

      This trick is inconvenient in the CLI, because to search for something I have to convert the text first and paste it in.

      But it works for me, even with CJK characters.

      For example, to find '가나다', I type '나다'.

      The search utility converts '나다' to '%EB%82%98%EB%8B%A4',

      and I get the result '가나다'.

       

       

      When I have three documents,

      // collection
      {
          title: 'qweqweqwe'
      }
      {
          title: 'qweewqqwe'
      }
      {
          title: 'qw eq we'
      }

       

      and I search for 'eq' in the search utility, I get 'qweqweqwe' and 'qw eq we'.
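
      The expected result for these three titles can be checked without a database. The sketch below rests on one assumption: that a quoted $text phrase behaves like a substring test on the stored rawTitle, which is approximated here with String.contains. The class PartialMatchDemo and its methods are hypothetical names, and the conversion follows the rules described above ("$" + hex for ASCII letters and digits, URL encoding for everything else).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory demo; it does not talk to MongoDB.
final class PartialMatchDemo {

    // Same conversion as described above, written as a plain loop.
    static String toRawTitle(String title) {
        StringBuilder raw = new StringBuilder();
        for (int i = 0; i < title.length(); ) {
            int cp = title.codePointAt(i);
            i += Character.charCount(cp);
            boolean asciiAlnum = (cp >= '0' && cp <= '9')
                || (cp >= 'A' && cp <= 'Z')
                || (cp >= 'a' && cp <= 'z');
            if (asciiAlnum) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                try {
                    raw.append(URLEncoder.encode(new String(Character.toChars(cp)), "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always available
                }
            }
        }
        return raw.toString();
    }

    // Assumption: phrase matching is approximated by a substring test.
    static List<String> search(List<String> titles, String term) {
        String needle = toRawTitle(term);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (toRawTitle(title).contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("qweqweqwe", "qweewqqwe", "qw eq we");
        System.out.println(search(titles, "eq")); // prints [qweqweqwe, qw eq we]
    }
}
```

      Here 'eq' becomes '$65$71', which occurs in the converted forms of 'qweqweqwe' and 'qw eq we' but not in that of 'qweewqqwe'.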

       

      I think it's a useful trick, but I'm worried about performance.

      I'd like to hear your opinion. Thanks.

       

       

            Assignee:
            carl.champain@mongodb.com Carl Champain (Inactive)
            Reporter:
            signal.be@gmail.com Karen Takahashi
            Votes:
            0
            Watchers:
            4