- Type: Improvement
- Resolution: Done
- Priority: Minor - P4
- Affects Version/s: 4.0.6
- Component/s: Text Search
I need to build a partial-word search utility for CJK text (mostly Korean).
MongoDB text search doesn't support CJK letters, so I came up with a trick.
I separate the original data field from the search data field, e.g. 'title' and 'rawTitle'.
When I insert or update a title, I convert the title String in Spring (Java) and store the result in rawTitle.
Like this:
// convert
String rawTitle = URLEncoder.encode(title, "UTF-8");
However, URLEncoder leaves letters and digits unchanged, so for those characters rawTitle would be the same as title. I convert them separately:
// convert letters and digits
String rawTitle = "$" + Integer.toHexString(alphabetOrNumberCharacter);
To search, I wrap the converted text in double quotes so that it is matched as a phrase:
// search
String searchText = "\"" + convertedText + "\"";
A text index is also needed:
// text index
db.collection.createIndex({ rawTitle: 'text' }, { default_language: 'none' });
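The conversion and phrase-wrapping steps above can be put together roughly as follows. This is only a sketch of how I read the description, not the exact code from my project: the class name RawTitleEncoder and the methods encode and toPhrase are made up for illustration, and I assume that only ASCII letters and digits get the "$" + hex form while every other character goes through URLEncoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: names are illustrative, not part of any library.
public class RawTitleEncoder {

    // Builds the value stored in rawTitle from the original title.
    public static String encode(String title) {
        StringBuilder raw = new StringBuilder();
        int i = 0;
        while (i < title.length()) {
            int cp = title.codePointAt(i);
            int len = Character.charCount(cp);
            if (isAsciiLetterOrDigit(cp)) {
                // URLEncoder would leave these unchanged, so encode them as "$" + hex.
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                // Everything else (Korean, spaces, punctuation) goes through URLEncoder.
                raw.append(urlEncode(title.substring(i, i + len)));
            }
            i += len;
        }
        return raw.toString();
    }

    // Wraps the converted text in double quotes so $text treats it as a phrase.
    public static String toPhrase(String searchText) {
        return "\"" + encode(searchText) + "\"";
    }

    private static boolean isAsciiLetterOrDigit(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\uB098\uB2E4" is the Korean text 나다, written as escapes to avoid source-encoding issues.
        System.out.println(encode("\uB098\uB2E4")); // prints %EB%82%98%EB%8B%A4
        System.out.println(toPhrase("eq"));         // prints "$65$71"
    }
}
```

The string returned by toPhrase is what would be passed as the $search value of a $text query against rawTitle.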
This trick is awkward from the CLI, because whenever I want to search for something I have to convert the text by hand and paste it in.
But it works for me, even with CJK letters.
If I want to find '가나다', I type '나다'.
The search utility converts '나다' to '%EB%82%98%EB%8B%A4',
and I get the result '가나다'.
Suppose I have 3 documents:
// collection
{ title: 'qweqweqwe' }
{ title: 'qweewqqwe' }
{ title: 'qw eq we' }
If I search for 'eq' in the search utility, I get 'qweqweqwe' and 'qw eq we'.
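To show why 'eq' matches only two of the three titles, here is a small in-memory sketch. It does not talk to MongoDB: it approximates the phrase match as a plain substring check on the encoded strings, which is an assumption on my part about how the quoted search behaves. The class PartialSearchDemo and its methods are hypothetical names, and the encoding rule is the same assumed one (ASCII letters and digits as "$" + hex, everything else through URLEncoder).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: simulates the phrase match as substring containment.
public class PartialSearchDemo {

    // Same assumed encoding as for rawTitle: "$" + hex for ASCII letters/digits,
    // URLEncoder for every other character.
    static String encode(String text) {
        StringBuilder raw = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            boolean asciiAlnum = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (asciiAlnum) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                try {
                    raw.append(URLEncoder.encode(text.substring(i, i + len), "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e);
                }
            }
            i += len;
        }
        return raw.toString();
    }

    // Returns the titles whose encoded form contains the encoded search text.
    public static List<String> search(List<String> titles, String searchText) {
        String needle = encode(searchText);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (encode(title).contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("qweqweqwe", "qweewqqwe", "qw eq we");
        System.out.println(search(titles, "eq")); // prints [qweqweqwe, qw eq we]
    }
}
```

'eq' becomes "$65$71", which appears in "$71$77$65$71..." and in "$71$77+$65$71+$77$65", but not in "$71$77$65$65$77$71$71$77$65", so 'qweewqqwe' is not returned.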
I think it's a useful trick, but I'm worried about performance.
I'd like to hear your opinion. Thanks.
- is related to: SERVER-29598 Support Korean language in full text search (Backlog)