I need to develop a partial-word search utility for CJK text (mostly Korean).

MongoDB text search doesn't support CJK text, so I came up with a trick.

I keep the original data and the search data in separate fields, such as 'title' and 'rawTitle'.

Whenever I insert or update a title, I convert the title string in Spring (Java) and store the result in rawTitle, like this:

// convert
String rawTitle = URLEncoder.encode(title, "UTF-8");
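The small standalone sketch below shows what URLEncoder.encode produces for Korean and for plain ASCII input, which is why letters and numbers need a separate rule. UrlEncodeDemo and encode are hypothetical names, and the Korean text is written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what URLEncoder does to Korean vs. ASCII input.
class UrlEncodeDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The Korean title ga-na-da: each syllable becomes its three
        // UTF-8 bytes written as %XX.
        System.out.println(encode("\uAC00\uB098\uB2E4")); // %EA%B0%80%EB%82%98%EB%8B%A4
        // ASCII letters and digits are not changed at all.
        System.out.println(encode("qwe123"));             // qwe123
    }
}
```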

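However, URL encoding leaves letters and numbers unchanged, so for those characters rawTitle would be the same as title. I convert each letter or number character separately: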

// convert each letter or number character, e.g. 'q' -> "$71"
String rawChar = "$" + Integer.toHexString(alphabetOrNumberCharacter);
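Putting both rules together, a per-character converter might look like the sketch below. It assumes that "letters and numbers" means ASCII A-Z, a-z and 0-9, and that every other character is passed through URLEncoder on its own; RawTitleConverter and toRaw are hypothetical names, not the code I actually use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical converter combining both rules (URL encoding + "$" + hex).
class RawTitleConverter {

    // Assumption: "letters and numbers" means ASCII letters and digits only.
    static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
    }

    // URL-encodes one character as UTF-8 (U+B098 -> "%EB%82%98", space -> "+").
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Converts a title into its rawTitle form, one code point at a time.
    static String toRaw(String title) {
        StringBuilder raw = new StringBuilder();
        for (int cp : title.codePoints().toArray()) {
            if (isAsciiAlphanumeric(cp)) {
                raw.append('$').append(Integer.toHexString(cp)); // 'q' -> "$71"
            } else {
                raw.append(urlEncode(new String(Character.toChars(cp))));
            }
        }
        return raw.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRaw("qw eq"));  // $71$77+$65$71
        System.out.println(toRaw("\uB098")); // %EB%82%98
    }
}
```

Under this sketch, URLEncoder also keeps '.', '-', '*' and '_' as they are and turns a space into '+', so those characters appear unchanged in rawTitle.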

To search, I convert the search text the same way and wrap it in double quotes so that MongoDB treats it as a phrase:

// search
String searchText = "\"" + convertedText + "\"";

This also needs a text index on rawTitle:

// textIndexes
db.collection.createIndex({
    rawTitle: 'text'
}, {
    default_language: 'none'
});

This trick is not convenient from the CLI: to search for something, I have to convert the text first and then paste it in.
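One way to avoid converting and pasting by hand could be a tiny command-line helper that prints a ready-to-paste shell query. This is a hypothetical sketch: ShellQueryHelper is an invented name, it repeats the same two conversion rules (assuming ASCII letters and digits get "$" + hex and everything else is URL-encoded), and it assumes the collection is named collection, as in the index example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical CLI helper: converts search text and prints a mongo shell query to paste.
class ShellQueryHelper {

    // ASCII letters/digits -> "$" + hex, everything else -> URL-encoded UTF-8.
    static String toRaw(String text) {
        StringBuilder raw = new StringBuilder();
        for (int cp : text.codePoints().toArray()) {
            if (cp < 128 && Character.isLetterOrDigit(cp)) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                try {
                    raw.append(URLEncoder.encode(new String(Character.toChars(cp)),
                            StandardCharsets.UTF_8.name()));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException("UTF-8 is always supported", e);
                }
            }
        }
        return raw.toString();
    }

    // Wraps the converted text in escaped quotes so the shell sends it as a phrase.
    static String shellQuery(String text) {
        return "db.collection.find({ $text: { $search: \"\\\"" + toRaw(text) + "\\\"\" } })";
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java ShellQueryHelper <search text>");
            return;
        }
        System.out.println(shellQuery(String.join(" ", args)));
    }
}
```

For example, running it with the argument eq prints db.collection.find({ $text: { $search: "\"$65$71\"" } }).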

But it works for me, even with CJK text.

For example, if I want to find '가나다', I can just type '나다'. The search utility converts '나다' to '%EB%82%98%EB%8B%A4', and I get '가나다' as a result.
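Partial matching seems to work because the encoding is applied character by character: each Hangul syllable always becomes the same run of three %XX bytes, so a substring of the title is still a substring of rawTitle. The plain-Java check below only illustrates that relationship and does not query MongoDB; PartialMatchDemo is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical check: the encoded search text is a substring of the encoded title.
class PartialMatchDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String rawTitle = encode("\uAC00\uB098\uB2E4"); // title ga-na-da
        String phrase = encode("\uB098\uB2E4");         // search text na-da
        System.out.println(phrase);                     // %EB%82%98%EB%8B%A4
        System.out.println(rawTitle.contains(phrase));  // true
    }
}
```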

Likewise, say I have these three documents:

// collection
{
    title: 'qweqweqwe'
}
{
    title: 'qweewqqwe'
}
{
    title: 'qw eq we'
}

If I search for 'eq' with the search utility, I get 'qweqweqwe' and 'qw eq we'.
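The same result can be reproduced in memory. This hypothetical sketch stands in for MongoDB's phrase matching with a simple String.contains on the converted values (an assumption for illustration only; the real $text operator works differently internally and is case-insensitive by default), and shows why 'qweewqqwe' is not returned: its converted form never contains "$65$71".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for the phrase search; it does not talk to MongoDB.
class PhraseMatchSimulation {

    // ASCII letters/digits -> "$" + hex, everything else -> URL-encoded UTF-8.
    static String toRaw(String text) {
        StringBuilder raw = new StringBuilder();
        for (int cp : text.codePoints().toArray()) {
            if (cp < 128 && Character.isLetterOrDigit(cp)) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                try {
                    raw.append(URLEncoder.encode(new String(Character.toChars(cp)),
                            StandardCharsets.UTF_8.name()));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException("UTF-8 is always supported", e);
                }
            }
        }
        return raw.toString();
    }

    // Returns the titles whose converted form contains the converted search text.
    static List<String> search(List<String> titles, String text) {
        String phrase = toRaw(text);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (toRaw(title).contains(phrase)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("qweqweqwe", "qweewqqwe", "qw eq we");
        System.out.println(search(titles, "eq")); // [qweqweqwe, qw eq we]
    }
}
```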

I think it's a useful trick, but I'm worried about its performance.

I'd like to hear your opinion. Thanks.

Is related to: SERVER-29598 Support Korean language in full text search