kevinma Posted January 31, 2013

I tried to use the NOT operator with keywords to find all the emails that contain the word "subscribe" but not "unsubscribe".

Keywords: subscribe NOT unsubscribe

However, this technique does not work with some Simplified Chinese characters. For instance, take the following keywords.

Simplified Chinese keywords: 安 NOT 安排
Simplified Chinese sentence: 是否可以安排在星期一前完成

The results still highlight the character 安, even though the email content contains the whole sentence ...安排在 .... The problem may be that those characters are the same in Simplified Chinese and Traditional Chinese. Please try it in Google Translate at http://translate.goo...%AE%89%E6%8E%92

Traditional Chinese keywords: 電子 NOT 電子郵件
Chris Posted January 31, 2013

Hello,

I tried a sample document containing the text "是否可以安排在星期一前完成" and it is (correctly) not returned when searching for "安 NOT 安排". What Intella version are you using? Can you send us a complete sample document?

The issue is most likely in the way Chinese, Japanese and Korean documents are indexed. As these languages do not require whitespace or other characters to separate words, searching on words becomes hard. This is "solved" by breaking up the text into so-called bigrams, basically all pairs of two adjacent characters that occur in the document, and processing them as if they were words. If you look at the Previewer's Words tab, you will see which "words" are extracted from this text. This method does not give perfect results, but it often produces a reasonable one.
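To make the bigram idea concrete, here is a minimal Python sketch of what such a split could look like for the sentence in question. It is only an illustration: the function name `cjk_bigrams` is made up for this example, and Intella's actual indexer may treat single characters, punctuation and mixed-script text differently.

```python
def cjk_bigrams(text: str) -> list[str]:
    """Return every pair of adjacent characters in text, in order.

    Hypothetical helper for illustration only; not Intella's implementation.
    """
    return [text[i:i + 2] for i in range(len(text) - 1)]


sentence = "是否可以安排在星期一前完成"
tokens = cjk_bigrams(sentence)

# Print the overlapping two-character "words" this simple split produces.
print(tokens)

# In this simplified model the pair 安排 is a token,
# while the single character 安 is not a token on its own.
print("安排" in tokens)
print("安" in tokens)
```

In this simplified model a one-character keyword such as 安 never appears as a token by itself, so how such a query is matched depends entirely on the indexer. Comparing against the Previewer's Words tab shows what was really extracted from a given document.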