
Is SQLite on Android built with the ICU tokenizer enabled for FTS?

Developer https://www.devze.com 2023-03-28 19:39 Source: Internet
Like the title says: can we use ...USING fts3(tokenize=icu th_TH, ...)? If we can, does anyone know which locales are supported, and whether this varies by platform version?


No, only tokenize=porter is available.

When I specify tokenize=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu".

Also, this link suggests that the ICU tokenizer is not available unless it was compiled into Android's SQLite build: http://sqlite.phxsoftware.com/forums/t/2349.aspx


On API level 21 and up, I tested and found that the ICU tokenizer is available.

However, to support 90%+ of devices, a workaround is possible. I have an idea for one, which I also describe in another question of mine: Work around of Android SQLite full-text search for Asian text

You can port the ICU tokenizer to Java, or to a native Android module, as a stand-alone component that is not wired into SQLite. Then use an "external content table" (supported from FTS4) as the content source for the FTS virtual table.

When adding a row, insert the original text into the external content table, but run the stand-alone tokenizer to add artificial spaces at word boundaries before inserting the text into the virtual index table.

When deleting a row, run the tokenizer again to update the content table row with the artificially spaced text, then delete the row from the virtual table, then delete the row from the content table.

This is a little tricky, but compared with the other option of recompiling a full SQLite, it is much less effort.

For the external content table and how it works, please refer to https://www.sqlite.org/fts3.html#section_6_2_2
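
The add and delete steps described above could be sketched roughly as follows. This is only an illustration under assumptions: the table names docs and docs_fts, the class SpacedFtsIndex, and the Sql and Segmenter interfaces are all hypothetical. On Android, Sql could be a thin wrapper around SQLiteDatabase.execSQL(String, Object[]), and Segmenter would be the stand-alone tokenizer that adds the artificial spaces.

```java
/** Hypothetical sketch of the external-content workaround; all names are illustrative. */
final class SpacedFtsIndex {
    /** Minimal stand-in for a database handle (assumption, not an Android API). */
    interface Sql {
        void exec(String statement, Object... bindArgs);
    }

    /** Stand-alone tokenizer that returns text with spaces at word boundaries. */
    interface Segmenter {
        String addSpaces(String text);
    }

    private final Sql sql;
    private final Segmenter segmenter;

    SpacedFtsIndex(Sql sql, Segmenter segmenter) {
        this.sql = sql;
        this.segmenter = segmenter;
    }

    void createTables() {
        // Ordinary table holding the original text.
        sql.exec("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)");
        // FTS4 index that uses docs as its external content table.
        sql.exec("CREATE VIRTUAL TABLE docs_fts USING fts4(content=\"docs\", body)");
    }

    void add(long id, String body) {
        // Original text goes into the content table.
        sql.exec("INSERT INTO docs(id, body) VALUES (?, ?)", id, body);
        // Artificially spaced text goes into the index.
        sql.exec("INSERT INTO docs_fts(docid, body) VALUES (?, ?)",
                id, segmenter.addSpaces(body));
    }

    void delete(long id, String body) {
        // Make the content row match what was indexed, so the index
        // removes the same tokens it originally added.
        sql.exec("UPDATE docs SET body = ? WHERE id = ?", segmenter.addSpaces(body), id);
        sql.exec("DELETE FROM docs_fts WHERE docid = ?", id);
        sql.exec("DELETE FROM docs WHERE id = ?", id);
    }
}
```

The order inside delete follows the description above: the index entry is removed while the content row still holds the spaced text, and the content row is removed last.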

An ICU-based tokenizer is actually already available in the Android SDK: use BreakIterator.getWordInstance. It appears to support dictionary-based segmentation for languages such as Chinese. http://developer.android.com/reference/java/text/BreakIterator.html
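
A minimal sketch of using BreakIterator.getWordInstance to add the artificial spaces, assuming plain java.text.BreakIterator and a hypothetical helper class named BreakIteratorSpacer. It walks the word boundaries and joins the word-like segments with single spaces. How well Thai or Chinese text is segmented depends on the break rules shipped with the platform, so results may differ between Android versions and desktop JVMs.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of Android or SQLite.
final class BreakIteratorSpacer {
    private BreakIteratorSpacer() {}

    /** Returns the word-like segments of text joined by single spaces. */
    static String addSpaces(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        StringBuilder out = new StringBuilder();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that begin with a letter or digit;
            // this drops whitespace and punctuation segments.
            if (!Character.isLetterOrDigit(segment.codePointAt(0))) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(segment);
        }
        return out.toString();
    }
}
```

For example, BreakIteratorSpacer.addSpaces("Hello, world!", Locale.ENGLISH) yields "Hello world". Dropping punctuation is a design choice of this sketch; the spaced text is only meant to feed the index, while the original text stays in the content table.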


I have some Android code that uses tokenization at the link below; it may be of some help:

https://github.com/gast-lib/gast-lib/blob/master/app/src/root/gast/playground/speech/food/db/FtsIndexedFoodDatabase.java
