
Library to search unstructured text on Android

Developer https://www.devze.com 2023-03-29 02:51 Source: Internet

I'm writing an Android app where I need to quickly search through a large amount of text. The text is fixed; I would like to compute the indexes off-line and ship them with the app. Here are the requirements for the search library (numbers 1-5 are critical):

  1. Must support Unicode character set.
  2. Searches need to find arbitrary substrings within the text (not just terms or term prefixes).
  3. The search needs to return all matches.
  4. The library should be as lightweight as possible. In particular, it should be possible to strip out the indexing (and other) parts of the library and package the app with only the search API.
  5. The library license must permit it to be used in a proprietary combined work.
  6. There is no need for morphological analysis (stemming) or stop-word handling.
  7. Wildcard and/or regular expression search would be nice to have, but not required.
  8. Proximity search would also be nice.
  9. Likewise boolean search.
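Requirements 1 to 3 (Unicode, arbitrary substrings, all matches) are what a suffix array provides. You sort the start offset of every suffix of the text. Every occurrence of a query is then a contiguous run of suffixes that begin with it, and two binary searches find that run. The Java sketch below is illustrative only. The class and method names are hypothetical, it compares UTF-16 `char` values without any Unicode normalization, and it builds the array with a naive sort. It also separates the offline step (`build`) from the runtime step (the constructor plus `findAll`), which is the split requirement 4 asks for: only the text and the `int[]` would need to ship with the app.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SuffixArraySearch {
    private final String text;
    private final int[] sa; // start offset of every suffix, in lexicographic order

    // Runtime side: needs only the text and the precomputed suffix array.
    public SuffixArraySearch(String text, int[] sa) {
        this.text = text;
        this.sa = sa;
    }

    // Offline side: sorts all suffix start offsets. A naive comparison sort is
    // used for illustration; a large corpus would call for a linear-time
    // construction algorithm such as SA-IS.
    public static int[] build(String text) {
        Integer[] order = new Integer[text.length()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> compareSuffixes(text, a, b));
        int[] sa = new int[order.length];
        for (int i = 0; i < order.length; i++) sa[i] = order[i];
        return sa;
    }

    // Lexicographic comparison of the suffixes starting at a and b.
    private static int compareSuffixes(String t, int a, int b) {
        int n = t.length();
        while (a < n && b < n) {
            char ca = t.charAt(a), cb = t.charAt(b);
            if (ca != cb) return Character.compare(ca, cb);
            a++;
            b++;
        }
        return Integer.compare(n - a, n - b); // the shorter suffix sorts first
    }

    // Compares the first query.length() chars of the suffix at `start` with query.
    private int comparePrefix(int start, String query) {
        for (int i = 0; i < query.length(); i++) {
            if (start + i >= text.length()) return -1; // suffix too short: sorts before query
            char c = text.charAt(start + i);
            char q = query.charAt(i);
            if (c != q) return Character.compare(c, q);
        }
        return 0; // query is a prefix of this suffix
    }

    // First index whose truncated suffix is >= query (or > query when strict).
    private int bound(String query, boolean strict) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(sa[mid], query);
            if (c < 0 || (strict && c == 0)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Returns the offset of every occurrence of query, in text order.
    public List<Integer> findAll(String query) {
        List<Integer> hits = new ArrayList<>();
        if (query.isEmpty()) return hits;
        int from = bound(query, false), to = bound(query, true);
        for (int i = from; i < to; i++) hits.add(sa[i]);
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        String corpus = "eats seat neat";
        SuffixArraySearch s = new SuffixArraySearch(corpus, build(corpus));
        System.out.println(s.findAll("eat")); // [0, 6, 11]
        System.out.println(s.findAll("sea")); // [5]
    }
}
```

On the device only the constructor, `findAll` and its helpers are needed. The `int[]` could be written out during the build (for example with a `DataOutputStream`) and read back at startup. The array costs 4 bytes per UTF-16 code unit of text, on top of the text itself.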

FTS3 (which comes with SQLite) is great with respect to requirement 4, but unfortunately won't satisfy requirement 2. (It can find term prefixes but not suffixes: a search for "eat" can find "eats" but not "seat".)
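This limitation is inherent to a term-prefix index. A common workaround for any prefix index is to also index every suffix of every term, which turns a substring query into a prefix query. The sketch below illustrates the idea in Java, with a `java.util.TreeMap` standing in for the index. It is not FTS3's API, and the class `SuffixTermIndex` is hypothetical. Two caveats apply: the index grows with the total length of the terms rather than their count, and a match cannot span a term boundary.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SuffixTermIndex {
    // Hypothetical stand-in for a prefix index: maps every suffix of every
    // indexed term to the set of terms it came from.
    private final TreeMap<String, Set<String>> bySuffix = new TreeMap<>();

    public void addTerm(String term) {
        // A term of length k contributes k keys: the term itself and each shorter tail.
        for (int i = 0; i < term.length(); i++) {
            bySuffix.computeIfAbsent(term.substring(i), k -> new TreeSet<>()).add(term);
        }
    }

    // A prefix lookup over suffixes is a substring lookup over terms.
    public Set<String> termsContaining(String query) {
        Set<String> result = new TreeSet<>();
        // Keys starting with query sort between query (inclusive) and
        // query + U+FFFF (exclusive); U+FFFF is a noncharacter, so real text should not contain it.
        String upper = query + Character.MAX_VALUE;
        for (Set<String> terms : bySuffix.subMap(query, true, upper, false).values()) {
            result.addAll(terms);
        }
        return result;
    }

    public static void main(String[] args) {
        SuffixTermIndex index = new SuffixTermIndex();
        for (String t : new String[] {"eats", "seat", "neat", "tea"}) {
            index.addTerm(t);
        }
        System.out.println(index.termsContaining("eat")); // [eats, neat, seat]
    }
}
```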

I've looked at a number of libraries, including Lucene, Minion, and egothor. They all seem loaded down with great features that I don't need. I am also under the impression (although this may be wrong) that it would be hard to partition these libraries and package up just the search API. (I've also heard that it's hard to get Lucene to work on Android because it relies on java.rmi, which Android's Java does not include.)

Does anyone know of a library that does what I need (or could be adapted)? I'm not averse to porting the search API from another language into Java if the library otherwise meets the requirements.


Apache Lucy, a loose port of Lucene to C, might be worth a look.

