
Lucene Tokenizer with LookAhead

Developer https://www.devze.com 2023-04-07 22:53 Source: Internet

Can anyone point me in the right direction for implementing a Lucene Tokenizer with look-ahead?

I'm using a Snowball stemmer, and I want to recognize multi-word city names and prevent them from being stemmed, so that "Los Angeles" is emitted as a single token, as opposed to two tokens, "Los" and "Angeles".

Tokens that don't match any city name should still be emitted as single-word tokens.

Any ideas?

TIA


Here is a gist of something I wrote which does what you want.
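
The gist itself is not reproduced here. As a rough illustration of the look-ahead idea only (not the answerer's code), here is a minimal sketch in plain Java with no Lucene dependency. The class name `CityPhraseMerger` and its methods are hypothetical. It assumes the city list fits in memory and that matching is case-insensitive. At each position it looks ahead up to the length of the longest known city name, tries the longest candidate first, and emits either the merged phrase or the single word unchanged.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: merges known multi-word city names into single tokens. */
final class CityPhraseMerger {
    // Lower-cased city names, words separated by exactly one space.
    private final Set<String> cities = new HashSet<>();
    // Word count of the longest city name; bounds how far we look ahead.
    private int maxWords = 1;

    CityPhraseMerger(List<String> cityNames) {
        for (String name : cityNames) {
            String normalized = name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            cities.add(normalized);
            maxWords = Math.max(maxWords, normalized.split(" ").length);
        }
    }

    /** Returns the tokens with every known city phrase collapsed into one token. */
    List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            int limit = Math.min(maxWords, tokens.size() - i);
            // Look ahead: try the longest candidate phrase first, then shorter ones.
            for (int n = limit; n >= 1; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (cities.contains(candidate.toLowerCase(Locale.ROOT))) {
                    matched = n;
                    break;
                }
            }
            if (matched > 0) {
                // Emit the whole phrase as one token and skip the consumed words.
                out.add(String.join(" ", tokens.subList(i, i + matched)));
                i += matched;
            } else {
                // No city starts here: keep the word as its own token.
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }
}
```

In Lucene, the same loop would typically live in a custom `TokenFilter` placed before the stemmer: it would buffer upcoming tokens (for example with `captureState()` and `restoreState()`) and flag the merged token through `KeywordAttribute`, so that a keyword-aware stemming filter leaves it alone. That wiring is not shown here and depends on your Lucene version.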
