Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
As phrased in the question, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I understand it is a very difficult task to solve, as there are many ambiguities involved. I know there's Google's API, but it is rather a black box, i.e. not much information about what it is doing is exposed.
The search keyword for Chinese text segmentation is 中文分词 (Chinese word segmentation).
Good and active open-source text-segmentation algorithms:
- 盘古分词 (Pan Gu Segment): C#, Snapshot
- ik-analyzer: Java
- ICTCLAS: C/C++, Java, C#, Demo
- NlpBamboo: C, PHP, PostgreSQL
- HTTPCWS: based on ICTCLAS, Demo
- mmseg4j: Java
- fudannlp: Java, Demo
- smallseg: Python, Java, Demo
- nseg: NodeJS
- mini-segmenter: Python
Other
- Google Code : http://code.google.com/query/#q=中文分词
- OSChina (Open Source China)
Sample
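- Google Chrome (Chromium): src, cc_cedict.txt (73,145 Chinese words/phrases)
  - In a text field or textarea of Google Chrome containing Chinese sentences, press Ctrl+← or Ctrl+→
  - Double-click on 中文分词指的是将一个汉字序列切分成一个一个单独的词 ("Chinese word segmentation means splitting a sequence of Chinese characters into individual words")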
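To give an idea of how a word list like cc_cedict.txt can drive segmentation, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based approaches. This is not necessarily what Chromium does internally; the function name `forward_max_match` and the toy dictionary `TOY_DICT` are made up for illustration, and a real word list would be far larger.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `dictionary`;
    if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, standing in for a real word list.
TOY_DICT = {"中文", "分词", "中文分词", "汉字", "序列", "单独"}

print(forward_max_match("汉字序列", TOY_DICT))
```

Greedy matching cannot resolve the ambiguities mentioned in the question (it always prefers the longest word on the left), which is one reason the statistical segmenters exist.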
The Stanford segmenter uses a CRF algorithm.
It's under the GPL.
The link is: http://nlp.stanford.edu/software/segmenter.shtml
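For context, CRF-based segmenters generally treat segmentation as labeling each character, and the words are then read off the label sequence. The sketch below shows only that encoding step, using the common B/M/E/S scheme (begin, middle, end, single). The exact label set Stanford uses is not stated here, so treat BMES as an assumption, and the helper names `words_to_tags` and `tags_to_words` are made up for illustration. No CRF is trained in this snippet.

```python
def words_to_tags(words):
    """Turn a list of words into one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # First char B, last char E, anything in between M.
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = ""
    if current:
        # Tag sequence ended mid-word; keep the leftover characters.
        words.append(current)
    return words
```

A trained model would predict the tag list for raw text; `tags_to_words` is then all that is needed to recover the segmentation.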
ICU has details on universal text segmentation - http://userguide.icu-project.org/boundaryanalysis
Cursory Googling for "text segmentation chinese open source" reveals this library, which may or may not be what you're looking for...:
http://sourceforge.net/projects/ktdictseg/
The results hint at a few alternative avenues for finding an open-source library, too:
- Searching for an open-source search implementation that might work with Chinese.
- Searching for an open-source plagiarism detection implementation that might work with Chinese.