text-mining
R text mining package: incorporating new documents into an existing corpus
I was wondering if there is any chance of R's text mining package having the following feature: myCorpus <- Corpus(DirSource(<directory-containing-textfiles>),control=...)…
2023-03-19 04:19 Category: Q&A
Pure statistical or natural language processing engine?
What statistical engines, if any, yield better results than the OpenNLP suite of tools? What I'm looking for is an engine that picks keywords from texts and provides stemmi…
2023-03-18 20:40 Category: Q&A
The relationship between latent Dirichlet allocation and document clustering
I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering.
2023-03-18 12:05 Category: Q&A
Extracting postal addresses from PDF files
Are there any libraries/toolkits that would help me in the task of extracting postal address information from unstructured PDF documents (e.g. letters)? If not, how would you approach t…
2023-03-17 22:06 Category: Q&A
How to access Wikipedia from R?
Is there any package for R that allows querying Wikipedia (most probably using the MediaWiki API) to get a list of available articles relevant to such a query, as well as import selected articl…
2023-03-07 14:23 Category: Q&A
What techniques are there to extract a navigational menu from a web page?
I'm looking for a method to extract a menu used for navigation from a web page heavy with links (and probably text). The pages I'm interested in are quite plain, valid XHTML, and it's a safe assump…
2023-03-01 15:22 Category: Q&A
Mallet: feature contributions for each prediction
I'm developing an NER system on Mallet using CRFs. Is it possible to collect the feature contributions for each prediction?
2023-03-01 10:58 Category: Q&A
Using log-likelihood to implement Naive Bayes for text classification
I am implementing the Naive Bayes algorithm for text classification. I have ~1000 documents for training and 400 documents for testing. I think I've implemented the training part correctly, but I am confused…
2023-02-20 15:49 Category: Q&A
What is a good approach for extracting keywords from user-submitted text?
I'm building a site that allows users to make sense of a debate by graphically representing arguments for and against a particular issue (Wrangl).
2023-02-17 06:55 Category: Q&A
Book and article references for getting started with document classification
I am interested in doing a project on document classification and have been looking for books that could be useful for the theoretical parts of text mining related to this, or examples o…
2023-02-15 22:37 Category: Q&A