n-gram
Save Google NGram result as .csv
Is there an easy way to save a Google Ngram result (http://books.google.com/ngrams/) as a CSV, so that I get a list like…
2023-04-11 19:28 Category: Q&A
NLP algorithm to 'fill out' search terms
I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing which…
2023-04-09 05:15 Category: Q&A
Fast n-gram calculation
I'm using NLTK to search for n-grams in a corpus, but it's taking a very long time in some cases. I've noticed that calculating n-grams is a fairly common feature in other packages (apparently Haystack h…
2023-04-08 22:48 Category: Q&A
The more I use a Java HashMap, the more the performance drops, even with stable size
I want to scan through a huge corpus of text and count word frequencies (actually n-gram frequencies, for those familiar with NLP/IR). I use a Java HashMap for this. So what happens is I process…
2023-04-06 03:15 Category: Q&A
Automatically linking categories to each other when categorizing text
I've been working on a project to data-mine a large number of short texts and categorize them based on a large pre-existing list of category names. To do this I had to figure out how to first create…
2023-03-29 12:25 Category: Q&A
Extract keyphrases from text (1-4 word n-grams)
What's the best way to extract keyphrases from a block of text? I'm writing a tool to do keyword extraction: something like this. I've found a few libraries for Python and Perl to extract n-grams, …
2023-03-28 18:58 Category: Q&A
N-gram, tf-idf and cosine similarity in Perl
I am trying to do some pattern 'mining' on multi-word text, one piece per line. I have done the n-gram analysis using the Text::Ngrams module in Perl, which gives me the frequency of each word. I am how…
2023-03-15 21:41 Category: Q&A
Solr NGramTokenizerFactory and PatternReplaceCharFilterFactory: analyzer results inconsistent with query results
I am currently using what I (mistakenly) thought would be a fairly straightforward implementation of Solr's NGramTokenizerFactory, but I'm getting strange results that are inconsistent between the a…
2023-03-15 12:53 Category: Q&A
Storing trigrams in a database or generating them on the fly?
I'm trying to create an application which uses trigrams for approximate string matching. All the records are in the database, and I want to be able to search the records on a fixed column. Is it b…
2023-03-03 15:25 Category: Q&A
Sphinx 4 corrupted ARPA LM?
I have an ARPA LM generated by kylm. When running Sphinx I get this exception stack trace: Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed…
2023-02-13 17:35 Category: Q&A
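Several of the questions above (the slow NLTK n-gram search, the Java HashMap frequency count, and the 1-4 word keyphrase extraction) come down to generating and counting word n-grams. A minimal sketch in plain Python, assuming naive whitespace tokenization: n-grams are produced by zipping staggered slices of the token list and tallied in a `Counter`, which is a dict (hash map) just like the HashMap in the Java question. The helper names `ngrams` and `count_ngrams` are hypothetical, chosen for illustration; they are not NLTK's API.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list as tuples (hypothetical helper)."""
    # zip() over n staggered slices (tokens[0:], tokens[1:], ...) stops at the
    # shortest slice, so it yields max(0, len(tokens) - n + 1) n-grams.
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(tokens: List[str], min_n: int = 1, max_n: int = 4) -> Counter:
    """Count every n-gram with min_n <= n <= max_n (hypothetical helper)."""
    counts: Counter = Counter()
    for n in range(min_n, max_n + 1):
        counts.update(ngrams(tokens, n))  # one pass over the tokens per n
    return counts


if __name__ == "__main__":
    # Assumption: whitespace tokenization; real text needs a proper tokenizer.
    tokens = "the cat sat on the mat the cat slept".split()
    for gram, freq in count_ngrams(tokens, 1, 2).most_common(3):
        print(" ".join(gram), freq)
```

On large corpora the counter grows with the number of distinct n-grams, so memory often becomes the limit before hashing speed does; counting per shard and merging the partial `Counter`s with `+` is one common way to keep each pass small.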
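The trigram question above asks whether to store trigrams or generate them on the fly for approximate string matching. To show what the on-the-fly computation involves, here is a minimal in-memory sketch with several assumptions: strings are lower-cased, padded with two leading spaces and one trailing space so that short words still produce trigrams, and scored by Jaccard similarity of their trigram sets. This is one common scheme, not necessarily the one any particular database uses. `trigrams`, `trigram_similarity`, `best_matches` and the 0.3 threshold are hypothetical, chosen for illustration.

```python
from typing import List, Set, Tuple


def trigrams(s: str) -> Set[str]:
    """Character trigrams of a lower-cased, padded string (hypothetical helper)."""
    # Assumed padding convention: two leading spaces, one trailing space.
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the trigram sets: |A & B| / |A | B|."""
    ta, tb = trigrams(a), trigrams(b)
    # Padding guarantees both sets are non-empty, so the union is never empty.
    return len(ta & tb) / len(ta | tb)


def best_matches(query: str, records: List[str],
                 threshold: float = 0.3) -> List[Tuple[float, str]]:
    """Score every record against the query; keep those at or above threshold, best first."""
    scored = [(trigram_similarity(query, r), r) for r in records]
    kept = [p for p in scored if p[0] >= threshold]
    return sorted(kept, key=lambda p: p[0], reverse=True)


if __name__ == "__main__":
    print(best_matches("color", ["colour", "collar", "banana"]))
```

This version scores every record, which is fine for a small table. Storing precomputed trigrams pays off mainly when they feed an inverted index (trigram to record ids), so that only records sharing at least one trigram with the query need to be scored.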