data-mining
Hierarchical clusterization heuristics
I want to explore relations between data items in a large array. Every data item is represented by a multidimensional vector. First of all, I've decided to use clusterization. I'm interested in finding hie...
2023-03-19 10:48 Category: Q&A
What are some good ways of estimating 'approximate' semantic similarity between sentences?
I have been looking at the nlp tag on SO for the past couple of hours and am confident I did not miss anything, but if I did, please do point me to the question.
2023-03-18 12:27 Category: Q&A
The relationship between latent Dirichlet allocation and documents clustering
I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering.
2023-03-18 12:05 Category: Q&A
Python/Scrapy question: How to get cleaner results?
My task for a project is to data mine a website for specific names. My experience with Python isn't high. When I scraped all the names, they came out in this format:
2023-03-18 06:54 Category: Q&A
extracting postal addresses from pdf files
Are there any libraries/toolkits that would help me in the task of extracting postal address information from unstructured PDF documents (e.g. letters)? If not, how would you approach t...
2023-03-17 22:06 Category: Q&A
Determining the called JSON file in Javascript to use in cURL for data mining - twitter like 'more' button
I'm trying to extract a stream of historical messages from a site much like Twitter. Basically we all know the 'MORE' button in Twitter. This site has something similar and looks like it grabs a JSON...
2023-03-17 08:34 Category: Q&A
Expectation Maximization Issue - How to find the optimum number of gaussians within the data
Is there any algorithm or trick for determining the number of gaussians that should be identified within a set of data before applying the expectation maximization algorithm?
2023-03-16 12:32 Category: Q&A
Discovering "templates" in a given text?
If I have significant amounts of text and am trying to discover templates that occur most frequently, I was thinking of solving it using the N-Gram approach and in fact it was suggest...
2023-03-16 02:32 Category: Q&A
Best way to collect a descriptive set of tags about a company from its url?
I'm pretty ignorant of what appears in the html/javascript of a website because I spend most of my time on the back-end (phrasing!). Basically, I want to know the best way to take a company's url, e...
2023-03-15 15:48 Category: Q&A
Machine learning library for .net analog of Apache Mahout [closed]
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
2023-03-15 03:40 Category: Q&A