data-mining


Hierarchical clusterization heuristics

I want to explore relations between data items in a large array. Every data item is represented by a multidimensional vector. First of all, I've decided to use clusterization. I'm interested in finding hie...

2023-03-19 10:48 Category: Q&A
What are some good ways of estimating 'approximate' semantic similarity between sentences?

I have been looking at the NLP tag on SO for the past couple of hours and am confident I did not miss anything, but if I did, please do point me to the question.

2023-03-18 12:27 Category: Q&A
The relationship between latent Dirichlet allocation and documents clustering

I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering.

2023-03-18 12:05 Category: Q&A
Python/Scrapy question: How to get cleaner results?

My task for a project is to data mine a website for specific names. My experience with Python is limited. When I scraped all the names, they came out in this format: ...

2023-03-18 06:54 Category: Q&A
extracting postal addresses from pdf files

Are there any libraries/toolkits that would help me in the task of extracting postal address information from unstructured PDF documents (e.g. letters)? If not, how would you approach t...

2023-03-17 22:06 Category: Q&A
Determining the called JSON file in JavaScript to use in cURL for data mining - Twitter-like 'more' button

I'm trying to extract a stream of historical messages from a site much like Twitter. Basically, we all know the 'MORE' button in Twitter. This site has something similar and looks like it grabs a JSON...

2023-03-17 08:34 Category: Q&A
Expectation Maximization Issue - How to find the optimum number of Gaussians within the data

Is there any algorithm or trick for determining the number of Gaussians that should be identified within a set of data before applying the expectation maximization algorithm?

2023-03-16 12:32 Category: Q&A
Discovering "templates" in a given text?

If I have significant amounts of text and am trying to discover templates that occur most frequently, I was thinking of solving it using the N-gram approach, and in fact it was suggest...

2023-03-16 02:32 Category: Q&A
Best way to collect a descriptive set of tags about a company from its URL?

I'm pretty ignorant of what appears in the HTML/JavaScript of a website because I spend most of my time on the back-end (phrasing!). Basically, I want to know the best way to take a company's URL, e...

2023-03-15 15:48 Category: Q&A
Machine learning library for .NET, analog of Apache Mahout [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

2023-03-15 03:40 Category: Q&A