web-crawler
Mechanize form submission causes 'Assertion Error' in response when .read() is attempted
I am writing a web-crawl program with Python and am unable to log in using mechanize. The form on the site looks like: [Details]
2023-03-17 08:54 Category: Q&A
Nutch No agents listed in 'http.agent.name'
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property. [Details]
2023-03-17 04:10 Category: Q&A
SharePoint search not indexing contents of document libraries [closed]
Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow. [Details]
2023-03-16 23:38 Category: Q&A
Adding html break to each line in Ruby
What I basically have here is a short Ruby script (I just started learning), and its purpose is to spider-crawl a website and return all the links it finds. [Details]
2023-03-16 12:11 Category: Q&A
my site crawler died while it's running
I wrote a site crawler to get links and images to create a site map, but it got killed while running! So it's not my whole class [Details]
2023-03-16 02:53 Category: Q&A
hierarchy in sites
I'm not sure if this question will have a single answer, or even a concise one-for-all answer, but I thought I would ask nonetheless. The problem isn't language-specific either, but may have some sort [Details]
2023-03-16 01:49 Category: Q&A
ASP.Net authentication and Googlebot
I have an ASP.Net 3.5 web site with forms authentication enabled. Is it possible to have Googlebot crawl my web site without getting prompted for a username/password? Google claims th [Details]
2023-03-15 18:30 Category: Q&A
In Java, is there a collection from which I can take an element only after a time?
I'm writing a web crawler, and I don't want to overload the servers with requests, so I will limit access to the servers by time. [Details]
2023-03-15 14:37 Category: Q&A
Programmable WebCrawler with C#
I would like to extract specific data from a known URL: from HTML tags like span, a, div ...! [Details]
2023-03-15 13:33 Category: Q&A
MVC site is not crawlable by mainstream search engines?
It's based on MVC 3 + Razor, and for now there is no DNS name created for the site, just a public IP. Due to a lack of understanding of whether and how Google handles spidering for IP-only sites, we're getting a [Details]
2023-03-15 09:34 Category: Q&A