
Nutch crawling with seed URLs in a range

Developer https://www.devze.com 2023-01-03 01:41 Source: Internet

Some sites have a URL pattern such as www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide seeds for fetching over a range?


I think the easiest way would be to have a script generate your initial list of URLs.


No. You have to inject them manually or by using a script.
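
Both answers point to a script, so here is a minimal sketch of one in Python. It assumes a Nutch 1.x style seed directory containing a plain text file with one URL per line. The function names, the `urls/seed.txt` location, the `http://` scheme, and the `www.example.com` host are all placeholders chosen for illustration, since the question does not name the real site.

```python
from __future__ import annotations

from pathlib import Path


def generate_seed_urls(template: str, start: int, end: int) -> list[str]:
    """Return one URL per id in the inclusive range [start, end].

    `template` must contain an `{id}` placeholder (hypothetical convention).
    """
    return [template.format(id=i) for i in range(start, end + 1)]


def write_seed_file(urls: list[str], seed_dir: str = "urls",
                    filename: str = "seed.txt") -> Path:
    """Write the URLs, one per line, to seed_dir/filename and return the path."""
    directory = Path(seed_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the seed dir if missing
    seed_file = directory / filename
    seed_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return seed_file
```

Calling `write_seed_file(generate_seed_urls("http://www.example.com/id={id}", 1, 1000))` would produce `urls/seed.txt` with 1000 lines, which can then be injected in the usual way (in Nutch 1.x, something like `bin/nutch inject crawl/crawldb urls`). One thing worth checking: the default `regex-urlfilter.txt` shipped with Nutch normally includes a rule that skips URLs containing characters such as `=`, so that rule may need to be relaxed for URLs of this shape to be accepted.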
