
Crawl a website for data at frequent intervals

Developer (https://www.devze.com), 2023-01-15 16:57. Source: the web

I need to crawl a website and retrieve certain data that keeps getting updated every few minutes. How do I do this?


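Use WWW::Mechanize for crawling, and fetch the page with the mirror method it inherits from LWP::UserAgent. mirror saves the page to a local file and downloads it again only if the server reports that it has changed since the last fetch.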
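A minimal sketch of the mirror approach might look like the following. The URL and the local file name are placeholders chosen for illustration, not values from the question, and autocheck is switched off here so that a failed request returns a response object instead of dying.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $url  = 'http://www.example.com/';  # placeholder URL (assumption)
my $file = 'page.html';                # hypothetical local file name

# autocheck => 0: HTTP errors come back as a response rather than an exception
my $mech = WWW::Mechanize->new( autocheck => 0 );

# mirror() is inherited from LWP::UserAgent. If $file already exists, the
# request carries an If-Modified-Since header based on the file's timestamp.
my $response = $mech->mirror( $url, $file );

if ( $response->code == 304 ) {
    print "Not modified since the last fetch\n";
}
elsif ( $response->is_success ) {
    print "Saved an updated copy to $file\n";
}
else {
    warn 'Fetch failed: ', $response->status_line, "\n";
}
```

Whether this saves any bandwidth depends on the server honouring If-Modified-Since; a site that always answers 200 will simply be downloaded in full each time.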


Use sleep to control the wait period between fetches, and use WWW::Mechanize to retrieve the data:

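use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0: a failed request returns a response instead of dying,
# so a single bad fetch does not end the loop
my $mech = WWW::Mechanize->new( autocheck => 0 );
my $url  = "http://www.nytimes.com";  # a sample webpage
while (1) {
    my $response = $mech->get($url);
    if ( $response->is_success ) {
        # read docs for WWW::Mechanize for advanced content processing
        print $mech->content( format => 'text' );
    }
    else {
        warn "Failed to fetch $url: ", $response->status_line, "\n";
    }
    sleep 300;  # wait for 5 minutes
}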

EDIT: improved the sample content retrieval process.
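Since the data only changes every few minutes, the loop above may print the same page repeatedly. One possible refinement, sketched here under assumptions, is to remember a digest of the last content and act only when it differs. The helper name content_changed and the state hash are hypothetical and not part of WWW::Mechanize.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: returns 1 when $content differs from the content seen
# on the previous call, 0 otherwise. $state is a hash ref that holds the
# digest of the last content between calls.
sub content_changed {
    my ( $state, $content ) = @_;

    # md5_hex needs bytes, so encode the (possibly wide-character) string first
    my $digest = md5_hex( encode_utf8($content) );

    return 0 if defined $state->{digest} && $state->{digest} eq $digest;

    $state->{digest} = $digest;
    return 1;
}

# Inside the polling loop this could be used roughly as follows:
#     my %seen;
#     ...
#     print $mech->content( format => 'text' )
#         if content_changed( \%seen, $mech->content );
```

A digest of the whole page will also flag irrelevant changes such as rotating ads or timestamps, so in practice it is usually better to extract the specific data of interest first and compare only that.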
