I need to crawl a website and retrieve certain data that keeps getting updated every few minutes. How do I do this?
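Use WWW::Mechanize for crawling, together with the mirror method it inherits from LWP::UserAgent. If the local copy already exists, mirror sends an If-Modified-Since header based on the file's modification time and only rewrites the file when the server returns newer content.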
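A minimal sketch of the mirror approach. The helper name fetch_if_changed is hypothetical, and the caller chooses the local cache file. Autocheck is switched off because a 304 Not Modified is not a 2xx success, and with autocheck on (the default in recent WWW::Mechanize versions) any non-2xx response makes the object die.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: download $url into $file only if it changed since
# $file was last written. Returns 1 when a new copy was saved, 0 when the
# server answered 304 Not Modified, and dies on any other failure.
# $mech should be created with autocheck => 0 so a 304 does not die.
sub fetch_if_changed {
    my ( $mech, $url, $file ) = @_;

    # mirror() is inherited from LWP::UserAgent; when $file exists it sends
    # If-Modified-Since based on the file's modification time.
    my $res = $mech->mirror( $url, $file );

    return 0 if $res->code == 304;    # unchanged: the local copy is current
    die "Fetch of $url failed: ", $res->status_line, "\n"
        unless $res->is_success;
    return 1;                         # new content was written to $file
}
```

A polling loop could then look like `my $mech = WWW::Mechanize->new( autocheck => 0 ); while (1) { parse_page($file) if fetch_if_changed( $mech, $url, $file ); sleep 300; }`, where parse_page is a hypothetical routine that extracts the data you need. Whether this saves any bandwidth depends on the server: a page that ignores If-Modified-Since is downloaded in full on every poll.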
Use sleep to control the wait period between fetches, and WWW::Mechanize to retrieve the data:
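use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new( autocheck => 0 );    # don't die on a failed fetch
my $url  = "http://www.nytimes.com";                 # a sample webpage
while (1) {
    $mech->get($url);
    if ( $mech->success ) {
        # read the WWW::Mechanize docs for more advanced content processing
        print $mech->content( format => 'text' );
    } else {
        warn "Fetch failed: ", $mech->status, "\n";
    }
    sleep 300;    # wait 5 minutes before the next fetch
}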
EDIT: improved the sample content retrieval process.