
Crawl a website for data at frequent intervals

Developer https://www.devze.com 2023-01-15 16:57 Source: Internet


Related topics: perl

I need to crawl a website and retrieve certain data that keeps getting updated every few minutes. How do I do this?

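Load WWW::Mechanize for crawling and use the mirror method it inherits from LWP::UserAgent. mirror saves the page to a local file and skips the download when the server reports that the page has not changed since the last fetch.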
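A minimal sketch of the mirror approach, not a definitive implementation. The URL, the file name page.html, and the helper describe_status are hypothetical and only there for illustration. autocheck is switched off on the assumption that a 304 Not Modified reply should come back as an ordinary response instead of raising an error.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Classify a mirror() response by its HTTP status code (hypothetical helper).
sub describe_status {
    my ($code) = @_;
    return 'unchanged' if $code == 304;                  # Not Modified
    return 'updated'   if $code >= 200 && $code < 300;   # fresh copy saved
    return 'failed';
}

# Mirror $url into $file every $interval seconds, reporting what happened.
sub poll {
    my ( $url, $file, $interval ) = @_;

    # autocheck => 0: non-2xx statuses are returned rather than thrown
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    while (1) {
        # mirror() sends If-Modified-Since based on the local file's timestamp
        my $response = $mech->mirror( $url, $file );
        my $status   = describe_status( $response->code );
        print "$url: $status (", $response->status_line, ")\n";
        sleep $interval;
    }
}

# Assumed sample values; run only when executed as a script.
poll( 'http://www.example.com/', 'page.html', 300 ) unless caller;
```

After each pass the latest copy of the page is in page.html, so any parsing can be done on that file, and only when the status is 'updated'.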

Use sleep to control the wait period, and use WWW::Mechanize for data retrieval:

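use strict;
use warnings;
use WWW::Mechanize;

binmode STDOUT, ':encoding(UTF-8)';  # page text may contain non-ASCII characters
# autocheck => 0 makes a failed request return a response instead of dying
my $mech = WWW::Mechanize->new( autocheck => 0 );
my $url  = "http://www.nytimes.com";  # a sample webpage
while (1) {
    my $response = $mech->get($url);
    if ( $response->is_success ) {
        # read docs for WWW::Mechanize for advanced content processing
        print $mech->content( format => 'text' );
    }
    else {
        warn "Could not fetch $url: ", $response->status_line, "\n";
    }
    sleep 300;  # wait for 5 minutes
}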

EDIT: improved the sample content retrieval process.
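The loop above prints the whole page on every pass, even when nothing has changed. One possible refinement, sketched here under assumptions, is to keep a fingerprint of the last text seen and print only when it differs. The helper names fingerprint and watch are hypothetical, and MD5 is used only as a cheap change detector, not for security.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Fingerprint of a page's text; encode first because md5_hex expects bytes.
sub fingerprint {
    my ($text) = @_;
    return md5_hex( encode_utf8($text) );
}

# Poll $url every $interval seconds and print the text only when it changes.
sub watch {
    my ( $url, $interval ) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    my $last = '';    # fingerprint of the previously printed text
    while (1) {
        my $response = $mech->get($url);
        if ( $response->is_success ) {
            my $text = $mech->content( format => 'text' );
            my $seen = fingerprint($text);
            if ( $seen ne $last ) {
                print encode_utf8($text), "\n";
                $last = $seen;
            }
        }
        else {
            warn "Could not fetch $url: ", $response->status_line, "\n";
        }
        sleep $interval;
    }
}

# Assumed sample values; run only when executed as a script.
watch( 'http://www.nytimes.com', 300 ) unless caller;
```

Pages that embed timestamps or rotating ads will look changed on every fetch, so in practice it is usually better to fingerprint only the part of the page you care about.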