I need to parse some data off webpages. How do I extract the text between tags using HTML::Parser?
Consider the following sample code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use Data::Dumper;
my $find_title = HTML::Parser->new(
    api_version => 3,
    start_h => [ 
        sub {
             my ($tag, $attr) = @_;
             print Dumper \@_;
            }, 
        'tag'
               ],
  );
my $html = join '',
    "<html><head><title>Extract me!</title></head><body>",
    (map { qq(<a href="http://$_.com">$_</a>) } qw/foo bar baz/),
    "</body></html>";
$find_title->report_tags('title');
$find_title->parse($html);
How do I fix this so I can extract the title? This only extracts the tag.
You need a text_h handler to collect the text, and an end_h handler to do something when the </title> tag appears (at which point the text inside the tag has been collected).
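As a rough sketch of that approach (the variable names $in_title and $title are my own, and the sample HTML is shortened from the question), the three handlers could be wired up like this. A flag is needed because text events fire for every piece of text in the document, not only the text inside the title:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $in_title = 0;   # true while the parser is between <title> and </title>
my $title    = '';  # accumulates the text chunks seen inside the title

my $find_title = HTML::Parser->new(
    api_version => 3,
    # 'tagname' passes the lowercased tag name without any leading slash
    start_h => [ sub { $in_title = 1 if $_[0] eq 'title' }, 'tagname' ],
    # 'dtext' passes the text with HTML entities decoded
    text_h  => [ sub { $title .= $_[0] if $in_title }, 'dtext' ],
    end_h   => [ sub { $in_title = 0 if $_[0] eq 'title' }, 'tagname' ],
);

my $html = '<html><head><title>Extract me!</title></head>'
         . '<body><a href="http://foo.com">foo</a></body></html>';

$find_title->parse($html);
$find_title->eof;   # signal end of input so any buffered text is flushed

print "$title\n";
```

The text handler appends rather than assigns because the parser may deliver the text of one element in several chunks.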
HTML::Parser is a fairly low-level module; you may be happier with one of the many modules built on top of it, such as HTML::TreeBuilder or HTML::TokeParser.
For example, HTML::HeadParser makes extracting the title trivial:
use strict;
use warnings;
use HTML::HeadParser;
my $html = join '',
    "<html><head><title>Extract me!</title></head><body>",
    (map { qq(<a href="http://$_.com">$_</a>) } qw/foo bar baz/),
    "</body></html>";
my $p = HTML::HeadParser->new;
$p->parse($html);
my $title = $p->header('Title');
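HTML::TokeParser, mentioned above as another option, can do the same job in a pull-parsing style. This is only a minimal sketch modeled on the pattern in that module's documentation, with the sample HTML shortened from the question:

```perl
use strict;
use warnings;
use HTML::TokeParser;

my $html = '<html><head><title>Extract me!</title></head>'
         . '<body><a href="http://foo.com">foo</a></body></html>';

# Passing a scalar reference makes the parser read from the string itself
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my $title;
# get_tag skips ahead to the next <title> start tag, if there is one
if ($p->get_tag('title')) {
    # get_trimmed_text returns the text up to the next tag, whitespace-trimmed
    $title = $p->get_trimmed_text;
}

print "$title\n" if defined $title;
```

This avoids writing handlers and state flags by hand, at the cost of one more module to load.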