开发者

Simple Perl Regex parser

开发者 https://www.devze.com 2023-01-16 09:01 出处:网络
Hey, I am working on a very basic parser. I am almost certain that my regex is correct, but values do not seem to be being stored in my $1 and $2. Am I doing something wrong? I am just looking for tip

Hey, I am working on a very basic parser. I am almost certain that my regex is correct, but values do not seem to be being stored in my $1 and $2. Am I doing something wrong? I am just looking for tips to alter my code. Thanks for any advice! Also, I am new to Perl, so if I did something wrong, I am looking to get off on the right foot and develop solid habits.

Sample line from file:

Sat 02-August-2008 20:47 - 123.112.3.209 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;

I am just getting the hours from the times.

foreach my $lin开发者_如何学Goe (@lines)
{   
my $match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

if( $1 == 0)
{
    $times[$2] = $times[$2] + 1;
}
else
{   
    my $time = $1.$2;
    $times[$time] = $times[$time]+ 1;
}
 }


print "\n";
for(my $i=0;$i<24;$i++)
{
print "$i: $times[$i]\n";
}


If you want to match on $line shouldn't the code read

$line =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/;

See here.


Can you give some example of what kind of pattern you are try to match? Otherwise I won't be able to tell if your regex matches your pattern or not. However there are some improvements you can make about your code:

First off, always test if a match is successful if you want to use $1, $2 etc

if($match =~ /\d\d-\w+-\d{4} (\d)(\d):\d\d/) {

    if( $1 == 0)
    {
        $times[$2] = $times[$2] + 1;
    }
    else
    {   
        my $time = $1.$2;
        $times[$time] = $times[$time]+ 1;
    }
} else {
    warn "no match!\n";
}

Second, always use the '-w' switch. In this case, you will probably get the warning message about $1 and $2 are not initialized due to failed match:

#!/usr/bin/perl -w


First, if you are new to Perl, one of the strengths is CPAN and the many solutions there. Don't reinvent the wheel!

There is a great module called Date::Parse that will parse the time part for you. Then the only regex problem that you have is separating out the time part of your line.

Based on your one line sample, this code will do that:

use strict;
use warnings;

use Date::Parse;

my $line="Sat 02-August-2008 20:47 - 123.112.3.209 - \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;";
my $tmpart;

if ($line=~ /^(.*\d+:\d+) -/) {
    $tmpart=$1;

    print "Time part = $tmpart\n";

    my $time=str2time($tmpart);
    my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($tmpart);

    $year+=1900;
    $month+=1;

    print "Unix time: $time\n";
    print "Parsed time: $month/$day/$year $hh:$mm:$ss  \n\n";
} 
else {
   warn "no match!\n";
}   

This will return a Unix time number that is then easy to work with. Or (as shown) you can parse the individual components of the time.

0

精彩评论

暂无评论...
验证码 换一张
取 消