Unix grep query

Developer https://www.devze.com 2023-04-11 06:08 Source: Internet
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566
[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44
[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55

In the file above, I want to know how many times #mouseclicked# occurred for the date range from 24-Sep-11 to 25-Sep-11 (both dates inclusive).

In this case, the command should return 3. (Note: the bare mouseclicked on the DEBUG line is not counted, because it does not match #mouseclicked#.)

How can I use the grep command in this case?


grep alone won't solve the general problem. It can't recognize lines that are within a certain range of dates. (Well, it probably can if you use a sufficiently complex regular expression, but the regexp will be quite different for each range of dates you're interested in.)

But for your specific question, this will work:

egrep -c '^\[2011-09-(24|25).*#mouseclicked#' filename

egrep (equivalent to grep -E) supports extended regular expressions, including the | alternation operator. The -c option tells it to print the number of matching lines rather than the lines themselves.
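For readers more comfortable in a scripting language, the same count can be sketched in Python. This is only an illustration of what the egrep -c command does, not a replacement for it; the helper name count_matches and the SAMPLE_LOG variable are made up for this example, and the sample lines are copied from the question.

```python
import re

# Log excerpt copied from the question.
SAMPLE_LOG = """\
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566
[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44
[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55
"""

# Same pattern as the egrep command: a line that starts with a
# 2011-09-24 or 2011-09-25 timestamp and later contains #mouseclicked#.
PATTERN = re.compile(r"^\[2011-09-(24|25).*#mouseclicked#")


def count_matches(lines, pattern):
    """Count the lines that match the pattern, like grep -c (hypothetical helper)."""
    return sum(1 for line in lines if pattern.search(line))


count = count_matches(SAMPLE_LOG.splitlines(), PATTERN)
print(count)
```

On the sample from the question this prints 3: the DEBUG line is skipped because it lacks the surrounding # characters.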

But as you can imagine, if you want lines from 1pm on September 30 to 11am on October 2, the regular expression is going to be a lot more complex, and it will take some significant effort to construct it.

If I were going to be doing this a lot, I'd write a separate tool that extracts lines from a specified range of dates (or dates and times), taking advantage of the particular date format used in this file (YYYY-MM-DD HH:MM:SS, ISO-8601, is an excellent choice). Personally, I'd write such a tool in Perl. I could then run the tool on the file and pipe the output through grep.

EDIT:

In response to the comment, grep doesn't understand date ranges, just character sequences. You can write a complex regular expression that would match everything in the range 1-oct-2010 to 1-dec-2011. Here's my attempt (not tested):

egrep -c '^\[(2010-1.*|2011-(0.|10|11)|2011-12-01).*#mouseclicked#' filename

This deals with several individual subranges: October through December of 2010, January through September, then October, then November of 2011, and finally December 1 of 2011.

And, as I said above, for any other range of dates (or, worse, dates and times), you'll need to construct an entirely new complicated regular expression that matches subranges of the desired time span, based on their textual representation, not on their meanings as dates.

That's why I wouldn't consider this kind of approach if I wanted to do this more than once or twice.

Do you know a scripting language like Perl or Python? If so, it wouldn't be too difficult to write a script that will actually parse the timestamps and select lines that are within the desired range.
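As a rough idea of what such a script might look like in Python, here is a minimal sketch. It assumes every timestamp has exactly the shape shown in the question (YYYY-MM-DD HH:MM:SS:mmm inside the first pair of brackets) and it ignores the timezone suffix. The names parse_timestamp, lines_in_range, and SAMPLE_LOG are invented for this example.

```python
import re
from datetime import datetime

# Leading "[YYYY-MM-DD HH:MM:SS:mmm" part of a log line (format assumed from the question).
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3})")


def parse_timestamp(line):
    """Return the line's leading timestamp as a naive datetime, or None if absent."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    # %f accepts one to six digits, so "697" is read as 697 milliseconds.
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S:%f")


def lines_in_range(lines, start, end):
    """Yield lines whose timestamp satisfies start <= timestamp < end."""
    for line in lines:
        timestamp = parse_timestamp(line)
        if timestamp is not None and start <= timestamp < end:
            yield line


# Log excerpt copied from the question.
SAMPLE_LOG = """\
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566
[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44
[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55
"""

# 24 and 25 September inclusive, written as the half-open range [24 Sep, 26 Sep).
selected = lines_in_range(
    SAMPLE_LOG.splitlines(),
    datetime(2011, 9, 24),
    datetime(2011, 9, 26),
)
click_count = sum("#mouseclicked#" in line for line in selected)
print(click_count)
```

Because the bounds are real datetime values, a range such as 1pm on September 30 to 11am on October 2 needs only different arguments, not a new regular expression. The half-open upper bound avoids having to spell out the last millisecond of the final day.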

In fact, I wouldn't be at all surprised if such a tool already exists (I just don't know where to find it).

EDIT 2:

Here's a Perl script I threw together:

#!/usr/bin/perl

use strict;
use warnings;

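die "Usage: $0 start end [file...]\n" if scalar @ARGV < 2;
my $start = shift;
my $end = shift;
# Reduce both bounds to bare digit strings, e.g. 2011-09-24 becomes 20110924.
$start =~ s/\D//g;
$end   =~ s/\D//g;
# Pad the end bound with 9s so that any timestamp beginning with it still compares "le".
$end .= '99999999999999999999999999999';

# Diagnostics go to STDERR so they don't end up in the filtered output.
print STDERR "start=\"$start\", end=\"$end\"\n";

while (<>) {
    # The first [...] group on the line holds the timestamp.
    if (/^\[([^]]+)\]/) {
        my $timestamp = $1;
        $timestamp =~ s/\D//g;
        # Stringwise comparison works because the digits run from most to least significant.
        if ($timestamp ge $start and $timestamp le $end) {
            print;
        }
    }
}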

It treats the specified start and end times, as well as the timestamps in the file, as digit sequences and does a stringwise (not numeric) comparison on them. It ignores the timezone information. It could be made a lot more sophisticated with one of the time and date modules from CPAN.

For your original question, you'd run:

this-perl-script 2011-09-24 2011-09-25 input-file | grep -c '#mouseclicked#'
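On the timezone caveat: if the offsets ever differed from GMT+00:00, comparing raw digit strings would no longer be safe. A hedged sketch of timezone-aware parsing in Python follows. It assumes the suffix always has the form GMT±HH:MM and relies on Python 3.7 or later, where the %z directive of strptime accepts an offset written with a colon. The helper name to_utc and the GMT+02:00 timestamp are invented for illustration; only the first timestamp comes from the log in the question.

```python
from datetime import datetime, timezone

# Timestamp layout assumed from the question's log: date, time with
# milliseconds after a colon, then a literal "GMT" followed by a UTC offset.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S:%f GMT%z"


def to_utc(stamp):
    """Parse a bracket-free timestamp string and normalise it to UTC (hypothetical helper)."""
    return datetime.strptime(stamp, STAMP_FORMAT).astimezone(timezone.utc)


# A timestamp taken from the question's log.
from_log = to_utc("2011-09-25 20:46:51:697 GMT+00:00")

# A made-up timestamp with a non-zero offset, to show the normalisation.
hypothetical = to_utc("2011-09-25 01:30:00:000 GMT+02:00")

print(from_log.isoformat())
print(hypothetical.isoformat())
```

Once every timestamp is normalised to UTC, a range check compares instants rather than text, so the made-up 01:30 entry at GMT+02:00 correctly falls on 24 September in UTC.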


cat filename | grep '^\[2011-09-2[45]' | grep '#mouseclicked#' | wc -l

Or, more simply:

grep '^\[2011-09-2[45]' filename | grep -c '#mouseclicked#'


I would try something like grep | wc -l

grep filters the lines that contain your string, and wc -l counts the lines that grep outputs.
