I am trying to sort some data in bash. The data looks like this:
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
After sorting, it should look like this:
20110724.gz 3475
20110725.gz 3649
20110726.gz 1896
Basically, for a given date, the data are summed up. Can somebody help? Thanks.
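For comparison, the same group-and-sum is often done with awk. This technique is not taken from the answers in this thread. The following is a minimal sketch, assuming blank-separated "name count" lines as in the sample; the function name sum_by_date is hypothetical. awk keeps a running total per file name, and because awk's `for (k in ...)` visits keys in no particular order, the result is piped through sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum column 2 per distinct column 1, then sort by name.
# Assumes blank-separated "name count" lines, as in the sample data.
sum_by_date() {
    # The awk array sum holds one running total per file name; the END block
    # prints each total, and sort puts the names (and thus dates) in order.
    awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' "$@" | sort
}

# Sample data from the question, read from standard input.
sum_by_date <<'EOF'
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
EOF
```

With the sample data this prints the three totals from the question (3475, 3649, 1896). File names can also be passed as arguments, e.g. `sum_by_date datafile`.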
Hmm, hopefully I'll figure it out in a few days.
Here's a quick and dirty perl one-liner:
$ perl -e 'my %h = (); while (<>) { chomp; my ($fname, $count) = split; $h{$fname} += $count;} foreach my $k (sort keys %h) {print $k, " ", $h{$k}, "\n"}' < datafile
Here's a perl solution.
Usage: script.pl input.txt > output.txt
Code:
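#!/usr/bin/perl
use warnings;
use strict;
# ARGV::readonly (a CPAN module) makes <> open each argument as a plain file
use ARGV::readonly;

my %sums;
# each line is "date count"; split with no arguments splits $_ on whitespace
while (<>) {
    my ($date, $num) = split;
    $sums{$date} += $num;
}
for my $date (sort keys %sums) {
    print "$date $sums{$date}\n";
}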
Or as a one-liner:
$ perl -we 'my %h; while(<>) { ($d,$n)=split; $h{$d}+=$n; } print "$_ $h{$_}\n" for sort keys %h;' data2.txt
In case you do need a numerical sort on the dates:
sort { substr($a,0,8) <=> substr($b,0,8) } keys %sums;
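Plain sort can order on the same date prefix without Perl. In `-k POS1,POS2` each position is written FIELD.CHAR, so `-k1.1,1.8n` keys on characters 1 to 8 of the first field and compares them numerically, which is roughly what the substr comparator does. This is a sketch that assumes the lines have no leading blanks; the function name sort_by_date_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: order lines by the YYYYMMDD prefix of field 1.
# -k1.1,1.8n = key starts at field 1 char 1, ends at field 1 char 8,
# compared numerically (like substr($a,0,8) <=> substr($b,0,8) in Perl).
sort_by_date_prefix() {
    sort -k1.1,1.8n "$@"
}

sort_by_date_prefix <<'EOF'
20110726.gz 1896
20110724.gz 3475
20110725.gz 3649
EOF
```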
You don't need perl for doing that. Some shell trickery will help :)
sort -n -k1,1 <file | while true ; do
    if ! read -r curfile curvalue ; then
        # end of input: output the last accumulated file, if any
        test -n "$accfile" && echo "$accfile $value"
        break
    fi
    # read -r splits the line on blanks into the file name and the count
    if [ "$curfile" != "$accfile" ] ; then
        # new file, output the last if not empty
        test -n "$accfile" && echo "$accfile $value"
        accfile=$curfile
        value=$curvalue
    else
        value=$((value + curvalue))
    fi
done
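The -k option selects sort keys by field (separated by blanks by default), not by character: -k1,1 restricts the key to the first field (the file name), whereas -k1,8 would span fields 1 through 8. As the dates are in YYYYMMDD format, a numeric sort (-n) puts them in chronological order.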
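To try the loop on the sample data, it can be wrapped in a function. This is a sketch; the name sum_sorted is hypothetical. It uses `while read -r ...; do ... done` with the final flush after the loop, which behaves like the `while true` / `break` structure in the answer, and it reads fields with `read -r` instead of `tr`/`cut`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the accumulate-and-flush loop from the answer.
sum_sorted() {
    sort -n -k1,1 "$@" | {
        accfile=""
        value=0
        # read -r splits each line on blanks into the file name and the count
        while read -r curfile curvalue; do
            if [ "$curfile" != "$accfile" ]; then
                # New file name: print the previous total, if there is one
                if [ -n "$accfile" ]; then
                    echo "$accfile $value"
                fi
                accfile=$curfile
                value=$curvalue
            else
                value=$((value + curvalue))
            fi
        done
        # End of input: print the last total
        if [ -n "$accfile" ]; then
            echo "$accfile $value"
        fi
    }
}

sum_sorted <<'EOF'
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
EOF
```

Because sort ends every output line with a newline, read does not drop an unterminated last line here. The variables are set inside the pipeline's subshell, so they do not leak into the calling shell.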