i have the following file::
FirstName, FamilyName, Address, Pho开发者_如何学PythonneNo
the file is sorted according to the family name, how can i count the number of family names starts with a particular character ??
output should look like this ::
A: 2
B: 1
...
??
With awk:
awk '{print substr($2, 1, 1)}' file|
uniq -c|
awk '{print $2 ": " $1}'
OK, no awk. Here's with sed:
sed s'/[^,]*, \(.\).*/\1/' file|
uniq -c|
sed 's/.*\([0-9]\)\+ \([a-zA-Z]\)\+/\2: \1/'
OK, no sed. Here's with python:
import csv
r = csv.reader(open(file_name, 'r'))
d = {}
for i in r:
d[i[1][1]] = d.get(i[1][1], 0) + 1
for (k, v) in d.items():
print "%s: %s" % (k, v)
while read -r f l r; do echo "$l"; done < inputfile | cut -c 1 | sort | uniq -c
Just the Shell
#! /bin/bash
##### Count occurance of familyname initial
#FirstName, FamilyName, Address, PhoneNo
exec <<EOF
Isusara, Ali, Someplace, 022-222
Rat, Fink, Some Hole, 111-5555
Louis, Frayser, whaterver, 123-1144
Janet, Hayes, whoever St, 111-5555
Mary, Holt, Henrico VA, 222-9999
Phillis, Hughs, Some Town, 711-5525
Howard, Kingsley, ahahaha, 222-2222
EOF
while read first family rest
do
init=${family:0:1}
[ -n "$oinit" -a $init != "$oinit" ] && {
echo $oinit : $count
count=0
}
oinit=$init
let count++
done
echo $oinit : $count
Running
frayser@gentoo ~/doc/Answers/src/SH/names $ sh names.sh
A : 1
F : 2
H : 3
K : 1
frayser@gentoo ~/doc/Answers/src/SH/names $
To read from a file, remove the here document, and run:
chmod +x names.sh
./names.sh <file
The "hard way" — no use of awk or sed, exactly as asked for. If you're not sure what any of these commands mean, you should definitely look at the man page for each one.
INTERMED=`mktemp` # Creates a temporary file
COUNTS_L=`mktemp` # A second...
COUNTS_R=`mktemp` # A third...
cut -d , -f 2 | # Extracts the FamilyName field only
tr -d '\t ' | # Deletes spaces/tabs
cut -c 1 | # Keeps only the first character
# on each line
tr '[:lower:]' '[:upper:]' | # Capitalizes all letters
sort | # Sorts the list
uniq -c > $INTERMED # Counts how many of each letter
# there are
cut -c1-7 $INTERMED | # Cuts out the LHS of the temp file
tr -d ' ' > $COUNTS_R # Must delete the padding spaces though
cut -c9- $INTERMED > $COUNTS_L # Cut out the RHS of the temp file
# Combines the two halves into the final output in reverse order
paste -d ' ' /dev/null $COUNTS_R | paste -d ':' $COUNTS_L -
rm $INTERMED $COUNTS_L $COUNTS_R # Cleans up the temp files
awk one-liner:
awk '
{count[substr($2,1,1)]++}
END {for (init in count) print init ": " count[init]}
' filename
Prints the how many words start with each letter:
for i in {a..z}; do echo -n "$i:"; find path/to/folder -type f -exec sed "s/ /\n/g" {} \; | grep ^$i | wc -c | awk '{print $0}'; done
精彩评论