I have a transaction log file in CSV format that I want to use to run statistics. The log has the following fields:
date: Time/date stamp
salesperson: The username of the person who closed the sale
promo: Sum total of the items in the sale that were promotions
amount: Grand total of the sale
I'd like to get the following statistics:
salesperson: The username of the salesperson being analyzed
minAmount: The smallest grand total of this salesperson's transactions
avgAmount: The mean grand total
maxAmount: The largest grand total
minPromo: The smallest promo amount by the salesperson
avgPromo: The mean promo amount ...
I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.
TxtSushi does this:
tssql -table trans transactions.csv \
'select
salesperson,
min(as_real(amount)) as minAmount,
avg(as_real(amount)) as avgAmount,
max(as_real(amount)) as maxAmount,
min(as_real(promo)) as minPromo,
avg(as_real(promo)) as avgPromo
from trans
group by salesperson'
I have a bunch of example scripts showing how to use it.
Edit: fixed syntax
Could also bang out an awk script to do it. It's only CSV with a few variables.
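For instance, here's a minimal awk sketch that computes the same stats as the tssql query above. It assumes the columns appear in the order date,salesperson,promo,amount, that fields are unquoted, and that the first line is a header:

awk -F, 'NR > 1 {
  # column layout assumed: $1=date, $2=salesperson, $3=promo, $4=amount
  sp = $2; pr = $3 + 0; amt = $4 + 0
  n[sp]++
  sumA[sp] += amt; sumP[sp] += pr
  if (n[sp] == 1 || amt < minA[sp]) minA[sp] = amt
  if (n[sp] == 1 || amt > maxA[sp]) maxA[sp] = amt
  if (n[sp] == 1 || pr  < minP[sp]) minP[sp] = pr
}
END {
  print "salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo"
  for (sp in n)
    printf "%s,%.2f,%.2f,%.2f,%.2f,%.2f\n", sp, minA[sp], sumA[sp]/n[sp], maxA[sp], minP[sp], sumP[sp]/n[sp]
}' transactions.csv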
You can loop through the lines in the CSV and use bash script variables to hold your min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).
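Something like this sketch, for example. It assumes the same unquoted column layout as above (date,salesperson,promo,amount) plus a header row, and it pipes the arithmetic through bc since bash itself only does integers. It tracks the overall min/max/avg of amount; per-salesperson grouping could be layered on with bash 4 associative arrays:

#!/usr/bin/env bash
# Sketch: overall min, max, and mean of the amount column.
min="" max="" total=0 count=0
while IFS=, read -r date salesperson promo amount; do
  # bc prints 1 when the comparison holds, 0 otherwise
  if [ -z "$min" ] || [ "$(echo "$amount < $min" | bc -l)" = 1 ]; then
    min=$amount
  fi
  if [ -z "$max" ] || [ "$(echo "$amount > $max" | bc -l)" = 1 ]; then
    max=$amount
  fi
  total=$(echo "$total + $amount" | bc -l)
  count=$((count + 1))
done < <(tail -n +2 transactions.csv)   # tail skips the header row
echo "minAmount=$min maxAmount=$max avgAmount=$(echo "$total / $count" | bc -l)"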
Here are some useful snippets for working with CSV files in bash.
If your data might be quoted (e.g. because a field contains a comma), processing with bash, sed, etc. becomes much more complex.
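For example, a (hypothetical) row like this would be split into five fields instead of four by awk -F, or IFS=, because of the comma inside the quoted amount:

2009-03-01 09:15,jsmith,2,"1,250.00"

At that point a real CSV parser, or a tool like TxtSushi above, is the safer choice.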