I'm trying to make a shell script that will combine two csv files in the follo开发者_高级运维wing way:
I have two csv files, f1.csv and f2.csv. The format of f1.csv is:
startId, endId, roomNum
f2.csv has a format like this:
startId, endId, teacherId
I want to combine these two into one csv file with this format:
startId, endId, roomNum, teacherId.
What is the best way to accomplish this with a shell script that runs under Linux?
Try:
join -t, -1 1 -2 1 -o 1.2 1.3 1.4 2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)
How it works:
1) I first create a composite key column, by joining the startId and endId into startId:endId for both files.
awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv
2) I sort both outputs:
awk -F, '{print $1":"$2","$0}' f1.csv | sort
awk -F, '{print $1":"$2","$0}' f2.csv | sort
3) I then use the join
command to join on my composite key (in the first column) and output just the columns I need.
Use join -t ';'
to combine the corresponding lines. The parameter to the -t option depends on your CSV field separator (typically a semicolon). See the rest on the manpage of join. If you need to trim down duplicate columns later on, use cut
for that.
awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile;
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile
let me explain whats happening here -F"," specifies a comma as the field-separator.
for the missing column i replaced with the text 9999 you can replace with whatever you like. the firs command is redirecting stdout to a file called 'newFile' and the second command is appending stdout to the same file.
I hope this helps, your question was not to clear with what you wanted to do with the missing field from each file.
精彩评论