How to join 2 csv files with a shell script?_问答_开发者

How to join 2 csv files with a shell script?

开发者 https://www.devze.com 2023-03-12 09:10 出处：网络

I\'m trying to make a shell script that will combine two csv files in the follo开发者_高级运维wing way:

相关专题：bash scripting

I'm trying to make a shell script that will combine two csv files in the follo开发者_高级运维wing way:

I have two csv files, f1.csv and f2.csv. The format of f1.csv is:

startId, endId, roomNum

f2.csv has a format like this:

startId, endId, teacherId

I want to combine these two into one csv file with this format:

startId, endId, roomNum, teacherId.

What is the best way to accomplish this with a shell script that runs under Linux?

Try:

join -t, -1 1 -2 1 -o 1.2 1.3 1.4 2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)

How it works:

1) I first create a composite key column, by joining the startId and endId into startId:endId for both files.

awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv

2) I sort both outputs:

awk -F, '{print $1":"$2","$0}' f1.csv | sort 
awk -F, '{print $1":"$2","$0}' f2.csv | sort

3) I then use the join command to join on my composite key (in the first column) and output just the columns I need.

Use join -t ';' to combine the corresponding lines. The parameter to the -t option depends on your CSV field separator (typically a semicolon). See the rest on the manpage of join. If you need to trim down duplicate columns later on, use cut for that.

awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile;
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile

let me explain whats happening here -F"," specifies a comma as the field-separator.

for the missing column i replaced with the text 9999 you can replace with whatever you like. the firs command is redirecting stdout to a file called 'newFile' and the second command is appending stdout to the same file.

I hope this helps, your question was not to clear with what you wanted to do with the missing field from each file.