What is the most efficient way to subtract one list from another?

Developer, https://www.devze.com, 2023-04-09 10:12, source: web
I am trying to subtract List_1 (50k lines) from List_2 (100k lines): whenever an item in List_1 exactly matches an item in List_2, that item should be removed from List_2. I am using grep, specifically:

grep -v -f List_1.csv List_2.csv > Magic_List.csv

I know this is not the most efficient way to do this, but what is? sed? awk? comm? SQL? How might I accomplish this in the most efficient way possible?


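This is one of the most efficient ways, IMHO, but you need to add -F so that grep treats each line of List_1.csv as a fixed string rather than a regular expression, and -x so that only whole-line (exact) matches are removed:

grep -Fxvf List_1.csv List_2.csv > Magic_List.csv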


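The most efficient way is to load one list into a trie or a hash set and look up each item of the other list in it. For this task, build the structure from List_1, then go through List_2 and keep every line that is not found. With a hash set each lookup takes expected constant time, so the whole job is roughly linear in the combined size of the two files.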
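A minimal sketch of the hash-set approach, assuming a POSIX awk: awk's associative arrays act as the set (common implementations back them with hash tables). The helper name subtract_lines is hypothetical. The lines of List_1 are stored as array keys, then List_2 is streamed and only lines that were never stored are printed, so List_2's original order and any duplicate lines are preserved.

```shell
# Hypothetical helper: print the lines of file $2 that do not exactly
# match any line of file $1, keeping $2's original order.
subtract_lines() {
  awk '
    # While reading the first file, store each whole line as an array key.
    # FILENAME == ARGV[1] (instead of the NR == FNR idiom) also works when
    # the first file is empty.
    FILENAME == ARGV[1] { seen[$0]; next }
    # For the second file, print a line only if it was never stored.
    !($0 in seen)
  ' "$1" "$2"
}
```

Usage: subtract_lines List_1.csv List_2.csv > Magic_List.csv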


You'd have to benchmark it to find the most efficient method. This is, however, exactly what comm is for, so I'd guess it would be a pretty good tool for the job, provided both files are already sorted:

comm -13 List_1.csv List_2.csv > Magic_List.csv
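If the lists are not already sorted, one way to sort them on the fly is sketched below, assuming POSIX sh plus the widely available mktemp; the helper name subtract_sorted is hypothetical. Both sorts and comm run under LC_ALL=C so they agree on collation order. Unlike the grep and awk versions, the output comes out sorted rather than in List_2's original order, and duplicates are paired off one-for-one, so a line that appears twice in List_2 but once in List_1 survives once.

```shell
# Hypothetical helper: print lines of $2 that are not in $1, using comm.
# comm needs sorted input, so sort both files under the same (C) collation.
subtract_sorted() {
  tmp=$(mktemp) || return 1
  LC_ALL=C sort "$1" > "$tmp"
  # "-" makes comm read the sorted second list from standard input;
  # -1 drops lines only in $1, -3 drops lines common to both files.
  LC_ALL=C sort "$2" | LC_ALL=C comm -13 "$tmp" -
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: subtract_sorted List_1.csv List_2.csv > Magic_List.csv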