diff folders recursively vs. multithreading

开发者 (Developer) https://www.devze.com 2023-03-30 05:28 Source: the web
I need to compare two directory structures with around one billion files each (directory depth up to 20 levels).


I found the usual diff -r /location/one /location/two to be slow.

Is there a multithreaded implementation of diff? Or can it be done by combining the shell and diff? If so, how?


Your disk is gonna be the bottleneck.

Unless you are working on tmpfs, you will probably only lose speed. That said:

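# -mindepth 1 keeps find from also emitting "." itself, which would start one full serial diff
find . -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P4 -I DIRNAME diff -EwburqN "DIRNAME/" "/tmp/othertree/DIRNAME/"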

should do a pretty decent job of comparing trees (in this case . to /tmp/othertree).

It has a flaw right now: it won't detect top-level directories in /tmp/othertree that don't exist in the current directory (.). I leave that as an exercise for the reader, though you could easily repeat the comparison in reverse.
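One way to cover that gap without a second full diff is to compare just the top-level directory names of the two trees. The following is a minimal sketch, not part of the original answer: the function name toplevel_only is hypothetical, and it assumes directory names contain no newline characters and that find supports -mindepth/-maxdepth.

```shell
# Hypothetical helper: list top-level directories found in only one of two trees.
# Assumes directory names contain no newline characters.
toplevel_only() {
    left_list=$(mktemp) || return 1
    right_list=$(mktemp) || { rm -f "$left_list"; return 1; }
    # comm requires both inputs to be sorted the same way.
    (cd "$1" && find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort) > "$left_list"
    (cd "$2" && find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort) > "$right_list"
    # -3 hides names present in both trees.
    # Column 1: only in $1. Column 2 (tab-indented): only in $2.
    LC_ALL=C comm -3 "$left_list" "$right_list"
    rm -f "$left_list" "$right_list"
}
```

Calling toplevel_only . /tmp/othertree prints nothing when both trees have the same top-level directories. Any tab-indented name is a directory that the parallel diff above would never visit.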

The argument -P4 to xargs specifies that you want at most 4 concurrent processes.

Also have a look at the xjobs utility, which does a better job of separating the output. I think with GNU xargs (as shown) you cannot drop the -q option, because the output of the concurrent diffs would get intermixed (?).
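If full diffs are wanted with plain xargs, one possible workaround is to send each job's output to its own file so that concurrent processes never share a stream. This is a sketch under assumptions rather than a tested recipe for a tree of that size: parallel_diff is a hypothetical name, the output directory is assumed to live outside both trees, and top-level directory names are assumed to be usable as file names.

```shell
# Hypothetical helper: diff every top-level directory of the current tree against
# the same-named directory under $1, one output file per directory in $2.
# $3 is the number of concurrent diff processes (default 4).
parallel_diff() {
    other=$1 outdir=$2 jobs=${3:-4}
    mkdir -p "$outdir" || return 1
    find . -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -n 1 sh -c '
            # $1 = other tree, $2 = output directory, $3 = directory from find
            name=${3#./}
            diff -ruN "$3/" "$1/$name/" > "$2/$name.diff" 2>&1
            # diff exits 1 when the trees differ; only 2 or more is a real error.
            [ $? -le 1 ]
        ' sh "$other" "$outdir"
}
```

Run from the root of the first tree, for example parallel_diff /tmp/othertree /tmp/diff-out 4. An empty .diff file means that directory matched; xargs returns a non-zero status only if some diff reported trouble rather than mere differences.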
