How to find files with same size?

Developer https://www.devze.com 2023-04-08 17:36 Source: web

Related topic: bash

I have a file structure like so

a/file1
a/file2
a/file3
a/...
b/file1
b/file2
b/file3
b/...
...

where within each dir, some files have the same file size, and I would like to delete those.

I guess if the problem could be solved for one dir, e.g. dir a, then I could wrap a for-loop around it?

for f in *; do
???
done

But how do I find files with same size?

 ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'

This only checks regular files, not directories.

$5 is the size column in the output of ls -l.

test:

kent@ArchT60:/tmp/t$ ls -l
total 16
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
a
b
c
kent@ArchT60:/tmp/t$

Update based on Michał Šrajer's comment:

File names with spaces are now supported as well.

command:

 ls -l|grep '^-'|awk '{ f=""; if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9; 
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f}END{for(x in b)print a[x];}'

test:

kent@ArchT60:/tmp/t$ l
total 24
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent  51 Sep 24 22:23 c
-rw-r--r-- 1 kent kent  51 Sep 24 22:40 x y

kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{ f=""
        if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9; 
        if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f} END{for(x in b)print a[x];}'
a
b
c
x y

kent@ArchT60:/tmp/t$

Solution working with "file names with spaces" (based on Kent (+1) and awiebe (+1) posts):

for FILE in *; do [ -f "$FILE" ] && stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | xargs -d '\n' echo rm

To make it actually remove the duplicates, remove echo from the xargs command.
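
To run the same idea once per directory, as the question asks, something along these lines might work. It is only a sketch: same_size_extras is a made-up function name, it assumes GNU stat (the -c option used above), and it only prints candidates instead of deleting anything.

```shell
# Print every regular file in directory $1 whose size matches an earlier file there.
# Assumes GNU stat (-c). The first file of each size is kept off the list.
same_size_extras() {
    dir=$1
    seen=" "                                  # space-separated list of sizes seen so far
    for file in "$dir"/*; do
        [ -f "$file" ] || continue            # skip subdirectories and other non-files
        size=$(stat -c %s -- "$file")
        case $seen in
            *" $size "*) printf '%s\n' "$file" ;;   # size already seen: report this file
            *) seen="$seen$size " ;;                # first file of this size: remember it
        esac
    done
}

# Dry run over every subdirectory of the current directory:
# for d in */; do same_size_extras "${d%/}"; done
```

Only once the printed list looks right would it make sense to feed it to rm.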

Here is the code to get the size of a file:

FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."

Then use a for loop to step through the items in your structure, storing the size of the current file in a variable.

Nest a second for loop inside the first one to compare the other items in your structure (excluding the current item) against the current item.

Write the names of the matching files to a text file instead of running rm immediately, so you can check that your script is correct.

Then run rm on the contents of that file.
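
The steps above could be sketched roughly as follows. This is one possible reading of them, not tested beyond simple cases: list_same_size and the candidates file path are hypothetical names, GNU stat is assumed, and each file is compared only against the files that come before it so that the first file of every size is kept.

```shell
# For each file in the current directory, look at the files listed before it;
# if one of them has the same size, print the current file as a removal candidate.
list_same_size() {
    for current in *; do
        [ -f "$current" ] || continue
        current_size=$(stat -c %s -- "$current")
        for other in *; do
            [ -f "$other" ] || continue
            [ "$other" = "$current" ] && break      # reached the current item: stop
            if [ "$(stat -c %s -- "$other")" -eq "$current_size" ]; then
                printf '%s\n' "$current"            # an earlier file has the same size
                break
            fi
        done
    done
}

# Write the list outside the directory being scanned, review it, then delete:
# list_same_size > /tmp/candidates.txt
# xargs -d '\n' rm -- < /tmp/candidates.txt
```

Keeping the list outside the scanned directory avoids the list file itself showing up in the glob.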

Based on the accepted answer, the following provides a list of all the files of the same size in the current directory (so you can choose which one to keep), sorted by size:

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 ls -lS

To determine whether the files are actually identical, rather than merely containing the same number of bytes, run shasum or md5sum on each file:

for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum
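
If you only want to see the files whose contents really match, a shorter variant is to checksum every regular file and keep the repeated hashes. This is a sketch under some assumptions: identical_files is a made-up name, it relies on GNU uniq (-w and -D), and it skips the size pre-filter, so it reads every file and will be slower on large directories.

```shell
# Print "hash  name" for every regular file in the current directory
# whose MD5 hash is shared with at least one other file.
identical_files() {
    for file in *; do
        [ -f "$file" ] && md5sum -- "$file"
    done | sort | uniq -w32 -D      # compare only the 32-character hash; show all repeats
}
```

Files with the same size but different contents produce different hashes and drop out of the output.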

Plain shell solution (uses GNU find's -printf)

find -not -empty -type f -printf "%s\n" | 
sort -rn | uniq -d | 
xargs -I{} -n1 find -type f -size {}c -print0 | 
xargs -0 du | sort

Looks like what you really want is a duplicate file finder?

It sounds like this has been answered several times and in several different ways, so I may be beating a dead horse but here goes...

find DIR_TO_RUN_ON -size SIZE_OF_FILE_TO_MATCH -exec rm {} \;

find is an awesome command and I highly recommend reading its manpage.
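
As a concrete and hedged example of the command above: with the c suffix, -size counts bytes (without a suffix, find counts 512-byte blocks), and -maxdepth 1 keeps find from descending into subdirectories. The function name files_of_size is made up for illustration, and it only lists files so the result can be checked before anything is deleted.

```shell
# List regular files directly inside directory $1 that are exactly $2 bytes long.
files_of_size() {
    find "$1" -maxdepth 1 -type f -size "${2}c" -print
}

# Review first, then delete once the list looks right:
# files_of_size a 51
# find a -maxdepth 1 -type f -size 51c -exec rm {} \;
```

Note that this removes every file of that size, so at least one copy has to be set aside beforehand if it should be kept.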
