#!/usr/bin/perl
# dups: simple script for showing duplicate files

=head1 NAME

dups - Show Duplicate Files

=head1 SYNOPSIS

 Usage: dups files ...

dups is a fast script for discovering duplicate files. It
achieves its efficiency by comparing file digests rather than the
file contents themselves, the latter being much larger in general.

The NIST Secure Hash Algorithm (SHA) is highly collision-resistant,
meaning that two files with the same SHA digest are almost
certainly identical.

The dups script works by computing the SHA-1 digest of each file
and looking for matches. The search can reveal more than one set
of duplicates, so the output is written as follows:

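 file1           file1_digest
 file2           file2_digest
 etc.

 match1_file1
         duplicate match1_file2
         duplicate match1_file3 - SHA match1_digest

 match2_file1
         duplicate match2_file2 - SHA match2_digest

 etc.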

=head1 AUTHOR

Mark Shelor <[email protected]>

=head1 SEE ALSO

Perl module L<Digest::SHA> or L<Digest::SHA::PurePerl>

=cut

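use strict;
use warnings;
use Digest::SHA;

die "usage: dups files ...\n" unless @ARGV;

# Consider only plain files; directories and missing paths are skipped.
my @files = grep { -f $_ } @ARGV;

# Map each hex digest to the list of files that produced it.
my %dups;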
for my $file (@files) {
    # new() defaults to SHA-1; "b" makes addfile read in binary mode.
    my $digest = Digest::SHA->new->addfile($file, "b")->hexdigest;
    push @{ $dups{$digest} }, $file;
    print "$file\t\t$digest\n";
}

# Report each digest shared by two or more files, in a stable order.
for my $digest (sort keys %dups) {
    my $group = $dups{$digest};
    next unless @$group > 1;
    print join("\n\tduplicate ", @$group), " - SHA $digest\n\n";
}
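
=head1 EXAMPLE

A digest match is strong evidence that two files are identical, but it
is not a byte-for-byte proof. The following is a minimal sketch, not part
of dups, of how a caller might confirm the files in one match set using
the core L<File::Compare> module. The helper name C<confirmed_duplicates>
is hypothetical.

    use strict;
    use warnings;
    use File::Compare qw(compare);

    # Hypothetical helper: given the files that share one digest,
    # return those whose contents equal the first file's contents.
    sub confirmed_duplicates {
        my ($first, @rest) = @_;
        # compare() returns 0 for equal contents, 1 for different
        # contents, and -1 on error.
        return grep { compare($first, $_) == 0 } @rest;
    }

It could be applied to each list stored in C<%dups>, for example
C<confirmed_duplicates(@{ $dups{$digest} })>.

=cut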