Finding and comparing duplicate files

Fdupes

fdupes -r jellyfin /mnt/terrahdd1/jellyfin
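
That run scans both the local jellyfin directory and the copy under /mnt/terrahdd1 for duplicates. Two extra fdupes flags worth knowing, as a minimal sketch against the same paths:

# Show the size of each set of duplicates alongside the matches:
fdupes -r -S jellyfin /mnt/terrahdd1/jellyfin

# Just summarize how many duplicates there are and the space they waste:
fdupes -r -m jellyfin /mnt/terrahdd1/jellyfin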

ChatGPT says: "If you need a faster alternative to fdupes for identifying duplicate files, here are some tools and methods that are optimized for speed:"


1. rmlint

rmlint is a high-performance tool specifically designed to find duplicate files (and other "lint" like empty files) quickly.

Install:

sudo apt install rmlint

Usage:

rmlint /path/to/directory1 /path/to/directory2

Output:

  • Lists duplicate files and generates a shell script (rmlint.sh by default) that can delete or otherwise handle them; a sketch follows this section.

Why it's fast:

  • Uses hashing algorithms and multithreading for performance.
  • Skips unnecessary I/O operations with advanced optimizations.
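
A minimal sketch against the same directory pair as the fdupes run above. Per the rmlint manual, paths after the // separator are "tagged" as originals, and --keep-all-tagged tells rmlint never to mark tagged files for removal (verify the flags against your installed version):

rmlint jellyfin // /mnt/terrahdd1/jellyfin --keep-all-tagged

# rmlint writes rmlint.sh into the current directory; review it
# before executing anything:
less rmlint.sh
sh rmlint.sh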

2. jdupes

jdupes is a fork of fdupes with significant performance improvements.

Install:

sudo apt install jdupes

Usage:

jdupes -r /path/to/directory1 /path/to/directory2

Features:

  • Multithreaded for speed.
  • Supports advanced options like hard linking and deduplication (see the sketch after this list).
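
A minimal sketch of the hard-linking mode. Hard links cannot cross filesystems, so this only makes sense when both directories sit on the same mount; start with the summary flag to see what you would gain:

# Safer first step: summarize how much space duplicates waste:
jdupes -r -m /path/to/directory1 /path/to/directory2

# Replace each duplicate with a hard link to a single surviving copy:
jdupes -r -L /path/to/directory1 /path/to/directory2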

Performance Comparison:

jdupes is generally 2–10 times faster than fdupes, especially for large datasets.


3. dupeGuru

dupeGuru is primarily a GUI-based duplicate finder, optimized for speed and flexible matching.

Install (packaging varies by distro; on Ubuntu this may require the project's PPA or the .deb from the dupeGuru GitHub releases rather than the stock repos):

sudo apt install dupeguru

Usage:

As far as I can tell, dupeGuru has no scripted CLI scan mode; launch the GUI with dupeguru, add the directories to scan from the interface, and pick a scan type (contents, filename, etc.).

Why it's fast:

  • Uses smart algorithms to compare file hashes and metadata.
  • Skips unnecessary comparisons based on file size and name before hashing.

4. Custom hashsum Script

For absolute control and minimal overhead, you can write a custom script to hash files and compare results.

Script:

find /path/to/directory1 /path/to/directory2 -type f -exec sha256sum {} + | sort | uniq -D -w 64

Explanation:

  • sha256sum: Calculates hashes for all files.
  • sort: Sorts the hash results for efficient comparison.
  • uniq -D -w 64: Shows only duplicate hashes (first 64 characters, i.e., the hash).

Performance:

This method hashes every file's full contents, so it is simple but does more I/O than tools that first group candidates by size. Use md5sum instead of sha256sum for slightly faster hashing at the cost of weaker collision resistance (and change -w 64 to -w 32, since MD5 hex digests are 32 characters). A size pre-filter, sketched below, avoids hashing files that cannot possibly have a duplicate.
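
A minimal sketch of that size pre-filter, assuming GNU find and awk, and filenames without embedded tabs or newlines: list every file with its byte size, keep only files whose size occurs more than once, and hash just those.

find /path/to/directory1 /path/to/directory2 -type f -printf '%s\t%p\n' \
  | awk -F'\t' '{
      n[$1]++
      if (n[$1] == 1) first[$1] = $2        # remember the first file of each size
      else {
        if (n[$1] == 2) print first[$1]     # emit it once a second file of that size appears
        print $2
      }
    }' \
  | tr '\n' '\000' \
  | xargs -0 sha256sum \
  | sort | uniq -D -w 64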


5. czkawka (Rust-based, blazingly fast)

czkawka is a Rust-based tool designed for fast file deduplication and cleaning.

Install:

Download the binary from the Czkawka GitHub releases page.

Usage (the CLI binary shipped in the releases is czkawka_cli; czkawka_gui is the GUI):

czkawka_cli dup -d /path/to/directory1 -d /path/to/directory2

Why it's the fastest:

  • Written in Rust, optimized for performance and memory usage.
  • Multithreaded with efficient file comparisons using content hashing.
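
If I'm reading the CLI help correctly, -d/--directories selects the search paths (repeat it per directory, as above) and -f saves the results to a file instead of printing them, e.g.:

czkawka_cli dup -d /path/to/directory1 -d /path/to/directory2 -f duplicates.txt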

Comparison of Speed:

Tool       Speed      Features
rmlint     Very fast  Advanced deduplication and cleanup
jdupes     Very fast  CLI-only, simple and effective
dupeGuru   Fast       GUI-based, flexible matching
czkawka    Fastest    Rust-based, modern
fdupes     Moderate   Simple, widely available

Recommendation:

  • For CLI simplicity and speed: Use rmlint or jdupes.
  • For the fastest solution: Use czkawka.