filesieve is a command-line utility for finding exact duplicate files and
moving duplicate copies into an alternate directory while leaving one canonical
original in place.
It is optimized for large media collections with:
- staged exact hashing (size filter -> quick hash -> full hash -> byte verify),
- optional perceptual media similarity clustering (images + video),
- persistent SQLite signatures cache for faster repeated runs.
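The staged pipeline above (size filter -> quick hash -> full hash -> byte verify) can be sketched roughly as follows. This is a minimal illustration, not filesieve's actual code: the names `exact_duplicate_groups`, `_digest`, `_group`, and `QUICK_BYTES`, the choice of SHA-256, and the 64 KiB prefix size are all assumptions.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

QUICK_BYTES = 64 * 1024  # assumed prefix size for the quick hash


def _digest(path, limit=None):
    """SHA-256 of the whole file, or of only its first `limit` bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def _group(paths, key):
    """Bucket paths by key(path) and keep only buckets with 2+ members."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[key(p)].append(p)
    return [g for g in buckets.values() if len(g) > 1]


def exact_duplicate_groups(paths):
    """Return sorted groups of byte-identical files (hypothetical helper)."""
    # Stage 1: files with a unique size cannot have a duplicate.
    groups = _group(paths, os.path.getsize)
    # Stages 2 and 3: cheap prefix hash first, then a full-content hash.
    for key in (lambda p: _digest(p, QUICK_BYTES), _digest):
        groups = [sub for g in groups for sub in _group(g, key)]
    # Stage 4: byte-for-byte comparison guards against hash collisions.
    verified = []
    for g in groups:
        remaining = list(g)
        while remaining:
            head, rest = remaining[0], remaining[1:]
            same = [head] + [p for p in rest
                             if filecmp.cmp(head, p, shallow=False)]
            if len(same) > 1:
                verified.append(sorted(same))
            remaining = [p for p in rest if p not in same]
    return verified
```

Each stage only examines files that survived the previous one, so the expensive full read is reserved for files that already match on size and prefix.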
- Walks one or more base directories recursively.
- Moves only exact byte-identical duplicates.
- Keeps the canonical file by oldest mtime_ns, then lexicographic path.
- Emits perceptual media clusters as report-only output (no auto-move).
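The canonical-file rule (oldest mtime_ns, then lexicographic path) amounts to a two-part sort key. A minimal sketch, assuming the modification time is read from `os.stat`; the function name `pick_canonical` is hypothetical and not part of filesieve:

```python
import os


def pick_canonical(paths):
    """Return (canonical, duplicates) for one group of identical files.

    The oldest st_mtime_ns wins; the path string breaks ties.
    """
    ordered = sorted(paths, key=lambda p: (os.stat(p).st_mtime_ns, p))
    return ordered[0], ordered[1:]
```

Using the path as a tie-breaker keeps the choice deterministic across runs when several copies share the same timestamp.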
See Duplicate detection algorithm for details.
filesieve currently supports Python 3.10+.

    uv sync
    uv tool install .

After install, the filesieve command is available in your shell.
General form:

    filesieve [OPTIONS] BASE_DIR [BASE_DIR ...]

- -a, --alternate DUP_DIR: move exact duplicates here.
- -c, --config FILE: optional config path.
- --mode {exact,media}: duplicate mode (default: media).
- --cache PATH: SQLite cache path override.
- --no-cache: disable persistent cache.
- --hash-workers N: worker threads for exact hashing.
- --media-workers N: worker threads for perceptual media stage.
- --ffmpeg PATH: explicit ffmpeg path or executable name.
- --ffprobe PATH: explicit ffprobe path or executable name.
- --report-similar PATH: write perceptual media clusters JSON.
Exact duplicate cleanup only:

    filesieve --mode exact --alternate /tmp/sieve/dups ~/Videos

Media mode with report output:

    filesieve --mode media --report-similar ./similar.json --alternate /tmp/sieve/dups ~/Photos ~/Videos

Run through uv:

    uv run filesieve --alternate /tmp/sieve/dups ./library

For full details:

    filesieve --help

filesieve also supports organizing video media into a Plex-friendly structure with idempotent state tracking, duplicate routing, dry-run previews, and an optional native UI.
CLI dry-run example:

    filesieve --organize-media --organize-target /media/library --organize-report ./organize-report.json ~/Downloads ~/Staging

Apply changes:

    filesieve --organize-media --organize-apply --organize-target /media/library ~/Downloads ~/Staging

Open native UI (source selection, target configuration, progress, pause/continue/stop):

    filesieve --organize-ui --organize-target /media/library

Organizer options:
- --organize-config PATH: YAML organizer config (see config/organize.yaml).
- --organize-state-db PATH: SQLite state for idempotency and re-runs.
- --organize-apply: apply moves (default is dry-run).
- --organize-report PATH: write JSON report of planned/executed operations.
Behavior notes:
- Non-media files are ignored.
- Media with unrecognized naming falls into Unsorted.
- Duplicates are moved to Duplicates, and the canonical pick is the copy with the highest parsed quality.
- On destination conflicts, version suffixes are appended.
- On Windows, cross-drive moves use copy+verify+delete for safety.
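The last two notes (version suffixes on conflict, copy+verify+delete across drives) can be illustrated with a short sketch. Everything here is an assumption for illustration: the helper names `versioned_destination` and `copy_verify_delete` are hypothetical, and the `.v2`, `.v3` suffix format is invented, since the suffix filesieve actually appends is not specified above.

```python
import filecmp
import shutil
from pathlib import Path


def versioned_destination(dest: Path) -> Path:
    """Return dest, or dest with an assumed '.v2', '.v3', ... suffix if taken."""
    if not dest.exists():
        return dest
    n = 2
    while True:
        candidate = dest.with_name(f"{dest.stem}.v{n}{dest.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def copy_verify_delete(src: Path, dest: Path) -> Path:
    """Copy src to a free destination, verify the bytes, then remove src."""
    dest = versioned_destination(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copies content and metadata
    if not filecmp.cmp(src, dest, shallow=False):
        dest.unlink()  # discard the bad copy and keep the source
        raise OSError(f"verification failed for {src}")
    src.unlink()  # delete the source only after a verified copy
    return dest
```

The source is removed only after the copy has been compared byte for byte, so an interrupted or corrupted transfer never leaves the file without an intact copy.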
Pass a config with --config /path/to/sieve.conf.
Precedence order (highest first):
- CLI args
- config file values
- in-code defaults
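One common way to implement this kind of layered lookup is `collections.ChainMap`, which searches its mappings left to right. The sketch below is an assumption about how such precedence could work, not filesieve's actual code; `resolve_settings` and the `DEFAULTS` values are hypothetical.

```python
from collections import ChainMap

# Hypothetical in-code defaults, for illustration only.
DEFAULTS = {
    "mode": "media",
    "hash_workers": 4,
    "cache_db": ".filesieve-cache.sqlite",
}


def resolve_settings(cli_args, config_values, defaults=DEFAULTS):
    """Merge settings so CLI args beat config values, which beat defaults."""
    # Options the user did not pass arrive as None and must not mask
    # lower-priority layers.
    cli_set = {k: v for k, v in cli_args.items() if v is not None}
    return dict(ChainMap(cli_set, config_values, defaults))
```

For example, passing `{"mode": "exact", "hash_workers": None}` as CLI args with `{"hash_workers": 8}` from a config file would resolve `mode` from the CLI, `hash_workers` from the config, and `cache_db` from the defaults.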
Example config:
[global]
dup_dir:/tmp/sieve/dups
mode:media
cache_db:.filesieve-cache.sqlite
hash_workers:8
media_workers:2
[media]
enabled:true
image_hamming_threshold:8
video_hamming_threshold:32
video_frame_hamming_threshold:12
duration_bucket_seconds:2

- File moves happen only for exact duplicates after byte-for-byte verification.
- Perceptual matches are advisory in similar_media_candidates output only.
- If FFmpeg tools are unavailable, the perceptual stage is skipped automatically.
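The `*_hamming_threshold` values in the example config suggest that perceptual hashes are compared by Hamming distance, meaning the number of differing bits. A minimal sketch of that comparison, assuming hashes are held as integers and that a distance equal to the threshold still counts as a match (the source does not say whether the bound is inclusive); both function names are hypothetical:

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def is_similar(hash_a: int, hash_b: int, threshold: int = 8) -> bool:
    """Treat two hashes as similar when they differ in at most `threshold` bits."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

A lower threshold yields fewer, tighter clusters. Because such matches are only reported and never moved, a loose threshold costs review time rather than risking data.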