dkujawski/filesieve

filesieve

filesieve is a command-line utility for finding exact duplicate files and moving duplicate copies into an alternate directory while leaving one canonical original in place.

It is optimized for large media collections with:

  • staged exact hashing (size filter -> quick hash -> full hash -> byte verify),
  • optional perceptual media similarity clustering (images + video),
  • a persistent SQLite signature cache for faster repeated runs.

Project overview

  • Walks one or more base directories recursively.
  • Moves only exact byte-identical duplicates.
  • Keeps as canonical the file with the oldest mtime_ns, breaking ties by lexicographic path.
  • Emits perceptual media clusters as report-only output (no auto-move).

See Duplicate detection algorithm for details.
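
The canonical-file rule from the overview (oldest mtime_ns, then lexicographic path) can be expressed as a two-part sort key. The helper name `pick_canonical` is hypothetical; this is only a sketch of the rule as stated, not the project's implementation.

```python
import os
from pathlib import Path


def pick_canonical(paths):
    """Return (canonical, duplicates) for one group of identical files."""
    # Sort by modification time in nanoseconds first, then by path string,
    # so ties on mtime are broken deterministically.
    ordered = sorted(paths, key=lambda p: (os.stat(p).st_mtime_ns, str(p)))
    return ordered[0], ordered[1:]
```

The first element stays in place; the remaining paths are the candidates to move into the alternate directory.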

Supported Python versions

filesieve currently supports Python 3.10+.

Installation (with uv)

Install from local source

uv sync

Install as a tool (console entry point)

uv tool install .

After install, the filesieve command is available in your shell.

CLI usage

General form:

filesieve [OPTIONS] BASE_DIR [BASE_DIR ...]

Core options

  • -a, --alternate DUP_DIR: move exact duplicates here.
  • -c, --config FILE: optional config path.
  • --mode {exact,media}: duplicate detection mode (default: media).
  • --cache PATH: SQLite cache path override.
  • --no-cache: disable persistent cache.
  • --hash-workers N: worker threads for exact hashing.
  • --media-workers N: worker threads for perceptual media stage.
  • --ffmpeg PATH: explicit ffmpeg path or executable name.
  • --ffprobe PATH: explicit ffprobe path or executable name.
  • --report-similar PATH: write perceptual media clusters JSON.
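
To illustrate what the persistent cache behind --cache and --no-cache might store, here is a minimal sketch of a signature cache that reuses a digest only while a file's size and mtime_ns are unchanged. The table name, columns, and function names are assumptions; the real schema may differ.

```python
import sqlite3


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signatures ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def cached_digest(conn, path, size, mtime_ns, compute):
    """Return a stored digest if size and mtime_ns still match, else recompute."""
    row = conn.execute(
        "SELECT digest FROM signatures WHERE path = ? AND size = ? AND mtime_ns = ?",
        (path, size, mtime_ns),
    ).fetchone()
    if row:
        return row[0]
    # Cache miss or stale entry: compute the digest and overwrite the row.
    digest = compute(path)
    conn.execute(
        "INSERT OR REPLACE INTO signatures VALUES (?, ?, ?, ?)",
        (path, size, mtime_ns, digest),
    )
    conn.commit()
    return digest
```

Keying validity on size and mtime_ns means an edited file is rehashed on the next run, while untouched files skip hashing entirely.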

Examples

Exact duplicate cleanup only:

filesieve --mode exact --alternate /tmp/sieve/dups ~/Videos

Media mode with report output:

filesieve --mode media --report-similar ./similar.json --alternate /tmp/sieve/dups ~/Photos ~/Videos

Run through uv:

uv run filesieve --alternate /tmp/sieve/dups ./library

For full details:

filesieve --help

Media organizer mode (Plex preset)

filesieve can also organize video media into a Plex-friendly structure, with idempotent state tracking, duplicate routing, dry-run previews, and an optional native UI.

CLI dry-run example:

filesieve --organize-media --organize-target /media/library --organize-report ./organize-report.json ~/Downloads ~/Staging

Apply changes:

filesieve --organize-media --organize-apply --organize-target /media/library ~/Downloads ~/Staging

Open native UI (source selection, target configuration, progress, pause/continue/stop):

filesieve --organize-ui --organize-target /media/library

Organizer options:

  • --organize-config PATH: YAML organizer config (see config/organize.yaml).
  • --organize-state-db PATH: SQLite state for idempotency and re-runs.
  • --organize-apply: apply moves (default is dry-run).
  • --organize-report PATH: write JSON report of planned/executed operations.

Behavior notes:

  • Non-media files are ignored.
  • Media whose naming cannot be parsed is placed in Unsorted.
  • Duplicates are moved to Duplicates; the copy with the highest parsed quality is kept as canonical.
  • On destination conflicts, a version suffix is appended to the file name.
  • On Windows, cross-drive moves use copy+verify+delete for safety.
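
The copy+verify+delete pattern mentioned in the last note can be sketched as follows. This is an illustration under assumptions: the `safe_move` name, the SHA-256 check, and the `.partial` temporary suffix are not taken from filesieve.

```python
import hashlib
import os
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def safe_move(src: Path, dst: Path) -> None:
    """Copy src to dst, verify the copy, and only then delete src."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        raise FileExistsError(dst)
    tmp = dst.with_name(dst.name + ".partial")  # assumed temporary name
    shutil.copy2(src, tmp)
    if _sha256(src) != _sha256(tmp):
        tmp.unlink()
        raise OSError(f"verification failed for {src}")
    os.replace(tmp, dst)  # rename within the destination volume
    src.unlink()          # the source is removed only after verification
```

Because the source is deleted last, an interrupted or corrupted copy leaves the original file untouched.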

Configuration

Pass a config file with --config /path/to/sieve.conf.

Precedence order:

  1. CLI args
  2. config file values
  3. in-code defaults
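
One way to realize this precedence order is a layered lookup, sketched here with `collections.ChainMap`. The `resolve_settings` helper and the default values other than `mode` are hypothetical.

```python
from collections import ChainMap

# Hypothetical in-code defaults; only the "media" mode default is documented.
DEFAULTS = {"mode": "media", "hash_workers": 4, "dup_dir": None}


def resolve_settings(cli: dict, config: dict, defaults: dict = DEFAULTS) -> dict:
    """Merge settings so CLI args beat config values, which beat defaults."""
    # Drop CLI options that were not given (None) so they do not mask
    # values from the config file or the defaults.
    cli_set = {k: v for k, v in cli.items() if v is not None}
    return dict(ChainMap(cli_set, config, defaults))
```

For example, `resolve_settings({"mode": "exact", "hash_workers": None}, {"hash_workers": 8})` takes `mode` from the CLI, `hash_workers` from the config file, and `dup_dir` from the defaults.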

Example config:

[global]
dup_dir:/tmp/sieve/dups
mode:media
cache_db:.filesieve-cache.sqlite
hash_workers:8
media_workers:2

[media]
enabled:true
image_hamming_threshold:8
video_hamming_threshold:32
video_frame_hamming_threshold:12
duration_bucket_seconds:2
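
The key:value layout above is valid INI syntax, so it can be read with Python's standard `configparser`. Whether filesieve itself uses `configparser` is an assumption. The sketch also shows one plausible reading of a Hamming threshold: two perceptual hashes count as similar when at most that many bits differ. The `hamming` helper and the sample hash values are made up for illustration.

```python
import configparser

SAMPLE = """
[global]
dup_dir:/tmp/sieve/dups
mode:media
hash_workers:8

[media]
enabled:true
image_hamming_threshold:8
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

dup_dir = parser.get("global", "dup_dir")
hash_workers = parser.getint("global", "hash_workers")
media_enabled = parser.getboolean("media", "enabled")
threshold = parser.getint("media", "image_hamming_threshold")


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


# Made-up hash values that differ in their three lowest bits.
similar = hamming(0b10110000, 0b10110111) <= threshold
```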

Safety model

  • File moves happen only for exact duplicates after byte-for-byte verification.
  • Perceptual matches are advisory only: they appear in the similar_media_candidates output and are never moved automatically.
  • If the FFmpeg tools are unavailable, the perceptual stage is skipped automatically.
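
A check for the FFmpeg tools could be as simple as the sketch below, which resolves an executable name or explicit path (as accepted by --ffmpeg and --ffprobe) with `shutil.which`. The function names are hypothetical and the real detection logic may differ.

```python
import shutil


def resolve_tool(name_or_path: str):
    """Return the path to an executable, or None if it cannot be found."""
    return shutil.which(name_or_path)


def perceptual_stage_available(ffmpeg: str = "ffmpeg", ffprobe: str = "ffprobe") -> bool:
    """True only when both tools resolve to an executable."""
    return resolve_tool(ffmpeg) is not None and resolve_tool(ffprobe) is not None
```

A caller would skip perceptual clustering when this returns False and still run the exact-duplicate stages.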

Additional documentation
