image-deduplicator

v1.0.0

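Detects and removes exact or near-duplicate images in a folder using perceptual hashing and MD5 hashing, with a configurable similarity threshold and a choice of action.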

Sourced from ClawHub, Authored by Mingo_318

Installation

Please help me install the skill `image-deduplicator` from the SkillHub official store.

npx skills add Mingo-318/image-deduplicator

Image Deduplicator

Find and remove duplicate or similar images in a folder using MD5 and perceptual hashing. Use this skill when the user wants to clean up duplicate images, find near-duplicates, or deduplicate an image dataset.

Features

  • Exact Duplicates: Find images with identical content
  • Similar Images: Detect visually similar images (similarity threshold configurable via --threshold)
  • Hash-based: Fast MD5 hashing for exact duplicates
  • Perceptual Hash: pHash for finding similar images
  • Batch Processing: Process large image folders
  • Multiple Actions: List, delete, or move duplicates
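
The exact-duplicate feature can be illustrated with a short sketch. This is not the code in scripts/dedupe.py, which is not shown here. It assumes that "identical content" means byte-identical files, that the folder is scanned recursively, and that the extension filter matches the documented --extensions default. The names `md5_of_file` and `find_exact_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed default, mirroring the documented --extensions value.
DEFAULT_EXTENSIONS = ("jpg", "jpeg", "png", "bmp")


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder, extensions=DEFAULT_EXTENSIONS):
    """Group image files under `folder` whose bytes are identical.

    Assumption: the scan is recursive and extension matching ignores case.
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            by_hash[md5_of_file(path)].append(path)
    # Only digests shared by two or more files form a duplicate group.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Because MD5 is computed over raw bytes, two files showing the same picture but saved with different compression or metadata would not match here. That case is what the perceptual hash is for.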

Usage

# Find exact duplicates
python scripts/dedupe.py scan /path/to/images/

# Find similar images (90% similarity)
python scripts/dedupe.py scan /path/to/images/ --threshold 90

# Delete duplicates (keeps first occurrence)
python scripts/dedupe.py scan /path/to/images/ --action delete

# Move duplicates to a folder
python scripts/dedupe.py scan /path/to/images/ --action move --output /path/to/dupes/
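
The three actions and the "keeps first occurrence" rule could be sketched roughly as follows. This is an illustration only, not the script's implementation. The function name `apply_action`, the shape of `groups` (a list of lists of paths), and the rename-on-collision policy for moved files are all assumptions.

```python
import shutil
from pathlib import Path


def apply_action(groups, action="list", output=None):
    """Apply `action` to every file in each group except the first.

    Returns the paths that were (or, for "list", would be) affected.
    """
    if action not in {"list", "delete", "move"}:
        raise ValueError(f"unknown action: {action!r}")
    if action == "move":
        if output is None:
            raise ValueError("the move action needs an output folder")
        Path(output).mkdir(parents=True, exist_ok=True)

    affected = []
    for group in groups:
        # group[0] is treated as the first occurrence and is always kept.
        for path in map(Path, group[1:]):
            if action == "delete":
                path.unlink()
            elif action == "move":
                # Hypothetical collision policy: never overwrite a file
                # that already exists in the output folder.
                target = Path(output) / path.name
                counter = 1
                while target.exists():
                    target = Path(output) / f"{path.stem}_{counter}{path.suffix}"
                    counter += 1
                shutil.move(str(path), str(target))
            affected.append(path)
    return affected
```

Running with the list action first and reviewing the result before switching to delete is a cautious workflow, since deletion in this sketch is not reversible.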

Examples

$ python scripts/dedupe.py scan ./images/

Scanning images...
Found 150 images
Computing hashes...
Found 5 duplicate groups:

Group 1 (3 files):
  ./images/photo1.jpg
  ./images/photo1_copy.jpg
  ./images/photo1_final.jpg

Group 2 (2 files):
  ./images/screenshot.png
  ./images/screenshot (1).png

Total: 5 duplicate groups, 8 duplicate files

Dependencies

pip install pillow imagehash

Options

  • --threshold: Similarity threshold (0-100), default: 100 (exact)
  • --action: What to do with duplicates (list, delete, move)
  • --output: Output folder for --action move
  • --extensions: File extensions to scan (default: jpg,jpeg,png,bmp)
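
The page does not say how the 0-100 --threshold value relates to the perceptual hash. One plausible reading, sketched below, treats similarity as the share of matching bits between two 64-bit hashes, so 100 means identical hashes and 90 tolerates up to 6 differing bits. The function names and the 64-bit hash size are assumptions. The optional `phash_as_int` helper relies on the pillow and imagehash packages listed above.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def similarity_percent(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Map Hamming distance to a 0-100 score (assumed interpretation)."""
    return 100.0 * (1 - hamming_distance(hash_a, hash_b) / bits)


def is_similar(hash_a: int, hash_b: int, threshold: float = 100, bits: int = 64) -> bool:
    """True when the similarity score meets or exceeds the threshold."""
    return similarity_percent(hash_a, hash_b, bits) >= threshold


def phash_as_int(path) -> int:
    """Optional helper: 64-bit pHash of an image as an integer.

    Imports are local so the functions above work without the extra packages.
    """
    from PIL import Image
    import imagehash

    with Image.open(path) as img:
        # str() on an ImageHash gives its hexadecimal form.
        return int(str(imagehash.phash(img)), 16)


if __name__ == "__main__":
    a, b = 0x0, 0xFF  # two made-up hashes differing in 8 of 64 bits
    print(similarity_percent(a, b))        # 87.5
    print(is_similar(a, b, threshold=90))  # False
```

Under this reading, a threshold of 100 on perceptual hashes still differs from an MD5 match: two visually identical images saved in different formats can share a pHash without being byte-identical.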