SkillHub

dataset-splitter

v1.0.0

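Splits image datasets into training, validation, and test sets, with support for random or stratified splitting, custom ratios, and annotations.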

Sourced from ClawHub; authored by Mingo_318

Installation

Install the skill `dataset-splitter` from the SkillHub official store:

npx skills add Mingo-318/dataset-splitter

Dataset Splitter

Split image datasets into train/val/test sets. Supports random splits, stratified splits, and custom ratios. Use this skill when the user needs to split a dataset for machine learning training.

Features

  • Random Split: Randomly shuffle the images, then split them
  • Stratified Split: Preserve the class distribution in each subset
  • Custom Ratios: Configure the train/val/test ratios
  • Annotation Support: Split images together with their corresponding annotations
  • YOLO Format: Generate a YOLO-format dataset structure
  • Reproducible: Set a random seed to get the same split on every run
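To make the random-split, custom-ratio, and seed features concrete, here is a minimal Python sketch of a seeded random split. It is not the skill's own code: the function name `random_split` and its parameters are hypothetical, and it assumes ratios are integer weights such as 80 10 10, with any rounding remainder going to the test set.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T],
    ratios: Tuple[int, int, int] = (80, 10, 10),
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into (train, val, test) lists.

    Hypothetical helper: `ratios` are integer weights such as percentages.
    """
    if len(ratios) != 3 or min(ratios) < 0 or sum(ratios) == 0:
        raise ValueError("ratios must be three non-negative integers with a positive sum")
    shuffled = list(items)
    # A private Random instance leaves the global RNG untouched; the same seed
    # on the same input order always produces the same shuffle.
    random.Random(seed).shuffle(shuffled)
    n, total = len(shuffled), sum(ratios)
    n_train = n * ratios[0] // total  # floor division keeps counts integral
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # rounding remainder lands in test
    return train, val, test
```

Sorting the file list before shuffling (for example `random_split(sorted(paths), seed=42)`) matters: directory listing order is not guaranteed, so the same seed applied to differently ordered inputs would produce different splits.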

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify
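`--stratify` is documented only as maintaining the class distribution. One common way to do that is to group images by class and split each group separately with the same ratios, as in this hypothetical sketch. It assumes each image has already been reduced to a single class label (for example, the first class ID in its YOLO label file); images with several classes would need a different strategy.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def stratified_split(
    labels: Dict[str, Hashable],
    ratios: Tuple[int, int, int] = (80, 10, 10),
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str], List[str]]:
    """Split image names so each class keeps roughly its share in every subset.

    Hypothetical helper: `labels` maps an image name to ONE class label.
    """
    total = sum(ratios)
    by_class: Dict[Hashable, List[str]] = defaultdict(list)
    for image, cls in labels.items():
        by_class[cls].append(image)
    rng = random.Random(seed)
    train: List[str] = []
    val: List[str] = []
    test: List[str] = []
    # Sort classes and members first so the result depends only on the seed,
    # not on dict or directory ordering.
    for cls in sorted(by_class, key=str):
        members = sorted(by_class[cls])
        rng.shuffle(members)
        n = len(members)
        n_train = n * ratios[0] // total
        n_val = n * ratios[1] // total
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test
```

Because each class is split independently and counts are floored, very small classes can end up with no validation images; a real implementation needs a policy for that case.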

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)
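The counts in this example follow directly from the ratios: 1000 × 80/100 = 800 and 1000 × 10/100 = 100. When the total does not divide evenly, some rounding rule is required, and how `splitter.py` rounds is not documented. The sketch below uses the largest-remainder method, a hypothetical choice that guarantees the subset sizes always add up to the total; `split_sizes` is an illustrative name.

```python
from typing import List, Sequence


def split_sizes(total: int, ratios: Sequence[int] = (80, 10, 10)) -> List[int]:
    """Convert integer ratios into subset sizes that always sum to `total`.

    Largest-remainder method (hypothetical choice): floor each exact share,
    then give the leftover items to the subsets with the largest remainders.
    """
    weight = sum(ratios)
    sizes = [total * r // weight for r in ratios]
    remainders = [total * r % weight for r in ratios]
    leftover = total - sum(sizes)  # always smaller than len(ratios)
    # Python's sort is stable even with reverse=True, so ties go to earlier subsets.
    order = sorted(range(len(ratios)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

`split_sizes(1000)` gives `[800, 100, 100]`, matching the log above, and `split_sizes(1005)` gives `[804, 101, 100]`.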

Dependencies

pip install pillow

Options

  • --ratios: Split ratios as percentages, in train val test order; default: 80 10 10
  • --seed: Random seed, for reproducible splits
  • --annotations: Path to the annotations directory; annotations are split together with their images
  • --output: Output directory
  • --yolo: Output in YOLO dataset format
  • --stratify: Preserve the class distribution in each subset
  • --copy: Copy files instead of moving them
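The options above do not spell out the output layout. The following sketch shows one plausible way `--output`, `--annotations`, `--yolo`, and `--copy` could fit together. Everything in it is an assumption: the function `write_split`, matching label files by file stem with a `.txt` extension, and the `images/<split>` plus `labels/<split>` tree, which follows the common Ultralytics YOLO convention rather than anything stated for this skill.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Optional


def write_split(
    split_name: str,
    images: Iterable[Path],
    output: Path,
    annotations: Optional[Path] = None,
    yolo: bool = False,
    copy: bool = False,
) -> List[str]:
    """Place one subset's images, plus matching label files, under `output`.

    Hypothetical layout:
      yolo=True  -> output/images/<split>/ and output/labels/<split>/
      yolo=False -> output/<split>/ holding images and labels side by side
    Labels are matched by stem, e.g. img_001.jpg <-> img_001.txt (assumed).
    Returns the names of images that had no label file.
    """
    transfer = shutil.copy2 if copy else shutil.move  # default: move, as --copy implies
    if yolo:
        img_dir = output / "images" / split_name
        lbl_dir = output / "labels" / split_name
    else:
        img_dir = lbl_dir = output / split_name
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    missing: List[str] = []
    for image in images:
        transfer(str(image), str(img_dir / image.name))
        if annotations is None:
            continue
        label = annotations / f"{image.stem}.txt"
        if label.exists():
            transfer(str(label), str(lbl_dir / label.name))
        else:
            missing.append(image.name)
    return missing
```

In this sketch, images without a label file are still placed and reported back, so a caller can warn about them instead of silently dropping data.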