[Tool] Dataset Forge: The Local All-in-One Dataset Swiss Army Knife

patlegu/dataset_forge

⚒️ Dataset Forge

The All-in-One Local GUI for AI Dataset Preparation. Powered by Microsoft Florence-2 (Native Integration)

📖 Overview

Dataset Forge is a standalone, local Python application designed to streamline the creation of image datasets for AI training (LoRA, Checkpoints, etc.).

It started as a fork of Joschek's tools but evolved into a complete suite focused on video extraction, smart processing, and independence. Unlike other tools, Dataset Forge removes complex dependencies (no Docker, no Ollama) and runs Microsoft's Florence-2 vision models natively via Hugging Face Transformers.

✨ Key Features

1. 🧠 Native Florence-2 Engine

  • No external APIs: Runs locally using transformers and torch.
  • Dual Models: Support for Florence-2-Base (Speed) and Florence-2-Large (Accuracy).
  • Optimized: Runs in FP16 mode to save VRAM and includes a custom patch for SDPA (Scaled Dot Product Attention) to avoid "Flash Attention" installation headaches on Windows.
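
As a rough illustration of what running Florence-2 locally through Transformers involves, here is a minimal sketch. It is not Dataset Forge's own code: the names `MODEL_IDS`, `load_florence`, and `caption_image` are hypothetical, and the sketch does not reproduce the app's SDPA patch, so on a machine without flash_attn the model load may still need that workaround. The heavy imports sit inside the functions so the snippet can be read and imported without a GPU.

```python
# Hugging Face repository names for the two Florence-2 variants.
MODEL_IDS = {
    "base": "microsoft/Florence-2-base",    # faster
    "large": "microsoft/Florence-2-large",  # more accurate
}


def load_florence(variant="base"):
    """Load a Florence-2 model and processor (hypothetical helper)."""
    # Imported lazily: torch and transformers are only needed when loading.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = MODEL_IDS[variant]
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    # FP16 halves VRAM use on GPU; fall back to FP32 on CPU.
    dtype = torch.float16 if use_cuda else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor, device, dtype


def caption_image(model, processor, device, dtype, image,
                  task="<DETAILED_CAPTION>"):
    """Run one Florence-2 task prompt on a PIL image and return the result."""
    import torch

    inputs = processor(text=task, images=image, return_tensors="pt")
    inputs = inputs.to(device, dtype)
    with torch.no_grad():
        ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=256,
            num_beams=3,
        )
    text = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
    return parsed[task]
```

The first call to `load_florence` downloads the model weights from Hugging Face, so it can take a while.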

2. 🎬 Robust Video Extractor & Repair Station

  • Integrated Browser: Browse video folders directly in the app.
  • Smart Player: Frame-by-frame navigation to find perfect poses without motion blur.
  • Auto-Fix (Anti-Crash): Automatically detects and crops videos with odd dimensions (e.g., 1919x1080) to prevent swscaler errors and FFmpeg crashes.
  • Batch Extraction: Extract all frames or snapshot specific moments.
  • Repair Tool (New): A built-in "Repair / Fix Video" button that uses FFmpeg to re-encode broken videos (odd dimensions, bad codecs) into clean H.264 MP4s.
  • Batch Fix (New): A "Batch Fix Folder" tool that scans an entire directory and automatically repairs all videos with odd dimensions (e.g., 1919x1080 -> 1918x1080) in one click.
  • Smart Swap: Automatically replaces broken files with fixed versions in your workflow seamlessly.
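
The odd-dimension repair described above can be sketched as follows. This is an assumption about how such a fix might look, not the app's actual implementation: `even_size`, `build_repair_command`, and `repair_video` are hypothetical names, and the exact FFmpeg options Dataset Forge passes are not documented here.

```python
import subprocess


def even_size(width: int, height: int) -> tuple[int, int]:
    """Round each dimension down to the nearest even number."""
    return width - (width % 2), height - (height % 2)


def build_repair_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that crops to even dimensions and
    re-encodes the video as H.264 in an MP4 container."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        # Crop from the top-left corner, so any extra pixel is dropped
        # from the right or bottom edge.
        "-vf", "crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        dst,
    ]


def repair_video(src: str, dst: str) -> None:
    """Run the repair; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_repair_command(src, dst), check=True)


print(even_size(1919, 1080))  # (1918, 1080)
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.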

3. ✂️ Smart Cropping & Bucketing

  • Text-to-Crop: Describe the object ("face", "hand", "dog") and the AI will find and crop it.
  • Bucket-Friendly: Includes a "Bucket Resize (Mod 64)" mode. It ensures all cropped images have dimensions divisible by 64 (e.g., 1024x768), which is critical for optimal SDXL and Pony training.
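
To make the Mod 64 rule concrete, here is a small sketch that snaps a crop to dimensions divisible by 64. It assumes a center crop that rounds each side down; the app may resize or pad instead, and `bucket_size` and `center_crop_box` are hypothetical names.

```python
def bucket_size(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round both dimensions down to the nearest multiple."""
    new_w = (width // multiple) * multiple
    new_h = (height // multiple) * multiple
    if new_w == 0 or new_h == 0:
        raise ValueError(f"image smaller than {multiple}px on one side")
    return new_w, new_h


def center_crop_box(width: int, height: int, multiple: int = 64):
    """Return a (left, upper, right, lower) box, the same layout
    that Pillow's Image.crop expects."""
    new_w, new_h = bucket_size(width, height, multiple)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


print(bucket_size(1030, 770))      # (1024, 768)
print(center_crop_box(1030, 770))  # (3, 1, 1027, 769)
```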

4. 🏷️ Batch Captioning & Editing

  • Trigger Words: Auto-inject prompts (Prefix/Suffix) into captions.
  • Manual Editor: A split-view Quality Control station with keyboard shortcuts (Ctrl+S, Alt+Arrows) for rapid review.
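
Trigger-word injection amounts to joining a prefix, the generated caption, and a suffix. A minimal sketch, assuming a comma separator (the separator the app actually uses is not stated) and a hypothetical function name `apply_trigger_words`:

```python
def apply_trigger_words(caption: str, prefix: str = "", suffix: str = "",
                        sep: str = ", ") -> str:
    """Join prefix, caption, and suffix, skipping any empty part."""
    parts = [p.strip() for p in (prefix, caption, suffix) if p and p.strip()]
    return sep.join(parts)


print(apply_trigger_words("a man standing on a beach", prefix="ohwx man"))
# ohwx man, a man standing on a beach
```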

5. 🏷️ ComfyUI Inspector

  • ComfyUI Metadata Inspector: Open any PNG generated by ComfyUI to visualize the full Workflow (Graph) and Prompt (API).
  • Export Workflow: One-click export of the embedded workflow to a .json file that you can drag & drop back into ComfyUI.
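
ComfyUI stores its metadata in PNG text chunks under the keys "workflow" and "prompt". The sketch below reads those chunks with the standard library only. It is an illustration rather than the app's code: `read_png_text_chunks` and `extract_workflow` are hypothetical names, and only uncompressed tEXt chunks are handled (zTXt and iTXt are ignored).

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return all tEXt key/value pairs found in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 12 + length
    return chunks


def extract_workflow(data: bytes):
    """Return the embedded workflow as a dict, or None if absent."""
    raw = read_png_text_chunks(data).get("workflow")
    return json.loads(raw) if raw is not None else None
```

To export the workflow, read the file with `open(path, "rb").read()`, pass the bytes to `extract_workflow`, and write the result with `json.dump`.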

⚙️ Installation

Prerequisites

  • Python 3.10 or newer.
  • A GPU with CUDA support (NVIDIA) is highly recommended.
  • Git installed.

Quick Start

  1. Clone the repository:
    git clone https://github.com/patlegu/dataset_forge.git
    cd dataset_forge
  2. Create a virtual environment (Recommended):
    python -m venv venv
    source venv/bin/activate   (on Windows: venv\Scripts\activate)
  3. Install dependencies:
    To run Florence-2 on an NVIDIA GPU (recommended), install PyTorch separately before the other requirements; otherwise, pip may install the CPU-only build by default.
    The CUDA 11.8 wheels are hosted at https://download.pytorch.org/whl/cu118. The following commands work on both Windows and Linux:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt
    (Note: timm and einops are required for Florence-2 model loading.)
  4. Run the app:
    python dataset_forge.py
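
Before the first launch, it can help to confirm that the prerequisites are in place. The following sketch is not part of the repository; `check_environment` is a hypothetical helper that checks the Python version, whether FFmpeg is on the PATH, and whether the installed PyTorch build can see a CUDA device.

```python
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report which prerequisites are satisfied."""
    report = {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    # False here usually means the CPU-only PyTorch build was installed.
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```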

🛠️ Usage Guide

Video Extractor:

  • Select a source folder containing videos.
  • Repair: If a video fails to play or has odd dimensions (e.g. 675px height),
    click "REPAIR (Single)" or use "BATCH FIX FOLDER" to fix all videos at once.
  • Use the slider or < > buttons to navigate.
  • Click SNAPSHOT to save the current frame, or EXTRACT ALL to dump the whole video frames.

Smart Cropping:

  • Choose your extracted frames folder as Input.
  • Set a target prompt (e.g., "face").
  • Select "Bucket Resize (Mod 64)" for training prep.

Batch Captioning:

  • Point to your images.
  • Select <DETAILED_CAPTION> (Best for SDXL/Pony).
  • Add your Trigger Word in "Prefix" (e.g., ohwx man).

Manual Edit & Metadata

  • Review: Edit text files quickly with Ctrl+S (Save) and Alt+Arrows (Navigate).
  • Inspector: If an image comes from ComfyUI, the "METADATA" button will light up.
    Click it to view generation params or export the workflow JSON.

🐞 Troubleshooting & Known Issues

Here are common issues we encountered during development and how they are handled:

1. "FFmpeg Error" or Repair Fails

Cause: The app calls the system FFmpeg binary to guarantee valid H.264 encoding.

Solution: Ensure you have FFmpeg installed and available on your PATH.

  • On WSL/Debian, run: sudo apt update && sudo apt install ffmpeg
  • On Windows, verify that typing ffmpeg in a terminal works.

2. [swscaler @ ...] Slice parameters 0, xxx are invalid

Context: This happens when trying to display or process a video with odd dimensions (e.g., width is 1279px instead of 1280px). Chroma-subsampled YUV formats such as YUV 4:2:0 need even dimensions, so FFmpeg and OpenCV can fail during color conversion.

Solution: Dataset Forge automatically detects this and crops 1 pixel from the right or bottom on-the-fly. You don't need to do anything; the preview and extraction will work seamlessly.

3. "Flash Attention" Warnings

Context: You might see warnings in the console about flash_attn not being installed.

Solution: This is normal. We intentionally patched the model to use standard PyTorch attention (SDPA) to make installation easier on Windows. Performance is still excellent.

4. First Launch is Slow

Context: The app freezes for a moment when clicking "Load Engine".

Solution: This is expected. The first time you run it, the app downloads the Florence-2 models (approx. 1-3 GB) from Hugging Face. Check your console/terminal to see the download progress.

📜 License & Credits

Author: Patlegu
Original Concept: Based on Joschek's tools.
Model: Uses Microsoft Florence-2.
Project released under MIT License.
