⚒️ Dataset Forge

The All-in-One Local GUI for AI Dataset Preparation. Powered by Microsoft Florence-2 (Native Integration)

📖 Overview

Dataset Forge is a standalone, local Python application designed to streamline the creation of image datasets for AI training (LoRA, Checkpoints, etc.).

It started as a fork of Joschek's tools but has evolved into a complete suite focused on video extraction, smart processing, and independence. Unlike similar tools, Dataset Forge avoids heavy external dependencies (no Docker, no Ollama) and runs Microsoft's Florence-2 vision models natively via Hugging Face Transformers.

✨ Key Features

1. 🧠 Native Florence-2 Engine

No external APIs: Runs locally using transformers and torch.
Dual Models: Support for Florence-2-Base (Speed) and Florence-2-Large (Accuracy).
Optimized: Runs in FP16 to save VRAM and patches the model to use PyTorch's SDPA (Scaled Dot-Product Attention), so you don't have to install flash_attn, which is notoriously difficult to set up on Windows.
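For readers curious how a flash_attn-free Florence-2 load can look, here is a minimal sketch based on a widely shared community workaround, not necessarily the patch Dataset Forge ships. It assumes `transformers` and `torch` are installed, and that the installed Florence-2 remote code accepts `attn_implementation="sdpa"` (this depends on your `transformers` version). The helper names `drop_flash_attn` and `load_florence` are hypothetical.

```python
# Sketch only: hide flash_attn from the Florence-2 remote-code import check
# and load the model in FP16. Not Dataset Forge's actual implementation.
from unittest.mock import patch

MODEL_IDS = {
    "base": "microsoft/Florence-2-base",    # faster
    "large": "microsoft/Florence-2-large",  # more accurate
}


def drop_flash_attn(imports):
    """Remove 'flash_attn' from the list of packages a remote modeling file declares."""
    return [name for name in imports if name != "flash_attn"]


def load_florence(variant="base", device="cuda"):
    # Heavy imports are lazy so this module can be imported without torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor
    from transformers.dynamic_module_utils import get_imports

    def patched_get_imports(filename):
        # The original get_imports is captured above, before patching.
        return drop_flash_attn(get_imports(filename))

    model_id = MODEL_IDS[variant]
    dtype = torch.float16 if device.startswith("cuda") else torch.float32
    with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=dtype,
            attn_implementation="sdpa",  # assumption: supported by your versions
            trust_remote_code=True,
        ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor
```

FP16 is only selected on CUDA because half precision on CPU is slow or unsupported for many ops.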

2. 🎬 Robust Video Extractor & Repair Station

Integrated Browser: Browse video folders directly in the app.
Smart Player: Frame-by-frame navigation to find perfect poses without motion blur.
Auto-Fix (Anti-Crash): Automatically detects and crops videos with odd dimensions (e.g., 1919x1080) to prevent swscaler errors and FFmpeg crashes.
Batch Extraction: Extract all frames or snapshot specific moments.
Repair Tool (New): A built-in "Repair / Fix Video" button that uses FFmpeg to re-encode broken videos (odd dimensions, bad codecs) into clean H.264 MP4s.
Batch Fix (New): A "Batch Fix Folder" tool that scans an entire directory and automatically repairs all videos with odd dimensions (e.g., 1919x1080 -> 1918x1080) in one click.
Smart Swap: Automatically substitutes the repaired version for each broken file in your workflow.
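The odd-dimension repair described above boils down to trimming at most one pixel per axis and re-encoding to H.264. The sketch below is an illustration under assumptions, not the app's exact FFmpeg invocation: the helper names and the `_fixed.mp4` output naming are hypothetical, and it assumes an `ffmpeg` binary on your PATH.

```python
import subprocess
from pathlib import Path


def even_dims(width, height):
    """Round each dimension down to an even number (1919x1080 -> 1918x1080)."""
    return width - width % 2, height - height % 2


def needs_fix(width, height):
    """True when either dimension is odd."""
    return width % 2 != 0 or height % 2 != 0


def repair_command(src, dst):
    """Build an FFmpeg argv that trims odd edges and re-encodes to H.264 MP4."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        # Keep the top-left corner, trimming at most 1 px from the right/bottom.
        "-vf", "crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # broadly compatible 4:2:0 output
        "-c:a", "aac",          # re-encode audio so it fits in an MP4 container
        str(dst),
    ]


def repair(src, dst=None):
    """Run the repair; the '_fixed.mp4' naming is a hypothetical default."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.stem + "_fixed.mp4")
    subprocess.run(repair_command(src, dst), check=True)
    return dst
```

A batch fix is then a loop over a folder that calls `repair` only for files where `needs_fix` is true.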

3. ✂️ Smart Cropping & Bucketing

Text-to-Crop: Describe the object ("face", "hand", "dog") and the AI will find and crop it.
Bucket-Friendly: Includes a "Bucket Resize (Mod 64)" mode. It ensures all cropped images have dimensions divisible by 64 (e.g., 1024x768), which is critical for optimal SDXL and Pony training.
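A minimal sketch of the crop-and-bucket arithmetic, assuming a detection box in pixel coordinates `(x1, y1, x2, y2)` such as the ones Florence-2's phrase-grounding task returns. The padding ratio and the choice to round each side down to a multiple of 64 are assumptions; Dataset Forge may pad or round differently.

```python
def bucket_dims(width, height, step=64):
    """Round each side down to a multiple of `step`, never below one step."""
    return max(step, width // step * step), max(step, height // step * step)


def crop_box(bbox, image_size, pad=0.15):
    """Expand an (x1, y1, x2, y2) box by `pad` of its size per side, clamped to the image."""
    x1, y1, x2, y2 = bbox
    img_w, img_h = image_size
    dx, dy = (x2 - x1) * pad, (y2 - y1) * pad
    left = max(0, int(x1 - dx))
    top = max(0, int(y1 - dy))
    right = min(img_w, int(x2 + dx))
    bottom = min(img_h, int(y2 + dy))
    return left, top, right, bottom
```

With Pillow, `img.crop(box)` followed by `.resize(bucket_dims(*cropped.size))` yields a bucket-friendly image. Resizing to the floored size distorts the aspect ratio slightly; center-cropping to that size instead is an equally valid design choice.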

4. 🏷️ Batch Captioning & Editing

Trigger Words: Automatically inject trigger words into captions as a prefix or suffix.
Manual Editor: A split-view Quality Control station with keyboard shortcuts (Ctrl+S, Alt+Arrows) for rapid review.
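Trigger-word injection can be sketched as a simple join that skips words already present, so re-running a batch does not stack duplicates. The `", "` separator and the same-name `.txt` sidecar convention are assumptions about the caption format, and `inject_trigger` / `caption_path` are hypothetical names.

```python
from pathlib import Path


def inject_trigger(caption, prefix="", suffix="", sep=", "):
    """Add prefix/suffix trigger words unless the caption already has them."""
    caption, prefix, suffix = caption.strip(), prefix.strip(), suffix.strip()
    parts = []
    if prefix and not caption.startswith(prefix):
        parts.append(prefix)
    parts.append(caption)
    if suffix and not caption.endswith(suffix):
        parts.append(suffix)
    return sep.join(p for p in parts if p)


def caption_path(image_path):
    """Sidecar caption file next to the image: img_01.png -> img_01.txt."""
    return Path(image_path).with_suffix(".txt")
```

For example, `inject_trigger("a man in a red coat.", prefix="ohwx man")` gives `"ohwx man, a man in a red coat."`, and applying it again leaves the caption unchanged.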

5. 🏷️ ComfyUI Inspector

ComfyUI Metadata Inspector: Open any PNG generated by ComfyUI to visualize the full workflow (graph) and prompt (API).
Export Workflow: One-click export of the embedded workflow to a .json file that you can drag & drop back into ComfyUI.
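ComfyUI stores its `workflow` (graph) and `prompt` (API) JSON as PNG text chunks. The stdlib-only sketch below reads `tEXt`/`zTXt` chunks and exports the workflow. It is an illustration, not the inspector's actual code, and it assumes the metadata is not stored in `iTXt` chunks, which this sketch ignores.

```python
import json
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data):
    """Return {keyword: text} for tEXt and zTXt chunks in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts, pos = {}, len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            key, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte (0 = zlib deflate).
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"IEND":
            break
    return texts


def comfy_metadata(png_path):
    """Parse the 'workflow' and 'prompt' JSON entries, if present."""
    texts = png_text_chunks(Path(png_path).read_bytes())
    return {k: json.loads(texts[k]) for k in ("workflow", "prompt") if k in texts}


def export_workflow(png_path, json_path):
    """Write the embedded workflow to a .json file that ComfyUI can load."""
    meta = comfy_metadata(png_path)
    if "workflow" not in meta:
        raise KeyError("no ComfyUI workflow found in this PNG")
    Path(json_path).write_text(json.dumps(meta["workflow"], indent=2), encoding="utf-8")
```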

⚙️ Installation

Prerequisites

Python 3.10 or newer.
A GPU with CUDA support (NVIDIA) is highly recommended.
Git installed.
FFmpeg installed and available on your PATH (used by the Repair / Batch Fix tools).

Quick Start

Clone the repository:

    git clone https://github.com/patlegu/dataset_forge.git
    cd dataset_forge

Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate

On Windows, activate it with venv\Scripts\activate instead.

Install dependencies. Important note for CUDA: if you want to use an NVIDIA graphics card (recommended for Florence-2), install PyTorch on its own before the other packages; otherwise, pip may install the CPU-only build by default.

The following commands install a CUDA 11.8 build of PyTorch from https://download.pytorch.org/whl/cu118, then the remaining requirements (Windows/Linux):

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    pip install -r requirements.txt

(Note: timm and einops are required for Florence-2 model loading.)
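To confirm that pip installed a CUDA-enabled PyTorch rather than the CPU-only build, a quick check like the sketch below can help. It only uses standard `torch` attributes; the function name is hypothetical and it is not part of Dataset Forge.

```python
def describe_torch_backend():
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA OK: torch {torch.__version__}, GPU: {torch.cuda.get_device_name(0)}"
    # torch.version.cuda is None on CPU-only builds.
    return f"CPU only: torch {torch.__version__} (CUDA build: {torch.version.cuda})"


if __name__ == "__main__":
    print(describe_torch_backend())
```

If it reports "CPU only" on a machine with an NVIDIA GPU, uninstall torch and repeat the `--index-url` install step.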

Run the app:

    python dataset_forge.py

🛠️ Usage Guide

Video Extractor:

Select a source folder containing videos.
Repair: If a video fails to play or has odd dimensions (e.g., 675px height), click "REPAIR (Single)" or use "BATCH FIX FOLDER" to fix all videos at once.
Use the slider or < > buttons to navigate.
Click SNAPSHOT to save the current frame, or EXTRACT ALL to dump every frame of the video.
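The SNAPSHOT and EXTRACT ALL actions have straightforward FFmpeg command-line equivalents, handy for scripting outside the GUI. This is a sketch of an alternative route, not how the app itself extracts frames; the helper names and the `frame_%06d.png` pattern are hypothetical.

```python
from pathlib import Path


def snapshot_command(video, seconds, out_png):
    """FFmpeg argv that saves the frame at `seconds` as a single PNG."""
    return ["ffmpeg", "-y", "-ss", f"{seconds:.3f}", "-i", str(video),
            "-frames:v", "1", str(out_png)]


def extract_all_command(video, out_dir, pattern="frame_%06d.png"):
    """FFmpeg argv that writes the video's frames as numbered PNGs."""
    return ["ffmpeg", "-i", str(video), str(Path(out_dir) / pattern)]
```

Pass either list to `subprocess.run(..., check=True)`. The output folder must already exist, and PNG keeps frames lossless for training.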

Smart Cropping:

Choose your extracted frames folder as Input.
Set a target prompt (e.g., "face").
Select "Bucket Resize (Mod 64)" for training prep.

Batch Captioning:

Point to your images.
Select <DETAILED_CAPTION> (Best for SDXL/Pony).
Add your Trigger Word in "Prefix" (e.g., ohwx man).

Manual Edit & Metadata:

Review: Edit text files quickly with Ctrl+S (Save) and Alt+Arrows (Navigate).
Inspector: If an image comes from ComfyUI, the "METADATA" button lights up. Click it to view generation parameters or export the workflow JSON.

🐞 Troubleshooting & Known Issues

Here are common issues we encountered during development and how they are handled:

1. "FFmpeg Error" or Repair Fails

Cause: The app calls the system FFmpeg binary to guarantee valid H.264 encoding.

Solution: Make sure FFmpeg is installed. On WSL/Debian, run:

    sudo apt update && sudo apt install ffmpeg

On Windows, verify that typing ffmpeg in a terminal works (FFmpeg must be on your PATH).
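To check programmatically whether FFmpeg is reachable, as the repair tools require, a small stdlib sketch like this works on Windows, Linux, and WSL. The function name is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return FFmpeg's first version line, or None if it is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found: install it and add it to PATH")
```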

2. [swscaler @ ...] Slice parameters 0, xxx are invalid

Context: This happens when displaying or processing a video with odd dimensions (e.g., a width of 1279px instead of 1280px). FFmpeg, and OpenCV's FFmpeg-based video backend, do not handle odd dimensions reliably during YUV color conversion.

Solution: Dataset Forge automatically detects this and crops 1 pixel from the right or bottom on-the-fly. You don't need to do anything; the preview and extraction will work seamlessly.

3. "Flash Attention" Warnings

Context: You might see console warnings about flash_attn not being installed.

Solution: This is normal. We intentionally patched the model to use standard PyTorch attention (SDPA) to make installation easier on Windows. Performance is still excellent.

4. First Launch is Slow

Context: The app freezes for a moment when you click "Load Engine".

Solution: This is expected on the first run, when the app downloads the Florence-2 models (approximately 1-3 GB) from Hugging Face. Check your console/terminal to see the download progress.

📜 License & Credits

Author: Patlegu
Original Concept: Based on Joschek's tools.
Model: Uses Microsoft Florence-2.
Project released under the MIT License.
