The All-in-One Local GUI for AI Dataset Preparation. Powered by Microsoft Florence-2 (Native Integration)
Dataset Forge is a standalone, local Python application designed to streamline the creation of image datasets for AI training (LoRA, Checkpoints, etc.).
It started as a fork of Joschek's tools but evolved into a complete suite focused on video extraction, smart processing, and independence. Unlike other tools, Dataset Forge removes complex dependencies (no Docker, no Ollama) and runs Microsoft's Florence-2 vision models natively via Hugging Face Transformers.
- No external APIs: Runs locally using transformers and torch.
- Dual Models: Support for Florence-2-Base (Speed) and Florence-2-Large (Accuracy).
- Optimized: Runs in FP16 mode to save VRAM and includes a custom patch for SDPA (Scaled Dot Product Attention) to avoid "Flash Attention" installation headaches on Windows.
- Integrated Browser: Browse video folders directly in the app.
- Smart Player: Frame-by-frame navigation to find perfect poses without motion blur.
- Auto-Fix (Anti-Crash): Automatically detects and crops videos with odd dimensions (e.g., 1919x1080) to prevent swscaler errors and FFmpeg crashes.
- Batch Extraction: Extract all frames or snapshot specific moments.
- Smart Player: Frame-by-frame navigation to find perfect poses.
- Repair Tool (New): A built-in "Repair / Fix Video" button that uses FFmpeg to re-encode broken videos (odd dimensions, bad codecs) into clean H.264 MP4s.
- Batch Fix (New): A "Batch Fix Folder" tool that scans an entire directory and automatically repairs all videos with odd dimensions (e.g., 1919x1080 -> 1918x1080) in one click.
- Smart Swap: Automatically replaces broken files with fixed versions in your workflow seamlessly.
- Text-to-Crop: Describe the object ("face", "hand", "dog") and the AI will find and crop it.
- Bucket-Friendly: Includes a "Bucket Resize (Mod 64)" mode. It ensures all cropped images have dimensions divisible by 64 (e.g., 1024x768), which is critical for optimal SDXL and Pony training.
- Trigger Words: Auto-inject prompts (Prefix/Suffix) into captions.
- Manual Editor: A split-view Quality Control station with keyboard shortcuts (Ctrl+S, Alt+Arrows) for rapid review.
- Batch Captioning: Auto-inject Trigger Words (Prefix/Suffix) into captions.
- ComfyUI Metadata Inspector: Open any PNG generated by ComfyUI to visualize the full Workflow (Graph) and Prompt (API).
- Export Workflow: One-click export of the embedded workflow to a .json file that you can drag & drop back into ComfyUI.
- Python 3.10 or newer.
- A GPU with CUDA support (NVIDIA) is highly recommended.
- Git installed.
- Clone the repository:
git clone https://github.com/patlegu/dataset_forge.git
cd dataset_forge
- Create a virtual environment (Recommended):
python -m venv venv
source venv/bin/activate
On Windows, run venv\Scripts\activate instead.
- Install Dependencies:
Important note for installation (CUDA):
If you want to use your NVIDIA graphics card (recommended for Florence-2),
install PyTorch separately before the other dependencies; otherwise, pip might install the CPU-only build by default.
The recommended order (Windows/Linux) is to install PyTorch first from the CUDA 11.8 wheel index at https://download.pytorch.org/whl/cu118, then the remaining requirements:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
(Note: timm and einops are required for Florence-2 model loading.)
- Run the App:
python dataset_forge.py
- Select a source folder containing videos.
- Repair: If a video fails to play or has odd dimensions (e.g., a height of 675 px), click "REPAIR (Single)" or use "BATCH FIX FOLDER" to fix all videos at once.
- Use the slider or < > buttons to navigate.
- Click SNAPSHOT to save the current frame, or EXTRACT ALL to save every frame of the video.
- Choose your extracted frames folder as Input.
- Set a target prompt (e.g., "face").
- Select "Bucket Resize (Mod 64)" for training prep.
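To make the "Mod 64" idea concrete, here is a minimal sketch of snapping a crop size to multiples of 64. The function name snap_to_bucket and the choice of rounding to the nearest multiple are assumptions for illustration; the app's actual rounding rule is not documented here and may differ (it could round down or pad instead).

```python
def snap_to_bucket(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round each side to the nearest multiple, never going below one multiple."""

    def snap(value: int) -> int:
        # Integer round-half-up, so no float rounding is involved.
        return max(multiple, (value + multiple // 2) // multiple * multiple)

    return snap(width), snap(height)


# A 1000x750 crop lands in the 1024x768 bucket.
print(snap_to_bucket(1000, 750))  # (1024, 768)
```

Rounding to the nearest multiple keeps the change in aspect ratio small; an image would still need to be resized or padded to the returned size afterwards.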

- Point to your images.
- Select <DETAILED_CAPTION> (Best for SDXL/Pony).
- Add your Trigger Word in "Prefix" (e.g., ohwx man).
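As an illustration of what Prefix/Suffix injection amounts to, the sketch below joins a trigger word, the generated caption, and an optional suffix. The helper name build_caption and the ", " separator are assumptions, not the app's confirmed behavior.

```python
def build_caption(caption: str, prefix: str = "", suffix: str = "", sep: str = ", ") -> str:
    """Join prefix, caption and suffix, skipping any part that is empty."""
    parts = [part.strip() for part in (prefix, caption, suffix)]
    return sep.join(part for part in parts if part)


print(build_caption("a man standing in a park", prefix="ohwx man"))
# ohwx man, a man standing in a park
```

Skipping empty parts avoids stray separators when only a prefix or only a suffix is set.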

- Review: Edit text files quickly with Ctrl+S (Save) and Alt+Arrows (Navigate).
- Inspector: If an image comes from ComfyUI, the "METADATA" button will light up.
Click it to view generation params or export the workflow JSON.
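For readers curious how such an inspector can work, here is a standard-library sketch that reads PNG text chunks and exports the embedded workflow. It assumes ComfyUI stores its data in uncompressed tEXt chunks under the keys "workflow" and "prompt"; the function names are hypothetical and this is not the app's own code.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(path):
    """Return {keyword: text} for every tEXt chunk in a PNG file."""
    chunks = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # truncated file or end of data
            length, chunk_type = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip the CRC; this sketch does not verify it
            if chunk_type == b"tEXt":
                # A tEXt chunk is keyword, NUL byte, then Latin-1 text.
                key, _, value = data.partition(b"\x00")
                chunks[key.decode("latin-1")] = value.decode("latin-1")
            elif chunk_type == b"IEND":
                break
    return chunks


def export_workflow(png_path, json_path):
    """Write the embedded workflow to json_path; return False if absent."""
    text = read_png_text_chunks(png_path)
    if "workflow" not in text:
        return False
    workflow = json.loads(text["workflow"])
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(workflow, out, indent=2)
    return True
```

Compressed (zTXt) and international (iTXt) chunks are ignored here, so a file that stores its metadata that way would report no workflow.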
Here are common issues we encountered during development and how they are handled:
Cause: The app uses a system call to FFmpeg to guarantee valid H.264 encoding.
Solution: Ensure you have FFmpeg installed. On WSL/Debian, run:
sudo apt update && sudo apt install ffmpeg
On Windows, verify that typing ffmpeg in a terminal works.
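The same PATH check can be done from Python before launching the app. This is a small sketch using only the standard library; the function name ffmpeg_status is made up for the example.

```python
import shutil


def ffmpeg_status() -> str:
    """Report whether an ffmpeg executable is reachable on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return "FFmpeg not found on PATH; install it before using the repair tools."
    return f"FFmpeg found at {path}"


print(ffmpeg_status())
```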
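Context: This happens when trying to display or process a video with odd dimensions (e.g., a width of 1279px instead of 1280px). FFmpeg and OpenCV both struggle with odd dimensions during YUV conversion, because YUV 4:2:0 stores chroma at half resolution and expects an even width and height.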
Solution: Dataset Forge automatically detects this and crops 1 pixel from the right or bottom on-the-fly. You don't need to do anything; the preview and extraction will work seamlessly.
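The fix described above boils down to trimming each odd side by one pixel. The sketch below shows that arithmetic and one way to express it as an FFmpeg re-encode command. The function names, and the choice of libx264 with yuv420p, are assumptions for illustration rather than the app's exact invocation.

```python
def even_dimensions(width: int, height: int) -> tuple[int, int]:
    """Drop one pixel from any odd side so both sides are even."""
    return width - (width % 2), height - (height % 2)


def build_repair_command(src: str, dst: str, width: int, height: int) -> list[str]:
    """Build an FFmpeg command that crops to even dimensions and re-encodes."""
    w, h = even_dimensions(width, height)
    return [
        "ffmpeg", "-y",
        "-i", src,
        # crop=w:h:x:y anchored at the top-left corner, so any trimmed
        # pixel comes off the right or bottom edge.
        "-vf", f"crop={w}:{h}:0:0",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        dst,
    ]


print(even_dimensions(1919, 1080))  # (1918, 1080)
```

The returned list can be passed to subprocess.run(cmd, check=True); building it as a list avoids shell quoting problems with file names that contain spaces.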
Context: You might see warnings in the console about flash_attn not being installed.
Solution: This is normal. We intentionally patched the model to use standard PyTorch attention (SDPA) to make installation easier on Windows. Performance is still excellent.
Context: The app freezes for a moment when clicking "Load Engine".
Solution: The first time you run it, the app downloads the Florence-2 models (approx 1-3GB) from Hugging Face.
Check your console/terminal to see the download progress.
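For context, this is roughly what loading Florence-2 through Hugging Face Transformers looks like, and why the first call is slow. It is a sketch that follows the usage published on the model cards, not the app's own loader: the name load_engine, the size keys, and the CPU fallback to FP32 are assumptions, and the SDPA patch mentioned earlier is not reproduced here.

```python
MODEL_IDS = {
    "base": "microsoft/Florence-2-base",    # faster
    "large": "microsoft/Florence-2-large",  # more accurate
}


def load_engine(size: str = "base"):
    """Load a Florence-2 model and processor.

    The first call downloads the weights into the Hugging Face cache,
    which is where the apparent freeze comes from.
    """
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = MODEL_IDS[size]
    use_cuda = torch.cuda.is_available()
    # FP16 halves VRAM use on GPU; fall back to FP32 on CPU.
    dtype = torch.float16 if use_cuda else torch.float32

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    )
    model = model.to("cuda" if use_cuda else "cpu").eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor
```

Later runs reuse the cached files, so only the first load of each model size pays the download cost.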
📜 License & Credits
Author: Patlegu
Original Concept: Based on Joschek's tools.
Model: Uses Microsoft Florence-2.
Project released under MIT License.

