📰 Newspaper4k, a fork of the beloved Newspaper3k. Extracts articles, titles, and metadata from news websites.
Updated Apr 30, 2026 - Python
[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021
Data Labeling, Tracking and Annotation with AI
LoRA Pilot is the ultimate Docker image for all Stable Diffusion LoRA trainers. Includes kohya_ss, diffusion pipes and TensorBoard for training, and ComfyUI and InvokeAI for validation. Features shared models, modules, custom integrations and automation scripts.
Privacy-first, powerful browser-based tool for tagging, captioning, cropping and managing training datasets for Stable Diffusion LoRA training. No install required.
A media visualizer and tag manager/captioner for small to very large image and video datasets
Anonymize sensitive data in your datasets.
Chrome extension to download images with one click, saving time on image dataset creation.
(Windows/Linux) Local WebUI for fine-tuning, evaluation and generation with neural network models (LLM and Stable Diffusion) in Python (with a Gradio interface). Translated into 3 languages
Official Code for the dataset exploration of Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods
MALVADA: Malware Execution Traces Dataset generation.
kg-import automates the ingestion of heterogeneous datasets into a Knowledge Graph.
Automatic Pokemon card image downloader
Make a custom dataset in the AVADataset format.
Utilities for working with the Common Voice dataset
Low Resource Context Relation Sampler: samples contexts with relations for fact-checking and fine-tuning your LLMs, powered by AREkit
This project contains various scripts that can assist in the process of preparing datasets.
NanoListener: a small suite of Python scripts to create custom training datasets for modification-aware basecaller models.
Turn robot data into reproducible training & eval datasets