📰 Newspaper4k, a fork of the beloved Newspaper3k. Extracts articles, titles, and metadata from news websites.
Updated Feb 21, 2026 - Python
[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021
Data Labeling, Tracking and Annotation with AI
Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, mislabels and others.
Curate, evaluate, and ship LLM datasets from any document.
LoRA Pilot is an all-in-one Docker image for Stable Diffusion LoRA trainers. Includes kohya_ss, diffusion pipes and TensorBoard for training, and ComfyUI and InvokeAI for validation. Features shared models, modules, custom integrations and automation scripts.
Anonymize sensitive data in your datasets.
(Windows/Linux) Local WebUI for fine-tuning, evaluation and generation of neural network models (LLM and Stable Diffusion) in Python, with a Gradio interface. Translated into 3 languages.
Chrome extension to download images with one click, saving time on image dataset creation.
Official Code for the dataset exploration of Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods
MALVADA: Malware Execution Traces Dataset generation.
kg-import automates the ingestion of heterogeneous datasets into a Knowledge Graph.
Make a custom AVADataset dataset.
Utilities for working with the Common Voice dataset
Automatic image downloader for Pokemon cards
Low Resource Context Relation Sampler for contexts with relations, for fact-checking and fine-tuning your LLMs, powered by AREkit
This project contains various scripts that can assist in the process of preparing datasets.
NanoListener: a small suite of Python scripts to create custom training datasets for modification-aware basecaller models.
A GUI application to tag images and edit them using an editor.