🥤🧑🏻‍🚀 Code and dataset for our EMNLP 2023 paper "SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization"
Updated Jan 23, 2026 - Python
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Python library for handling audio datasets.
Tools for dataset generation based on CARLA simulator. (Data Collector)
This package is a complete tool for creating a large dataset of images, designed especially (but not only) for machine learning enthusiasts. It can crawl the web, download images, rename, resize, and convert the images, and merge folders.
Crawler behind the Shopify App Marketplace dataset
Code to blur human faces and vehicle license plates in videos and images using YOLOv8, a state-of-the-art object detection model
Quickly and easily create / train a custom DeepDream model
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
PixelPruner is a user-friendly image cropping app for AI-generated art. It supports PNG, JPG, JPEG, and WEBP formats. Easily crop, preview, and manage images with interactive previews, thumbnail views, rotation tools, and customizable output folders. Streamline your workflow and achieve perfect crops every time with PixelPruner.
A script to help you quickly build custom computer vision datasets
110k Dutch Book Reviews Dataset for Sentiment Analysis
A tool to streamline AI image captioning
A simple app for recording speech datasets.
Code for paper 'Avoid touching your face: A hand-to-face 3d motion dataset (covid-away) and trained models for smartwatches'
Dataset augmentation with Generative Adversarial Network for crop/weed segmentation
A Python library designed for scraping data from the SCP wiki.
Syclops is a tool for creating synthetic data from 3D virtual environments with photorealistic renderings and pixel-perfect annotations.
Use GPTparser with your OpenAI API to scrape & parse files into structured JSON files.
Costa Rican license plate dataset generator