📦 A curated list of JSON / BSON datasets from the web to practice with or use in MongoDB
Updated Jul 5, 2019 - Shell
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Visual Odometry with Inertial and Depth (VOID) dataset
[NeurIPS 2025] PanTS: The Pancreatic Tumor Segmentation Dataset. PanTS enables development and external evaluation of AI for pancreatic tumor detection, localization, and quantitative assessment, with multi-structure context and metadata to support robust, anatomy-aware modeling.
🐸TTS recipes for different datasets
Mirror of the VoxCeleb dataset, a large-scale speaker identification dataset
ODSQA: Open-Domain Spoken Question Answering Dataset
[ICCV 2025] Dataset of 10,135 abdominal CT scans with 15,130 tumors annotated across six organs and 5,893 controls. The AI ranks first in Medical Segmentation Decathlon (MSD).
(3DV 2021) A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications
A spoken question answering dataset based on SQuAD
International Securities Identification Numbers for various Indian Securities
A dataset of SCP Items, Articles, and Metadata - Updated Daily
Tracing Versus Freehand for Evaluating Computer-Generated Drawings (SIGGRAPH 2021)
A collection of datasets you may need or want to play with.