Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Updated Feb 18, 2023
Dataset and Evaluation Scripts for Obstacle Detection via Semantic Segmentation in a Marine Environment
This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.
Wearanize+ is a research project in which multiple wearable devices were used to record participants' overnight sleep.
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"