The Image Caption Generator project focuses on building a neural network architecture that automatically generates captions for images.
The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.
Read more about the dataset on the website or in the research paper.
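
As a rough illustration of how the caption annotations can be read, the sketch below assumes the standard COCO captions JSON layout: an `images` list and an `annotations` list linked by `image_id`. The sample records and the helper name `group_captions_by_image` are made up for illustration and are not taken from this project's notebooks.

```python
import json
from collections import defaultdict


def group_captions_by_image(coco):
    """Map each image file name to the list of captions written for it."""
    # Look up file names by image id so annotations can be joined to images.
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        captions[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(captions)


# Tiny hand-made sample in the COCO captions layout (not real dataset entries).
sample = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "img_0001.jpg"},
    {"id": 2, "file_name": "img_0002.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs on the beach. "},
    {"id": 11, "image_id": 1, "caption": "A brown dog near the sea."},
    {"id": 12, "image_id": 2, "caption": "Two people ride bicycles."}
  ]
}
""")

captions_by_image = group_captions_by_image(sample)
for file_name, caps in captions_by_image.items():
    print(file_name, caps)
```

With a real annotation file, the same function could be applied to the result of `json.load` on that file; each image in the dataset typically comes with several reference captions, which is why the values are lists.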
- Exploring the Microsoft COCO Dataset: COCO
- Data Preprocessing and Model Building: Jupyter Notebook
- Model Training and Validation: Train and Validate
- Results: Inference