CourseWork Project, Mathematics for Machine Learning
Teamates
- Adya Bhat
- Aditya G H
It uses a encoder-decoder architecture to caption Images. I used a pretrained ResNet model as the encoder to extract image features, combined with an LSTM to generate captions using the extracted features.
Model Architecture
Dataset
Flickr 8K - It consists of a folder Images with images (.jpg format), and a a text file captions.txt. Each image has five captions each.
containing the captions corresponding to the image file names.
The format of the text file is:
<image-filename>,<caption>\n
Results