Image captioning is a computer vision task in which a deep learning model is trained to generate a textual description (a caption) of an image. During training, the inputs to the RNN model are image features together with their captions. During testing, only an image is given as input, and the model generates a caption as output.

I have used three different methods to create an image captioning model. All three are implemented in Keras and use the same dataset. The models produced by these methods can be improved further by tuning hyperparameters and training for longer.
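To make the difference between training and testing concrete, here is a minimal sketch in plain Python/NumPy. It is not taken from any of the three notebooks. It assumes captions are already encoded as lists of integer word ids, that id 0 is reserved for padding, and that `predict_next` is a hypothetical callable standing in for a trained model that returns the id of the next word.

```python
import numpy as np

def make_training_pairs(image_feature, caption_ids, max_len):
    """Expand one (image, caption) pair into next-word prediction samples."""
    features, prefixes, targets = [], [], []
    for i in range(1, len(caption_ids)):
        # Partial caption seen so far, truncated to the last max_len ids.
        prefix = caption_ids[:i][-max_len:]
        # Left-pad with 0 (assumed padding id) so every sample has length max_len.
        padded = [0] * (max_len - len(prefix)) + prefix
        features.append(image_feature)
        prefixes.append(padded)
        targets.append(caption_ids[i])  # the word the model should predict next
    return np.array(features), np.array(prefixes), np.array(targets)

def greedy_caption(predict_next, image_feature, start_id, end_id, max_len):
    """Generate a caption at test time, one word at a time, from the image only."""
    caption = [start_id]
    for _ in range(max_len - 1):
        next_id = predict_next(image_feature, caption)
        caption.append(next_id)
        if next_id == end_id:
            break
    return caption

# Hypothetical data: a 4-dimensional feature vector and a caption encoded as
# [start, "a", "dog", end] -> [1, 2, 3, 4].
feature = np.zeros(4)
X_img, X_seq, y = make_training_pairs(feature, [1, 2, 3, 4], max_len=5)

# Stub "model" that always predicts the previous id plus one.
generated = greedy_caption(lambda feat, seq: seq[-1] + 1, feature,
                           start_id=1, end_id=4, max_len=10)
```

During training, each caption of length n yields n - 1 samples, each pairing the image feature and a partial caption with the next word. At test time the caption is grown word by word until the end token or the length limit is reached.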
Dataset: https://forms.illinois.edu/sec/1713398
- Imgcaption1: Contains the first method. I implemented it only to understand the concepts of image captioning.
- Imgcaption2: Contains the second method, probably the most complicated of the three. It generates good captions, but the third method is simpler and gives comparable results.
- Imgcaption3: Contains the third and last method. The model is straightforward, and the results are almost as good as those of the second method.
You can download the pre-trained GloVe word vectors (glove6b200d) from this link: https://www.kaggle.com/incorpes/glove6b200d
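As a rough sketch of how such vectors are typically used, the snippet below parses GloVe-style text lines (a word followed by space-separated floats) and builds an embedding matrix for a vocabulary. The function names, the `word_index` mapping, and the three-dimensional sample lines are assumptions made for illustration; the downloaded vectors are 200-dimensional, and the notebooks may load them differently.

```python
import numpy as np

def parse_glove(lines):
    """Parse GloVe-style lines ("word v1 v2 ...") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 is padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec  # words missing from GloVe stay all-zero
    return matrix

# Tiny stand-in for the real file so the example runs without the download.
sample_lines = ["dog 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
vectors = parse_glove(sample_lines)

# Hypothetical vocabulary: word -> integer index, starting at 1.
word_index = {"dog": 1, "bird": 2}
embedding_matrix = build_embedding_matrix(word_index, vectors, dim=3)
```

With the real download, the lines would come from the opened text file (for example `with open(path, encoding="utf-8") as f: vectors = parse_glove(f)`, where `path` is wherever you saved it) and `dim` would be 200. One common use is to pass the resulting matrix as the initial weights of a Keras `Embedding` layer.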
NOTE: These models can be fine-tuned further to produce better results. Some of the captions I have generated are good, and a few are bad.