It describes a scene in an image using text.
glove.6B (pre-trained GloVe word embeddings) - https://nlp.stanford.edu/data/glove.6B.zip
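As a rough sketch of how GloVe files of this kind are usually consumed, the snippet below parses the plain-text "word value value ..." format into a dictionary of vectors. The tiny in-memory sample stands in for one of the glove.6B files; the 3-dimensional vectors are an illustration only (the real files are 50/100/200/300-dimensional).

```python
import numpy as np

def load_glove_embeddings(lines):
    """Parse GloVe-format lines ("word f1 f2 ...") into a dict of float vectors."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# Tiny in-memory sample in the same format as the glove.6B text files.
sample = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
vectors = load_glove_embeddings(sample)
print(vectors["cat"].shape)  # (3,)
```

With the real file you would pass `open("glove.6B.200d.txt", encoding="utf-8")` instead of the sample list.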
Data
- Flickr8k dataset (images and captions) - https://www.kaggle.com/adityajn105/flickr8k/download
- encode_test_feature.pkl, idx_to_word.pkl, word_to_idx.pkl - https://www.dropbox.com/sh/az6ii75fwxgr4sg/AAChRB_q_0OvmVmfPgyR6Dima?dl=0
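The .pkl files above are ordinary pickled Python objects; idx_to_word.pkl and word_to_idx.pkl hold the two vocabulary mappings. The sketch below uses a made-up three-word vocabulary (the real dictionaries come from the linked files) and round-trips it through pickle the same way the files would be read with `pickle.load`.

```python
import pickle

# Hypothetical stand-in for the repo's word_to_idx mapping.
word_to_idx = {"startseq": 1, "dog": 2, "endseq": 3}
# idx_to_word is simply the inverse mapping.
idx_to_word = {i: w for w, i in word_to_idx.items()}

# Round-trip through pickle, mirroring how the .pkl files are loaded from disk.
blob = pickle.dumps(word_to_idx)
restored = pickle.loads(blob)
print(restored["dog"])  # 2
print(idx_to_word[2])   # dog
```

With the downloaded files you would instead do `pickle.load(open("word_to_idx.pkl", "rb"))`.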
model_weights
- 2048_img_model.h5, model_9.h5 - https://www.dropbox.com/sh/n9m8mxd2aojrnac/AAD-7B2pdQzLIA39tP1ToSVoa?dl=0
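Caption models of this style are typically run with greedy decoding: start from "startseq", repeatedly feed the image feature plus the partial sequence to the model, and take the argmax word until "endseq". The sketch below illustrates only that loop; the vocabulary, the `fake_predict` stub, and the 2048-dim zero feature are all assumptions standing in for the real pickles and model_9.h5.

```python
import numpy as np

# Toy vocabulary standing in for the repo's word_to_idx / idx_to_word pickles.
word_to_idx = {"startseq": 1, "a": 2, "dog": 3, "endseq": 4}
idx_to_word = {i: w for w, i in word_to_idx.items()}

def fake_predict(photo_feature, seq):
    """Stub for model.predict; a real model returns a softmax over the vocab."""
    script = {1: 2, 2: 3, 3: 4}  # startseq -> a -> dog -> endseq
    probs = np.zeros(len(word_to_idx) + 1)
    probs[script[seq[-1]]] = 1.0
    return probs

def greedy_caption(photo_feature, predict, max_len=10):
    """Greedily decode a caption from a photo feature vector."""
    seq = [word_to_idx["startseq"]]
    words = []
    for _ in range(max_len):
        nxt = int(np.argmax(predict(photo_feature, seq)))
        if idx_to_word[nxt] == "endseq":
            break
        words.append(idx_to_word[nxt])
        seq.append(nxt)
    return " ".join(words)

print(greedy_caption(np.zeros(2048), fake_predict))  # a dog
```

The 2048 here matches the feature size suggested by the 2048_img_model.h5 filename; with the real weights you would replace `fake_predict` with a call into the loaded Keras model.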
For a brief overview, you can refer to the blog post "Image Captioning with Keras" by Harshall Lamba - https://towardsdatascience.com/image-captioning-with-keras-teaching-computers-to-describe-pictures-c88a46a311b8