Harsh Jhamtani, Taylor Berg-Kirkpatrick. Learning to Describe Differences Between Pairs of Similar Images. EMNLP 2018 
Link: https://arxiv.org/pdf/1808.10584.pdf
- v0.1 of dataset is present in data/.  
 
- data/annotations/ contains three json files representing the train, val, and test splits
 - format of each json file is as follows: each file represents a list, and each item in the list is a dictionary with 'img_id' and 'sentences' keys. e.g.
{"img_id": "400", "sentences": ["two of the three people in the front of the picture have moved", "there is a vehicle in the far back that is only in image two"]}
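A minimal sketch of reading this format (the sample record below is inlined for illustration; in practice you would `json.load()` one of the split files, e.g. data/annotations/train.json):

```python
import json

# One record in the annotation format (each split file is a JSON list of these).
sample = '''[{"img_id": "400",
              "sentences": ["two of the three people in the front of the picture have moved",
                            "there is a vehicle in the far back that is only in image two"]}]'''

annotations = json.loads(sample)
for entry in annotations:
    # Each entry pairs an image id with one or more difference descriptions.
    print(entry["img_id"], "has", len(entry["sentences"]), "difference sentence(s)")
```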
- data/resized_images/ contains the relevant images.
 - naming convention: <img_id>.png, <img_id>_2.png
 - we have also provided the corresponding diff images: <img_id>_diff.jpg
 - All images have been resized to 224x224
 - Original-size images are available at bit.ly/spot_diff_data
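Given the naming convention above, the file paths for a pair and its diff image can be built from the img_id alone (the `data/resized_images` root is an assumption; adjust to wherever the data lives):

```python
def image_paths(img_id, root="data/resized_images"):
    """Build paths for an image pair and its diff image, following the
    convention <img_id>.png, <img_id>_2.png, <img_id>_diff.jpg."""
    return (f"{root}/{img_id}.png",      # first image of the pair
            f"{root}/{img_id}_2.png",    # second image of the pair
            f"{root}/{img_id}_diff.jpg") # pixel-difference image

img1, img2, diff = image_paths("400")
```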
 
- We provide clusters of differing pixels, computed under the suggested parameter settings and clustering algorithm.
 - For more details, check Code/usage.ipynb
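For intuition only, here is one way such clusters can be formed: threshold the per-pixel absolute difference between the two images and group changed pixels by 4-connectivity. This is an illustrative sketch, not the authors' exact procedure; the suggested thresholds and clustering algorithm are in Code/usage.ipynb.

```python
import numpy as np
from collections import deque

def diff_pixel_clusters(img1, img2, threshold=30):
    """Return a list of clusters, each a list of (row, col) changed pixels.
    The threshold value is an illustrative assumption."""
    # Mark pixels whose max channel-wise absolute difference exceeds the threshold.
    changed = np.abs(img1.astype(int) - img2.astype(int)).max(axis=-1) > threshold
    seen = np.zeros_like(changed, dtype=bool)
    clusters = []
    rows, cols = changed.shape
    for r, c in zip(*np.nonzero(changed)):
        if seen[r, c]:
            continue
        # BFS flood fill over 4-connected changed pixels.
        queue, cluster = deque([(r, c)]), []
        seen[r, c] = True
        while queue:
            y, x = queue.popleft()
            cluster.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < rows and 0 <= nx < cols and changed[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        clusters.append(cluster)
    return clusters
```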
 
- Clustering code has been added
 
TODO
- Model Predictions (multi)
 
If you use the data or code, please consider citing:
@inproceedings{jhamtani2018learning,
  title={Learning to Describe Differences Between Pairs of Similar Images},
  author={Jhamtani, Harsh and Berg-Kirkpatrick, Taylor},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}