Code from the paper "Learning to Learn Words from Visual Scenes".
Website of the project at expert.cs.columbia.edu.
If you use the code, please cite the paper as:
@Article{Suris2020learning,
  author  = {D. Surís and D. Epstein and H. Ji and S. Chang and C. Vondrick},
  title   = {Learning to Learn Words from Visual Scenes},
  journal = {European Conference on Computer Vision (ECCV)},
  year    = {2020}
}
An example of a command-line execution can be found in scripts/run.sh. To reproduce the numbers from the paper, please use the released pretrained models and the scripts/test_*.sh scripts.
Run python main.py --help for information on the available arguments.
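For reference, the two entry points mentioned above can be invoked as follows (paths and arguments inside run.sh may need to be adapted to your setup):

```bash
# Training example from the paper (edit paths inside the script as needed):
bash scripts/run.sh

# List all available command-line arguments:
python main.py --help
```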
Make sure the external libraries listed in requirements.txt are installed.
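A standard way to install them, assuming a working Python environment, is:

```bash
pip install -r requirements.txt
```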
We work with the Epic Kitchens dataset for this project. To run our code, you will need to download their images and annotations.
Specifically, the annotations directory has to contain:
- EPIC_train_object_labels.csv (provided by the Epic Kitchens dataset)
- EPIC_video_info.csv (provided by the Epic Kitchens dataset)
- splits.pth: file containing our train/test splits.
- processed_EPIC_train_action_labels
The path to this directory has to be provided in the --annotation_root argument. A compressed .tar.gz file with these four files can be downloaded here.
The images directory has to be specified using the --img_root argument. It contains all the images with the following subfolder structure: path_to_img_root/participant_id/vid_id/frame_{frame_id:010d}.jpg. This is the default structure if you download the data from the Epic Kitchens website (download from here). For this project we only use the RGB images, not the flow information.
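As an illustration, a directory laid out like the one below would match the expected structure (the participant and video IDs are example values only):

```
/path/to/img_root/
└── P01/
    └── P01_01/
        ├── frame_0000000001.jpg
        ├── frame_0000000002.jpg
        └── ...
```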
The pretrained models reported in our paper can be downloaded from the following links:
- Bert baseline
- Model with isolated attention
- Model with target-to-reference attention
- Model with via-vision attention
- Model with via-vision attention and input pointing
- Model with full attention
Each one of these is a .tar.gz file containing the files necessary to load the model (checkpoint_best.pth, config.json and tokenizer.pth).
To resume training or to test from one of these pretrained models, set the --resume argument to True. Extract the models under the /path/to/your/checkpoints directory that you provide in the --checkpoint_dir argument, and refer to the specific model using the --resume_name argument.
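Putting these flags together, an invocation to test one of the pretrained models could look like the sketch below. All paths and the model name are placeholders, and the exact value expected by --resume may differ; check python main.py --help and the scripts/test_*.sh scripts for the definitive usage.

```bash
# Hypothetical example; adapt paths, model name and any extra arguments to your setup.
python main.py \
    --resume True \
    --checkpoint_dir /path/to/your/checkpoints \
    --resume_name <pretrained_model_name> \
    --annotation_root /path/to/annotations \
    --img_root /path/to/img_root
```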