This repo hosts the source code (Torch) for our work on procedure segmentation (ProcNets) and the YouCook2 dataset.
- The large-scale cooking video dataset YouCook2 is available on the YouCook2 website
- Our AAAI18 oral paper is available at https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17344
- Install Lua Torch, which also provides packages such as nn, nngraph and cutorch
- Install csvigo to read/write .csv files
- Download the YouCook2 dataset
We provide ResNet-34 features for 500 uniformly sampled RGB frames per video (see the dataset README). To extract features on your own: i) adapt `script/video2frame_yc2.sh` and `script/videosample.py` to sample frames, and ii) run `extract_cnnfeat_resnet_mscoco.lua` to extract a feature for each frame.
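As a rough illustration of "uniformly sampled", here is a minimal sketch that picks 500 evenly spaced frame indices from a decoded video. It assumes each sample is taken at the center of one of 500 equal-width bins over the video's frames; `uniform_frame_indices` is a hypothetical helper, and `script/videosample.py` may round or align the samples differently.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_samples: int = 500) -> List[int]:
    """Return num_samples frame indices spread evenly over [0, num_frames).

    Hypothetical helper (assumption): takes the center of each of num_samples
    equal-width bins, so short videos yield repeated indices.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    step = num_frames / num_samples
    # Center of bin i is (i + 0.5) * step; int() floors it to a frame index,
    # and min() guards against landing past the last frame.
    return [min(int((i + 0.5) * step), num_frames - 1) for i in range(num_samples)]


if __name__ == "__main__":
    idx = uniform_frame_indices(num_frames=9000)  # e.g. 5 minutes at 30 fps
    print(len(idx), idx[:3], idx[-1])  # prints: 500 [9, 27, 45] 8991
```

Sampling a fixed number of frames per video gives every video a feature sequence of the same length, regardless of its duration or frame rate.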
`train_bilstm_seq.lua` is the main file for training and validation. To load your data, specify the data paths `-image_folder`, `-train_data_folder`, `-val_data_folder` and `-ann_file`. You also need to specify the video info files with `-train_vidinfo_file` and `-val_vidinfo_file`. An example of model training:
```bash
th train_bilstm_seq.lua -id my_procnets -mp_scale_h 8 -mp_scale_w 5 -save_checkpoint_every 10000 -max_iters 120000 -learning_rate 4e-5
```
where `-save_checkpoint_every` sets how often (in iterations) validation is run. Validation reports mIoU and Jaccard (Jacc), and the model with the highest Jaccard score is saved under the directory given by `-checkpoint_path`.
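For intuition about the two validation metrics, here is a minimal sketch on temporal segments. It assumes mIoU averages, over ground-truth segments, the best temporal IoU with any predicted segment, and that Jaccard uses the same best-match averaging but divides the overlap by the predicted segment's length instead of the union. Both definitions are assumptions for illustration, all function names are hypothetical, and the repo's evaluation code is the authority on the exact formulas. The reported scores appear to be these values times 100.

```python
from typing import Callable, Sequence, Tuple

# A segment is (start, end); all segments must use the same unit
# (seconds or feature indices).
Segment = Tuple[float, float]


def intersection(a: Segment, b: Segment) -> float:
    """Length of the temporal overlap between two segments (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union."""
    inter = intersection(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def intersection_over_pred(gt: Segment, pred: Segment) -> float:
    """Overlap divided by the predicted segment's length (assumed Jaccard variant)."""
    length = pred[1] - pred[0]
    return intersection(gt, pred) / length if length > 0 else 0.0


def best_match_average(gt: Sequence[Segment], pred: Sequence[Segment],
                       score: Callable[[Segment, Segment], float]) -> float:
    """For each ground-truth segment take the best score over all predictions, then average."""
    if not gt:
        raise ValueError("need at least one ground-truth segment")
    return sum(max((score(g, p) for p in pred), default=0.0) for g in gt) / len(gt)


def mean_iou(gt: Sequence[Segment], pred: Sequence[Segment]) -> float:
    return best_match_average(gt, pred, tiou)


def jaccard(gt: Sequence[Segment], pred: Sequence[Segment]) -> float:
    return best_match_average(gt, pred, intersection_over_pred)


if __name__ == "__main__":
    gt = [(0, 10), (10, 20)]
    pred = [(0, 8), (12, 20)]
    print(mean_iou(gt, pred), jaccard(gt, pred))  # prints: 0.8 1.0
```

The example shows why the two numbers differ: predictions that sit entirely inside a ground-truth segment are not penalized by an intersection-over-prediction score, but they still lose IoU for the uncovered part of the ground truth.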
Note: training is slow with the current implementation (about 2 days on an NVIDIA GTX 1080Ti) and could be further optimized. Pull requests are welcome.
Testing is handled by the same script, so simply run:
```bash
th train_bilstm_seq.lua -id eval-my_procnets -mp_scale_h 8 -mp_scale_w 5 -max_iters 1 -start_from /path/to/your/model
```
Make sure `-val_data_folder` and `-val_vidinfo_file` point to the features and the duration info of the testing split.
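One plausible reason the duration info is needed is to relate annotated segment boundaries (in seconds) to positions in the fixed-length feature sequence. The sketch below assumes the 500 sampled frames cover the whole video evenly; `seconds_to_feature_index` and `segment_to_feature_range` are hypothetical helpers, not functions from this repo.

```python
import math
from typing import Tuple


def seconds_to_feature_index(t: float, duration: float, num_samples: int = 500) -> int:
    """Map a timestamp in seconds to the index of the sampled frame covering it.

    Assumption: the num_samples frames evenly cover [0, duration].
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    idx = math.floor(t / duration * num_samples)
    # Clamp so t == duration (or slightly out-of-range annotations) stay valid.
    return min(max(idx, 0), num_samples - 1)


def segment_to_feature_range(start: float, end: float, duration: float,
                             num_samples: int = 500) -> Tuple[int, int]:
    """Convert a (start, end) segment in seconds to (first, last) feature indices."""
    return (seconds_to_feature_index(start, duration, num_samples),
            seconds_to_feature_index(end, duration, num_samples))


if __name__ == "__main__":
    # A 5-minute video with a procedure step annotated from 0:30 to 1:15.
    print(segment_to_feature_range(30, 75, duration=300))  # prints: (50, 125)
```

If the durations do not match the features (for example, training-split durations paired with testing-split features), boundaries land at the wrong indices, which is why both options must refer to the same split.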
We provide our pre-trained model (59MB). Its Jaccard and mIoU scores are shown below. To evaluate the model in terms of precision and recall, refer to `script/eval_recall_precision.py`.
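As a rough guide to segment-level precision and recall, here is a minimal sketch. It assumes a predicted segment counts as correct if its temporal IoU with any ground-truth segment reaches a threshold (0.5 here), and a ground-truth segment counts as recalled if any prediction reaches that threshold with it. The threshold, the many-to-one matching, and the function names are assumptions; `script/eval_recall_precision.py` may match segments differently.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in a shared time unit


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(gt: Sequence[Segment], pred: Sequence[Segment],
                     thresh: float = 0.5) -> Tuple[float, float]:
    """Segment-level precision and recall at a tIoU threshold (many-to-one matching)."""
    if not gt:
        raise ValueError("need at least one ground-truth segment")
    # A prediction is a hit if it overlaps some ground-truth segment well enough.
    hits = sum(any(tiou(p, g) >= thresh for g in gt) for p in pred)
    precision = hits / len(pred) if pred else 0.0
    # A ground-truth segment is recalled if some prediction overlaps it well enough.
    recalled = sum(any(tiou(g, p) >= thresh for p in pred) for g in gt)
    recall = recalled / len(gt)
    return precision, recall


if __name__ == "__main__":
    gt = [(0, 10), (10, 20), (20, 30)]
    pred = [(0, 8), (12, 20)]
    print(precision_recall(gt, pred))
```

Here both predictions are correct (precision 1.0) but the third ground-truth step is missed (recall 2/3). A stricter evaluation would enforce one-to-one matching so that a single prediction cannot be credited against several ground-truth segments.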
| Method | Jaccard (validation) | mIoU (validation) | Jaccard (test) | mIoU (test) |
|---|---|---|---|---|
| ProcNets-LSTM | 55.3 | 40.9 | 51.5 | 38.0 |
A simple visualization of the generated segments can be enabled by setting `-vis` to `true`. Run `script/plot_losses.py` to plot the training loss and validation accuracy.
Our code is mainly based on Neuraltalk2 and Facebook ResNet (thanks to both for releasing their code!). We are releasing a PyTorch version of ProcNets soon; please stay tuned!
Please contact luozhou@umich.edu if you have any trouble running the code. Please cite the following paper if you are using the code.
```
@inproceedings{ZhXuCoCVPR18,
  author    = {Zhou, Luowei and Xu, Chenliang and Corso, Jason J},
  title     = {Towards Automatic Learning of Procedures From Web Instructional Videos},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  url       = {https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17344}
}
```