Commit: init

AllenXuuu committed Apr 11, 2022
1 parent cc3ad1a commit 93a7ad0
Showing 164 changed files with 615,473 additions and 3 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -7,10 +7,9 @@ __pycache__
.pth

# Data directories
data/feature
exp
weight
checkpoint

# python notebook related
*.ipynb_checkpoints
79 changes: 78 additions & 1 deletion README.md
@@ -7,12 +7,71 @@ This repo contains the official implementation of paper
> [Xinyu Xu](https://xuxinyu.website), [Yong-Lu Li](https://dirtyharrylyl.github.io/), [Cewu Lu](https://mvig.sjtu.edu.cn).
>
> In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022.
> [[arxiv](https://arxiv.org/abs/2204.02587)] [[code](https://github.com/AllenXuuu/DCR)] [[model](https://drive.google.com/drive/folders/1bXFs1_9HBPi74LpsYfxx753Vkc6BbEHa?usp=sharing)]
****

## Code and models are coming soon!
## Data Preparation

We reorganize the annotation files of the four datasets [1-4] in the ```data``` folder.
You need to download the pre-extracted features into the ```data/feature``` folder.
TSN features can be downloaded from [Link](https://github.com/fpv-iplab/rulstm) [5].
irCSN-152 features can be downloaded from [Link](https://github.com/facebookresearch/AVT) [6].
We also provide a stronger TSM backbone; its features can be downloaded from [Link](https://drive.google.com/drive/folders/1spwT8r7Fcm1fJJFju_L7NdyyHKckODNo?usp=sharing).
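With the features in place, the directory is expected to look roughly as follows (shown here for the TSN flow features of EPIC-KITCHENS-100; other backbones follow the same pattern, and the sub-folder name matches the paths referenced by the configs, e.g. ```./data/feature/EK100_FLOW_TSN/```):

```
data/
├── ...                     # reorganized annotation files for the four datasets
└── feature/
    └── EK100_FLOW_TSN/     # pre-extracted TSN flow features
```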


## Packages

We conduct experiments in the following environment:
```
python == 3.9
torch == 1.9
torchvision == 0.10.0
apex
tensorboardX
yacs
pyyaml
numpy
prefetch_generator
```
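One possible way to create this environment, assuming conda and pip (the environment name ```dcr``` is arbitrary; ```apex``` refers to NVIDIA apex, which is typically built from source):

```
conda create -n dcr python=3.9
conda activate dcr
pip install torch==1.9.0 torchvision==0.10.0
pip install tensorboardX yacs pyyaml numpy prefetch_generator
# NVIDIA apex (used for AMP mixed precision, cf. AMP_OPT_LEVEL in config.py):
# follow the build instructions at https://github.com/NVIDIA/apex
```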


## Evaluation

We release pre-trained models [here](https://drive.google.com/drive/folders/1bXFs1_9HBPi74LpsYfxx753Vkc6BbEHa?usp=sharing).
To test the performance of our model with the RGB-TSM backbone on EPIC-KITCHENS-100 [1], you can run the following command.

```
python eval.py --cfg configs/EK100RGBTSM/eval.yaml --resume ./weights/EK100RGBTSM.pt
```

Here ```./weights/EK100RGBTSM.pt``` is the path to the pre-trained model you downloaded.

Results can be found in [Model Zoo](./docs/model_zoo.md).

## Training

Taking the same setting as an example, you can reproduce the training process by running

```
python train_order.py --cfg configs/EK100RGBTSM/order.yaml --name order
python train.py --cfg configs/EK100RGBTSM/train.yaml --name train --resume exp/EK100RGBTSM/order/epoch_50.pt
```

The first command runs our frame-order pre-training stage; the resulting model is stored at ```exp/EK100RGBTSM/order/epoch_50.pt```. The second command reloads this pre-trained model and runs the anticipation training stage.

Running the anticipation training from scratch, without the pre-training stage, is also possible:

```
python train.py --cfg configs/EK100RGBTSM/train.yaml --name train
```


## Citation

If you find our paper or code helpful, please cite our paper.

```
@inproceedings{xu2022learning,
  title={Learning to Anticipate Future with Dynamic Context Removal},
  author={Xu, Xinyu and Li, Yong-Lu and Lu, Cewu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
```


## Reference

**[1]** Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. International Journal of Computer Vision (IJCV), 2021.

**[2]** Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Scaling egocentric vision: The EPIC-KITCHENS dataset. In Proceedings of the European Conference on Computer Vision (ECCV), pages 720–736, 2018.

**[3]** Yin Li, Miao Liu, and James M. Rehg. In the eye of beholder: Joint learning of gaze and actions in first person video. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.

**[4]** Sebastian Stein and Stephen J. McKenna. Combining embedded accelerometers with computer vision for recognizing food preparation activities. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 729–738, 2013.

**[5]** Antonino Furnari and Giovanni Maria Farinella. Rolling-unrolling LSTMs for action anticipation from first-person video. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.

**[6]** Rohit Girdhar and Kristen Grauman. Anticipative video transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

154 changes: 154 additions & 0 deletions config.py
@@ -0,0 +1,154 @@
import copy
import os
from yacs.config import CfgNode as CN
import argparse
import time

root = CN()

# root.name = '_'.join(time.asctime(time.localtime(time.time())).split())
root.name = None
root.local_rank = None
root.seed = 0
root.AMP_OPT_LEVEL = 'O1'
root.save_freq = None
root.main_metric = None


root.resume = CN()
root.resume.path = None
root.resume.pe_from_cls = True
root.resume.type = None

########################################## Curriculum
root.curriculum = CN()
root.curriculum.gamma_min = 0.9
root.curriculum.gamma_max = 0.98


########################################## TRAIN: DATA
root.train = CN()
root.train.report_iter = None
root.train.data = CN()

root.train.data.name = ""
root.train.data.split = ""
root.train.data.cache = False
root.train.data.drop = False
root.train.data.forward_frame = 0
root.train.data.past_frame = 0
root.train.data.fps = 4
root.train.data.tau_a = 1.0
root.train.data.batch_size = 8
root.train.data.num_workers = 8
root.train.data.weight = True


root.train.data.feat_file = ""
root.train.data.feature = ""
root.train.data.feature_fps = None
root.train.data.feature_dim = None

########################################## TRAIN: LEARNING
root.train.max_epoch = 100
root.train.clip_grad = 5.0

root.train.optimizer = CN()
root.train.optimizer.name = 'AdamW'
root.train.optimizer.base_lr = 1e-4
root.train.optimizer.momentum = 0.9
root.train.optimizer.betas = (0.9, 0.999)
root.train.optimizer.weight_decay = 1e-2
root.train.optimizer.eps = 1e-8

root.train.scheduler = CN()
root.train.scheduler.name = 'no'
root.train.scheduler.step = []
root.train.scheduler.gamma = 0.1
root.train.scheduler.eta_min = 0.
root.train.scheduler.warmup_epoch = 0

########################################## EVAL: DATA
root.eval = CN()
root.eval.freq = 100
root.eval.report_iter = None

root.eval.data = CN()
root.eval.data.name = ""
root.eval.data.split = ""
root.eval.data.cache = False
root.eval.data.drop = False
root.eval.data.forward_frame = 0
root.eval.data.past_frame = 0
root.eval.data.fps = 4
root.eval.data.tau_a = 1.0
root.eval.data.batch_size = 8
root.eval.data.num_workers = 8
root.eval.data.weight = False

root.eval.data.feat_file = ""
root.eval.data.feature = ""
root.eval.data.feature_fps = None
root.eval.data.feature_dim = None

########################################## MODEL: ARCH
root.model = CN()
root.model.name = None

root.model.feat_dim = 0
root.model.past_frame = 0
root.model.anticipation_frame = 4
root.model.action_frame = 4

root.model.reasoner = CN()
root.model.reasoner.name = None
root.model.reasoner.d_model = 512
root.model.reasoner.nhead = 1
root.model.reasoner.dff = 2048
root.model.reasoner.depth = 1
root.model.reasoner.dropout = 0.1
root.model.reasoner.pe_type = 'learnable'

root.model.classifier = CN()
root.model.classifier.action = False
root.model.classifier.verb = False
root.model.classifier.noun = False
root.model.classifier.hidden = []
root.model.classifier.dropout = 0.


########################################## MODEL: LOSS
root.model.loss = CN()
root.model.loss.name = 'CE'
root.model.loss.smooth = 0.
root.model.loss.sigma = None

## weight
root.model.loss.verb = 1.
root.model.loss.noun = 1.
root.model.loss.action = 1.
root.model.loss.next_cls = 0.
root.model.loss.feat_mse = 0.



def load_config(args=None):
    config = copy.deepcopy(root)

    # without CLI arguments, return a copy of the defaults
    if args is None:
        return config

    config.merge_from_file(args.cfg)

    config.local_rank = args.local_rank
    config.resume.type = args.weight_type
    config.resume.path = args.resume

    if args.name is not None:
        config.name = args.name

    # derive the experiment folder from the config path,
    # e.g. 'configs/EK100RGBTSM/train.yaml' -> 'exp/EK100RGBTSM'
    _, folder, _ = args.cfg.split('/')
    config.folder = os.path.join('exp', folder)

    return config
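For illustration, a minimal driver sketch showing how ```load_config``` consumes command-line arguments; the flag names mirror the attributes read above and the commands in the README, but the actual argument parsers of ```train.py```/```eval.py``` are not part of this excerpt:

```
import argparse
from config import load_config

parser = argparse.ArgumentParser()
parser.add_argument('--cfg', type=str, required=True)         # e.g. configs/EK100RGBTSM/train.yaml
parser.add_argument('--name', type=str, default=None)         # experiment name (config.name)
parser.add_argument('--resume', type=str, default=None)       # checkpoint path (config.resume.path)
parser.add_argument('--weight_type', type=str, default=None)  # config.resume.type
parser.add_argument('--local_rank', type=int, default=None)   # for distributed training

config = load_config(parser.parse_args())
# experiments are written under exp/<config folder>, e.g. exp/EK100RGBTSM
print(config.name, config.folder)
```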
52 changes: 52 additions & 0 deletions configs/EK100FLOWTSN/eval.yaml
@@ -0,0 +1,52 @@
main_metric : 'All_A'

eval:
  freq : 1
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'valid'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 128
    num_workers : 2
    cache : false


model:
  name : 'DCR'

  feat_dim: 1024
  past_frame : 40
  anticipation_frame : 4
  action_frame : 4

  reasoner:
    name : 'transformer'
    d_model : 1024
    nhead : 16
    dff : 4096
    depth : 6
    dropout : 0.1
    pe_type : 'learnable'

  classifier:
    dropout: 0.4
    action: true
    verb: true
    noun : true

  loss:
    name : 'CE'
    verb : 0.5
    noun : 0.5
    feat_mse : 1.
    next_cls : 1.
    smooth: 0.1
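Each YAML file overrides only a subset of the defaults in ```config.py```; yacs merges it on top of the default tree. A quick sketch of that merge:

```
from config import load_config

cfg = load_config()  # args=None returns a copy of the defaults
cfg.merge_from_file('configs/EK100FLOWTSN/eval.yaml')
print(cfg.model.reasoner.d_model)  # 1024 from the YAML, overriding the default 512
```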
71 changes: 71 additions & 0 deletions configs/EK100FLOWTSN/order.yaml
@@ -0,0 +1,71 @@
save_freq : 50

eval:
  freq : 5
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'valid'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 512
    num_workers : 2
    cache : false

train:
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'train'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 512
    num_workers : 2
    cache : false

  optimizer:
    name: 'AdamW'
    base_lr : 1e-4
    betas : (0.9, 0.999)
    weight_decay : 1e-5

  max_epoch : 50
  scheduler:
    name : 'WarmupCos'
    warmup_epoch : 5
    step : [50]

model:
  name : 'orderNet'

  feat_dim: 1024
  past_frame : 40
  anticipation_frame : 4
  action_frame : 4

  reasoner:
    name : 'transformer'
    d_model : 1024
    nhead : 16
    dff : 4096
    depth : 6
    dropout : 0.1
    pe_type : 'no'

  loss:
    sigma: 5.
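The ```WarmupCos``` scheduler configured above presumably follows the common linear-warmup-then-cosine-decay recipe; a minimal sketch under that assumption (the actual scheduler implementation is not shown in this diff):

```
import math

def warmup_cos_lr(epoch, base_lr, max_epoch, warmup_epoch, eta_min=0.0):
    # linear warmup over the first `warmup_epoch` epochs
    if epoch < warmup_epoch:
        return base_lr * (epoch + 1) / warmup_epoch
    # cosine decay from base_lr down to eta_min over the remaining epochs
    progress = (epoch - warmup_epoch) / max(1, max_epoch - warmup_epoch)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))
```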