Commit: init

AllenXuuu committed Apr 11, 2022
1 parent cc3ad1a commit 93a7ad0
Showing 164 changed files with 615,473 additions and 3 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -7,10 +7,9 @@ __pycache__
.pth

# Data directories
data/feature
exp
weight
checkpoint

# python notebook related
*.ipynb_checkpoints
79 changes: 78 additions & 1 deletion README.md
@@ -7,12 +7,71 @@ This repo contains the official implementation of paper
> [Xinyu Xu](https://xuxinyu.website), [Yong-Lu Li](https://dirtyharrylyl.github.io/), [Cewu Lu](https://mvig.sjtu.edu.cn).
>
> In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022.
> [[arxiv](https://arxiv.org/abs/2204.02587)] [[code](https://github.com/AllenXuuu/DCR)] [[model](https://drive.google.com/drive/folders/1bXFs1_9HBPi74LpsYfxx753Vkc6BbEHa?usp=sharing)]
****

## Code and models are coming soon!
## Data Preparation

We reorganize the annotation files of the four datasets [1-4] in the ```data``` folder.
You need to download the pre-extracted features into the ```data/feature``` folder.
TSN features can be downloaded from [Link](https://github.com/fpv-iplab/rulstm) [5].
irCSN-152 features can be downloaded from [Link](https://github.com/facebookresearch/AVT) [6].
We also provide a stronger TSM backbone; its features can be downloaded from [Link](https://drive.google.com/drive/folders/1spwT8r7Fcm1fJJFju_L7NdyyHKckODNo?usp=sharing).
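With the features in place, the directory is expected to look roughly as follows (shown here for the TSN flow features of EPIC-KITCHENS-100; other backbones follow the same pattern, and the sub-folder name matches the paths referenced by the configs, e.g. ```./data/feature/EK100_FLOW_TSN/```):

```
data/
├── ...                     # reorganized annotation files for the four datasets
└── feature/
    └── EK100_FLOW_TSN/     # pre-extracted TSN flow features
```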


## Packages

We conduct experiments in the following environment:
```
python == 3.9
torch == 1.9
torchvision == 0.10.0
apex
tensorboardX
yacs
pyyaml
numpy
prefetch_generator
```
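One possible way to create this environment, assuming conda and pip (the environment name ```dcr``` is arbitrary; ```apex``` refers to NVIDIA apex, which is typically built from source):

```
conda create -n dcr python=3.9
conda activate dcr
pip install torch==1.9.0 torchvision==0.10.0
pip install tensorboardX yacs pyyaml numpy prefetch_generator
# NVIDIA apex (used for AMP mixed precision, cf. AMP_OPT_LEVEL in config.py):
# follow the build instructions at https://github.com/NVIDIA/apex
```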


## Evaluation

We release pre-trained models [here](https://drive.google.com/drive/folders/1bXFs1_9HBPi74LpsYfxx753Vkc6BbEHa?usp=sharing).
To test the performance of our model with the RGB-TSM backbone on EPIC-KITCHENS-100 [1], you can run the following command.

```
python eval.py --cfg configs/EK100RGBTSM/eval.yaml --resume ./weights/EK100RGBTSM.pt
```

Here ```./weights/EK100RGBTSM.pt``` is the path to the pre-trained model you downloaded.

Results can be found in [Model Zoo](./docs/model_zoo.md).

## Training

Taking the same setting as an example, you can reproduce the training process by running

```
python train_order.py --cfg configs/EK100RGBTSM/order.yaml --name order
python train.py --cfg configs/EK100RGBTSM/train.yaml --name train --resume exp/EK100RGBTSM/order/epoch_50.pt
```

The first command runs our frame-order pre-training stage; the resulting model is stored at ```exp/EK100RGBTSM/order/epoch_50.pt```. The second command reloads this pre-trained model and runs the anticipation training stage.

Running the anticipation training from scratch, without the pre-training stage, is also possible:

```
python train.py --cfg configs/EK100RGBTSM/train.yaml --name train
```


## Citation

If you find our paper or code helpful, please cite our paper.

```
@inproceedings{xu2022learning,
  title={Learning to Anticipate Future with Dynamic Context Removal},
  author={Xu, Xinyu and Li, Yong-Lu and Lu, Cewu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
```


## Reference

**[1]** Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. International Journal of Computer Vision (IJCV), 2021.

**[2]** Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Scaling egocentric vision: The EPIC-KITCHENS dataset. In Proceedings of the European Conference on Computer Vision (ECCV), pages 720–736, 2018.

**[3]** Yin Li, Miao Liu, and James M. Rehg. In the eye of beholder: Joint learning of gaze and actions in first person video. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.

**[4]** Sebastian Stein and Stephen J. McKenna. Combining embedded accelerometers with computer vision for recognizing food preparation activities. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 729–738, 2013.

**[5]** Antonino Furnari and Giovanni Maria Farinella. Rolling-unrolling LSTMs for action anticipation from first-person video. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.

**[6]** Rohit Girdhar and Kristen Grauman. Anticipative video transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

154 changes: 154 additions & 0 deletions config.py
@@ -0,0 +1,154 @@
import copy
import os
from yacs.config import CfgNode as CN
import argparse
import time

root = CN()

# root.name = '_'.join(time.asctime(time.localtime(time.time())).split())
root.name = None
root.local_rank = None
root.seed = 0
root.AMP_OPT_LEVEL = 'O1'
root.save_freq = None
root.main_metric = None


root.resume = CN()
root.resume.path = None
root.resume.pe_from_cls = True
root.resume.type = None

########################################## Curriculum
root.curriculum = CN()
root.curriculum.gamma_min = 0.9
root.curriculum.gamma_max = 0.98


########################################## TRAIN: DATA
root.train = CN()
root.train.report_iter = None
root.train.data = CN()

root.train.data.name = ""
root.train.data.split = ""
root.train.data.cache = False
root.train.data.drop = False
root.train.data.forward_frame = 0
root.train.data.past_frame = 0
root.train.data.fps = 4
root.train.data.tau_a = 1.0
root.train.data.batch_size = 8
root.train.data.num_workers = 8
root.train.data.weight = True


root.train.data.feat_file = ""
root.train.data.feature = ""
root.train.data.feature_fps = None
root.train.data.feature_dim = None

########################################## TRAIN: LEARNING
root.train.max_epoch = 100
root.train.clip_grad = 5.0

root.train.optimizer = CN()
root.train.optimizer.name = 'AdamW'
root.train.optimizer.base_lr = 1e-4
root.train.optimizer.momentum = 0.9
root.train.optimizer.betas = (0.9, 0.999)
root.train.optimizer.weight_decay = 1e-2
root.train.optimizer.eps = 1e-8

root.train.scheduler = CN()
root.train.scheduler.name = 'no'
root.train.scheduler.step = []
root.train.scheduler.gamma = 0.1
root.train.scheduler.eta_min = 0.
root.train.scheduler.warmup_epoch = 0

########################################## EVAL: DATA
root.eval = CN()
root.eval.freq = 100
root.eval.report_iter = None

root.eval.data = CN()
root.eval.data.name = ""
root.eval.data.split = ""
root.eval.data.cache = False
root.eval.data.drop = False
root.eval.data.forward_frame = 0
root.eval.data.past_frame = 0
root.eval.data.fps = 4
root.eval.data.tau_a = 1.0
root.eval.data.batch_size = 8
root.eval.data.num_workers = 8
root.eval.data.weight = False

root.eval.data.feat_file = ""
root.eval.data.feature = ""
root.eval.data.feature_fps = None
root.eval.data.feature_dim = None

########################################## MODEL: ARCH
root.model = CN()
root.model.name = None

root.model.feat_dim = 0
root.model.past_frame = 0
root.model.anticipation_frame = 4
root.model.action_frame = 4

root.model.reasoner = CN()
root.model.reasoner.name = None
root.model.reasoner.d_model = 512
root.model.reasoner.nhead = 1
root.model.reasoner.dff = 2048
root.model.reasoner.depth = 1
root.model.reasoner.dropout = 0.1
root.model.reasoner.pe_type = 'learnable'

root.model.classifier = CN()
root.model.classifier.action = False
root.model.classifier.verb = False
root.model.classifier.noun = False
root.model.classifier.hidden = []
root.model.classifier.dropout = 0.


########################################## MODEL: LOSS
root.model.loss = CN()
root.model.loss.name = 'CE'
root.model.loss.smooth = 0.
root.model.loss.sigma = None

## weight
root.model.loss.verb = 1.
root.model.loss.noun = 1.
root.model.loss.action = 1.
root.model.loss.next_cls = 0.
root.model.loss.feat_mse = 0.



def load_config(args=None):
    config = copy.deepcopy(root)

    # without CLI arguments, return a copy of the defaults
    if args is None:
        return config

    config.merge_from_file(args.cfg)

    config.local_rank = args.local_rank
    config.resume.type = args.weight_type
    config.resume.path = args.resume

    if args.name is not None:
        config.name = args.name

    # derive the experiment folder from the config path,
    # e.g. 'configs/EK100RGBTSM/train.yaml' -> 'exp/EK100RGBTSM'
    _, folder, _ = args.cfg.split('/')
    config.folder = os.path.join('exp', folder)

    return config
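For illustration, a minimal driver sketch showing how ```load_config``` consumes command-line arguments; the flag names mirror the attributes read above and the commands in the README, but the actual argument parsers of ```train.py```/```eval.py``` are not part of this excerpt:

```
import argparse
from config import load_config

parser = argparse.ArgumentParser()
parser.add_argument('--cfg', type=str, required=True)         # e.g. configs/EK100RGBTSM/train.yaml
parser.add_argument('--name', type=str, default=None)         # experiment name (config.name)
parser.add_argument('--resume', type=str, default=None)       # checkpoint path (config.resume.path)
parser.add_argument('--weight_type', type=str, default=None)  # config.resume.type
parser.add_argument('--local_rank', type=int, default=None)   # for distributed training

config = load_config(parser.parse_args())
# experiments are written under exp/<config folder>, e.g. exp/EK100RGBTSM
print(config.name, config.folder)
```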
52 changes: 52 additions & 0 deletions configs/EK100FLOWTSN/eval.yaml
@@ -0,0 +1,52 @@
main_metric : 'All_A'

eval:
  freq : 1
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'valid'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 128
    num_workers : 2
    cache : false


model:
  name : 'DCR'

  feat_dim: 1024
  past_frame : 40
  anticipation_frame : 4
  action_frame : 4

  reasoner:
    name : 'transformer'
    d_model : 1024
    nhead : 16
    dff : 4096
    depth : 6
    dropout : 0.1
    pe_type : 'learnable'

  classifier:
    dropout: 0.4
    action: true
    verb: true
    noun : true

  loss:
    name : 'CE'
    verb : 0.5
    noun : 0.5
    feat_mse : 1.
    next_cls : 1.
    smooth: 0.1
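Each YAML file overrides only a subset of the defaults in ```config.py```; yacs merges it on top of the default tree. A quick sketch of that merge:

```
from config import load_config

cfg = load_config()  # args=None returns a copy of the defaults
cfg.merge_from_file('configs/EK100FLOWTSN/eval.yaml')
print(cfg.model.reasoner.d_model)  # 1024 from the YAML, overriding the default 512
```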
71 changes: 71 additions & 0 deletions configs/EK100FLOWTSN/order.yaml
@@ -0,0 +1,71 @@
save_freq : 50

eval:
  freq : 5
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'valid'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 512
    num_workers : 2
    cache : false

train:
  data :
    name : 'EPIC-KITCHENS-100'
    split : 'train'

    feat_file : './data/feature/EK100_FLOW_TSN/'
    feature : 'TSN'
    feature_fps : 30
    feature_dim : 1024

    forward_frame : 8
    past_frame : 40

    fps : 4
    batch_size : 512
    num_workers : 2
    cache : false

  optimizer:
    name: 'AdamW'
    base_lr : 1e-4
    betas : (0.9, 0.999)
    weight_decay : 1e-5

  max_epoch : 50
  scheduler:
    name : 'WarmupCos'
    warmup_epoch : 5
    step : [50]

model:
  name : 'orderNet'

  feat_dim: 1024
  past_frame : 40
  anticipation_frame : 4
  action_frame : 4

  reasoner:
    name : 'transformer'
    d_model : 1024
    nhead : 16
    dff : 4096
    depth : 6
    dropout : 0.1
    pe_type : 'no'

  loss:
    sigma: 5.
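The ```WarmupCos``` scheduler configured above presumably follows the common linear-warmup-then-cosine-decay recipe; a minimal sketch under that assumption (the actual scheduler implementation is not shown in this diff):

```
import math

def warmup_cos_lr(epoch, base_lr, max_epoch, warmup_epoch, eta_min=0.0):
    # linear warmup over the first `warmup_epoch` epochs
    if epoch < warmup_epoch:
        return base_lr * (epoch + 1) / warmup_epoch
    # cosine decay from base_lr down to eta_min over the remaining epochs
    progress = (epoch - warmup_epoch) / max(1, max_epoch - warmup_epoch)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))
```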