Commit f646673 (0 parents), showing 53 changed files with 17,736 additions and 0 deletions.
**.gitignore** (new file, 1 addition)

*.pyc
**.gitmodules** (new file, 3 additions)

[submodule "pytorch-i3d"]
	path = pytorch-i3d
	url = https://github.com/piergiaj/pytorch-i3d
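Since the repository pulls in [pytorch-i3d](https://github.com/piergiaj/pytorch-i3d) as a submodule, a full checkout likely needs it initialized, e.g. ```git clone --recursive https://github.com/piergiaj/tgm-icml19.git``` (or ```git submodule update --init``` after a plain clone).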
**README.md** (new file, 107 additions)
# Temporal Gaussian Mixture Layer for Videos

This repository contains the code for our [ICML 2019 paper](https://arxiv.org/abs/1803.06316):

AJ Piergiovanni and Michael S. Ryoo
"Temporal Gaussian Mixture Layer for Videos"
in ICML 2019

If you find the code useful for your research, please cite our paper:

    @inproceedings{piergiovanni2019tgm,
        title={Temporal Gaussian Mixture Layer for Videos},
        booktitle={International Conference on Machine Learning (ICML)},
        author={AJ Piergiovanni and Michael S. Ryoo},
        year={2019}
    }

# Temporal Gaussian Mixture Layer
The core of our approach, the Temporal Gaussian Mixture (TGM) layer, can be found in [tgm.py](tgm.py).

![mg](/examples/mixture-of-gaussians.png?raw=true "mg")
Multiple (M) temporal Gaussian distributions are learned and combined with the learned soft-attention weights to form the C temporal convolution filters. L is the temporal length of each filter.
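As a rough illustration of this construction (a minimal sketch, not the repository's implementation; [tgm.py](tgm.py) additionally constrains the Gaussian centers and widths, and all names below are ours):

```python
import torch
import torch.nn.functional as F

def build_tgm_filters(centers, widths, attention, L):
    # centers, widths: (M,) parameters of the M temporal Gaussians
    # attention:       (C, M) soft-attention logits mixing the M Gaussians into C filters
    t = torch.arange(L, dtype=torch.float32)              # temporal positions 0..L-1
    gauss = torch.exp(-(t - centers.unsqueeze(1)) ** 2
                      / (2 * widths.unsqueeze(1) ** 2))   # (M, L)
    gauss = gauss / gauss.sum(dim=1, keepdim=True)        # normalize each Gaussian
    mix = F.softmax(attention, dim=1)                     # convex weights over the M Gaussians
    return mix @ gauss                                    # (C, L) temporal filters
```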
![share](/examples/tgm-shared.png?raw=true "share")
The kernels are applied to each input channel, C<sub>in</sub>, and a 1x1 convolution is applied to combine the C<sub>in</sub> input channels for each output channel, C<sub>out</sub>.
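Continuing the sketch above (again illustrative, with a clarity-first depthwise loop rather than the repository's vectorized implementation; an odd L keeps the sequence length unchanged):

```python
import torch
import torch.nn.functional as F

def apply_tgm(x, filters, channel_mix):
    # x:           (N, C_in, T) input feature sequence
    # filters:     (C_out, L) temporal filters from build_tgm_filters
    # channel_mix: (C_out, C_in) 1x1-convolution weights over input channels
    N, C_in, T = x.shape
    C_out, L = filters.shape
    outputs = []
    for c in range(C_out):
        # convolve every input channel with the same length-L filter (depthwise)
        k = filters[c].view(1, 1, L).expand(C_in, 1, L).contiguous()
        y = F.conv1d(x, k, padding=L // 2, groups=C_in)   # (N, C_in, T)
        # 1x1 convolution over channels: weighted sum of the C_in responses
        outputs.append(torch.einsum('nct,c->nt', y, channel_mix[c]))
    return torch.stack(outputs, dim=1)                    # (N, C_out, T)
```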
# Activity Detection Experiments
![model overview](/examples/model.png?raw=true "model overview")

To run our pre-trained models:

```python train_model.py -mode joint -dataset multithumos -train False -rgb_model_file models/multithumos/rgb_baseline -flow_model_file models/multithumos/flow_baseline```

We tested our models on the [MultiTHUMOS](http://ai.stanford.edu/~syyeung/everymoment.html), [Charades](http://allenai.org/plato/charades/), and [MLB-YouTube](https://github.com/piergiaj/mlb-youtube) datasets. We provide our trained models in the models directory.
## Results
### Charades

| Method | mAP (%) |
| ------------- | ------------- |
| Two-Stream + LSTM [1] | 9.6 |
| Sigurdsson et al. [1] | 12.1 |
| I3D [2] baseline | 17.22 |
| I3D + 3 temporal conv. | 17.5 |
| I3D + LSTM | 18.1 |
| I3D + Fixed temporal pyramid | 18.2 |
| I3D + Super-events [4] | 19.41 |
| I3D + 3 TGMs | 20.6 |
| I3D + Super-events [4] + 3 TGMs | **21.8** |
### MultiTHUMOS

| Method | mAP (%) |
| ------------- | ------------- |
| Two-Stream [3] | 27.6 |
| Two-Stream + LSTM [3] | 28.1 |
| Multi-LSTM [3] | 29.6 |
| I3D [2] baseline | 29.7 |
| I3D + LSTM | 29.9 |
| I3D + 3 temporal conv. | 24.4 |
| I3D + Fixed temporal pyramid | 31.2 |
| I3D + Super-events [4] | 36.4 |
| I3D + 3 TGMs | 44.3 |
| I3D + Super-events [4] + 3 TGMs | **46.4** |
### MLB-YouTube

| Method | mAP (%) |
| ------------- | ------------- |
| I3D [2] baseline | 34.2 |
| I3D + LSTM | 39.4 |
| I3D + Super-events [4] | 39.1 |
| I3D + 3 TGMs | 40.1 |
| I3D + Super-events [4] + 3 TGMs | **47.1** |
# Example Results
![ex](/examples/res.png?raw=true "mg")
The temporal regions classified as various basketball activities in a basketball game video from MultiTHUMOS. Our TGM layers greatly improve performance.

![gif](/examples/charades_example.gif?raw=true "example")

# Requirements

Our code has been tested on Ubuntu 14.04 and 16.04 using Python 2.7 and [PyTorch](https://pytorch.org) version 0.3.1 with a Titan X GPU.
# Setup

1. Download the code: ```git clone https://github.com/piergiaj/tgm-icml19.git```

2. Extract features from your dataset. See [Pytorch-I3D](https://github.com/piergiaj/pytorch-i3d) for our code to extract I3D features. A sketch of loading such features follows this list.

3. [train_model.py](train_model.py) contains the code to train and evaluate models.
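For step 2, the exact on-disk layout depends on how you run the extraction; as a rough sketch, assuming each video's features were saved with `np.save` as a `(T, D)` array (the path and function names below are hypothetical):

```python
import numpy as np
import torch

def load_features(path):
    # hypothetical loader: assumes a (T, D) array saved with np.save,
    # e.g. D = 1024 per I3D stream
    feats = torch.from_numpy(np.load(path)).float()
    return feats.t().unsqueeze(0)  # (1, D, T), channels-first for temporal layers

rgb_feats = load_features('features/video0001_rgb.npy')  # hypothetical path
```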
# References
[1] G. A. Sigurdsson, S. Divvala, A. Farhadi, and A. Gupta. Asynchronous temporal fields for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[2] J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[3] S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos. International Journal of Computer Vision (IJCV), pages 1–15, 2015.

[4] A. Piergiovanni and M. S. Ryoo. Learning latent super-events to detect multiple activities in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [arXiv](https://arxiv.org/abs/1712.01938) [code](https://github.com/piergiaj/super-events-cvpr18)
**apmeter.py** (new file, 136 additions)
import math
import meter  # torchnet-style Meter base class
import numpy as np
import torch


class APMeter(meter.Meter):
    """
    The APMeter measures the average precision per class.

    The APMeter is designed to operate on `NxK` Tensors `output` and
    `target`, and optionally an `Nx1` Tensor `weight`, where (1) `output`
    contains model output scores for `N` examples and `K` classes that ought
    to be higher when the model is more convinced that the example should be
    positively labeled, and smaller when the model believes the example should
    be negatively labeled (for instance, the output of a sigmoid function);
    (2) `target` contains only values 0 (for negative examples) and 1
    (for positive examples); and (3) `weight` (each element > 0) represents
    the weight for each sample.
    """
    def __init__(self):
        super(APMeter, self).__init__()
        self.reset()

    def reset(self):
        """Resets the meter with empty member variables"""
        self.scores = torch.FloatTensor(torch.FloatStorage())
        self.targets = torch.LongTensor(torch.LongStorage())
        self.weights = torch.FloatTensor(torch.FloatStorage())
    def add(self, output, target, weight=None):
        """
        Args:
            output (Tensor): NxK tensor that for each of the N examples
                indicates the model's score for the example belonging to each
                of the K classes
            target (Tensor): binary NxK tensor that encodes which of the K
                classes are associated with each of the N inputs
                (e.g., a row [0, 1, 0, 1] indicates that the example is
                associated with classes 2 and 4)
            weight (optional, Tensor): Nx1 tensor representing the weight for
                each example (each weight > 0)
        """
        if not torch.is_tensor(output):
            output = torch.from_numpy(output)
        if not torch.is_tensor(target):
            target = torch.from_numpy(target)

        if weight is not None:
            if not torch.is_tensor(weight):
                weight = torch.from_numpy(weight)
            weight = weight.squeeze()
        if output.dim() == 1:
            output = output.view(-1, 1)
        else:
            assert output.dim() == 2, \
                'wrong output size (should be 1D or 2D with one column per class)'
        if target.dim() == 1:
            target = target.view(-1, 1)
        else:
            assert target.dim() == 2, \
                'wrong target size (should be 1D or 2D with one column per class)'
        if weight is not None:
            assert weight.dim() == 1, 'Weight dimension should be 1'
            assert weight.numel() == target.size(0), \
                'Weight dimension 1 should be the same as that of target'
            assert torch.min(weight) >= 0, 'Weight should be non-negative only'
        assert torch.equal(target**2, target), \
            'targets should be binary (0 or 1)'
        if self.scores.numel() > 0:
            assert target.size(1) == self.targets.size(1), \
                'dimensions for output should match previously added examples.'
        # make sure storage is of sufficient size (grow by ~1.5x when full)
        if self.scores.storage().size() < self.scores.numel() + output.numel():
            new_size = math.ceil(self.scores.storage().size() * 1.5)
            new_weight_size = math.ceil(self.weights.storage().size() * 1.5)
            self.scores.storage().resize_(int(new_size + output.numel()))
            self.targets.storage().resize_(int(new_size + output.numel()))
            if weight is not None:
                self.weights.storage().resize_(int(new_weight_size + output.size(0)))

        # store scores and targets
        offset = self.scores.size(0) if self.scores.dim() > 0 else 0
        self.scores.resize_(offset + output.size(0), output.size(1))
        self.targets.resize_(offset + target.size(0), target.size(1))
        self.scores.narrow(0, offset, output.size(0)).copy_(output)
        self.targets.narrow(0, offset, target.size(0)).copy_(target)

        if weight is not None:
            self.weights.resize_(offset + weight.size(0))
            self.weights.narrow(0, offset, weight.size(0)).copy_(weight)
    def value(self):
        """Returns the model's average precision for each class

        Return:
            ap (FloatTensor): 1D tensor of size K, with the average precision
                for each class k
        """
        if self.scores.numel() == 0:
            return 0
        ap = torch.zeros(self.scores.size(1))
        # ranks 1..N; torch.range is inclusive of its end point
        # (deprecated in newer PyTorch in favor of torch.arange(1, N + 1))
        rg = torch.range(1, self.scores.size(0)).float()
        if self.weights.numel() > 0:
            weight = self.weights.new(self.weights.size())
            weighted_truth = self.weights.new(self.weights.size())

        # compute average precision for each class
        for k in range(self.scores.size(1)):
            # sort scores in descending order
            scores = self.scores[:, k]
            targets = self.targets[:, k]
            _, sortind = torch.sort(scores, 0, True)
            truth = targets[sortind]
            if self.weights.numel() > 0:
                weight = self.weights[sortind]
                weighted_truth = truth.float() * weight
                rg = weight.cumsum(0)

            # compute true positive sums
            if self.weights.numel() > 0:
                tp = weighted_truth.cumsum(0)
            else:
                tp = truth.float().cumsum(0)

            # compute precision curve
            precision = tp.div(rg)

            # compute average precision
            ap[k] = precision[truth.byte()].sum() / max(truth.sum(), 1)
        return ap
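For reference, a minimal usage sketch of this meter (random placeholder inputs, not from the repository):

```python
import numpy as np

meter = APMeter()
scores = np.random.rand(8, 5).astype(np.float32)           # N=8 examples, K=5 classes
targets = (np.random.rand(8, 5) > 0.5).astype(np.int64)    # binary multi-label targets
meter.add(scores, targets)
print(meter.value())  # average precision per class, a size-K FloatTensor
```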