Commit f646673 (0 parents), showing 53 changed files with 17,736 additions and 0 deletions.
**.gitignore** (new file, 1 addition)

*.pyc
**.gitmodules** (new file, 3 additions)

[submodule "pytorch-i3d"]
	path = pytorch-i3d
	url = https://github.com/piergiaj/pytorch-i3d
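Since the repository pulls in [pytorch-i3d](https://github.com/piergiaj/pytorch-i3d) as a submodule, a full checkout likely needs it initialized, e.g. ```git clone --recursive https://github.com/piergiaj/tgm-icml19.git``` (or ```git submodule update --init``` after a plain clone).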
**README.md** (new file, 107 additions)
# Temporal Gaussian Mixture Layer for Videos

This repository contains the code for our [ICML 2019 paper](https://arxiv.org/abs/1803.06316):

AJ Piergiovanni and Michael S. Ryoo
"Temporal Gaussian Mixture Layer for Videos"
in ICML 2019

If you find the code useful for your research, please cite our paper:

    @inproceedings{piergiovanni2019tgm,
        title={Temporal Gaussian Mixture Layer for Videos},
        booktitle={International Conference on Machine Learning (ICML)},
        author={AJ Piergiovanni and Michael S. Ryoo},
        year={2019}
    }

# Temporal Gaussian Mixture Layer
The core of our approach, the Temporal Gaussian Mixture (TGM) layer, can be found in [tgm.py](tgm.py).

![mg](/examples/mixture-of-gaussians.png?raw=true "mg")
Multiple (M) temporal Gaussian distributions are learned and combined with the learned soft-attention weights to form the C temporal convolution filters. L is the temporal length of each filter.
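As a rough illustration of this construction (a minimal sketch, not the repository's implementation; [tgm.py](tgm.py) additionally constrains the Gaussian centers and widths, and all names below are ours):

```python
import torch
import torch.nn.functional as F

def build_tgm_filters(centers, widths, attention, L):
    # centers, widths: (M,) parameters of the M temporal Gaussians
    # attention:       (C, M) soft-attention logits mixing the M Gaussians into C filters
    t = torch.arange(L, dtype=torch.float32)              # temporal positions 0..L-1
    gauss = torch.exp(-(t - centers.unsqueeze(1)) ** 2
                      / (2 * widths.unsqueeze(1) ** 2))   # (M, L)
    gauss = gauss / gauss.sum(dim=1, keepdim=True)        # normalize each Gaussian
    mix = F.softmax(attention, dim=1)                     # convex weights over the M Gaussians
    return mix @ gauss                                    # (C, L) temporal filters
```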
![share](/examples/tgm-shared.png?raw=true "share")
The kernels are applied to each input channel, C<sub>in</sub>, and a 1x1 convolution is applied to combine the C<sub>in</sub> input channels for each output channel, C<sub>out</sub>.
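Continuing the sketch above (again illustrative, with a clarity-first depthwise loop rather than the repository's vectorized implementation; an odd L keeps the sequence length unchanged):

```python
import torch
import torch.nn.functional as F

def apply_tgm(x, filters, channel_mix):
    # x:           (N, C_in, T) input feature sequence
    # filters:     (C_out, L) temporal filters from build_tgm_filters
    # channel_mix: (C_out, C_in) 1x1-convolution weights over input channels
    N, C_in, T = x.shape
    C_out, L = filters.shape
    outputs = []
    for c in range(C_out):
        # convolve every input channel with the same length-L filter (depthwise)
        k = filters[c].view(1, 1, L).expand(C_in, 1, L).contiguous()
        y = F.conv1d(x, k, padding=L // 2, groups=C_in)   # (N, C_in, T)
        # 1x1 convolution over channels: weighted sum of the C_in responses
        outputs.append(torch.einsum('nct,c->nt', y, channel_mix[c]))
    return torch.stack(outputs, dim=1)                    # (N, C_out, T)
```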
# Activity Detection Experiments
![model overview](/examples/model.png?raw=true "model overview")

To run our pre-trained models:

```python train_model.py -mode joint -dataset multithumos -train False -rgb_model_file models/multithumos/rgb_baseline -flow_model_file models/multithumos/flow_baseline```

We tested our models on the [MultiTHUMOS](http://ai.stanford.edu/~syyeung/everymoment.html), [Charades](http://allenai.org/plato/charades/), and [MLB-YouTube](https://github.com/piergiaj/mlb-youtube) datasets. We provide our trained models in the models directory.
## Results
### Charades

| Method | mAP (%) |
| ------------- | ------------- |
| Two-Stream + LSTM [1] | 9.6 |
| Sigurdsson et al. [1] | 12.1 |
| I3D [2] baseline | 17.22 |
| I3D + 3 temporal conv. | 17.5 |
| I3D + LSTM | 18.1 |
| I3D + Fixed temporal pyramid | 18.2 |
| I3D + Super-events [4] | 19.41 |
| I3D + 3 TGMs | 20.6 |
| I3D + Super-events [4] + 3 TGMs | **21.8** |
### MultiTHUMOS

| Method | mAP (%) |
| ------------- | ------------- |
| Two-Stream [3] | 27.6 |
| Two-Stream + LSTM [3] | 28.1 |
| Multi-LSTM [3] | 29.6 |
| I3D [2] baseline | 29.7 |
| I3D + LSTM | 29.9 |
| I3D + 3 temporal conv. | 24.4 |
| I3D + Fixed temporal pyramid | 31.2 |
| I3D + Super-events [4] | 36.4 |
| I3D + 3 TGMs | 44.3 |
| I3D + Super-events [4] + 3 TGMs | **46.4** |
### MLB-YouTube

| Method | mAP (%) |
| ------------- | ------------- |
| I3D [2] baseline | 34.2 |
| I3D + LSTM | 39.4 |
| I3D + Super-events [4] | 39.1 |
| I3D + 3 TGMs | 40.1 |
| I3D + Super-events [4] + 3 TGMs | **47.1** |
# Example Results
![ex](/examples/res.png?raw=true "mg")
The temporal regions classified as various basketball activities in a basketball game video from MultiTHUMOS. Our TGM layers greatly improve performance.

![gif](/examples/charades_example.gif?raw=true "example")

# Requirements

Our code has been tested on Ubuntu 14.04 and 16.04 using Python 2.7 and [PyTorch](https://pytorch.org) version 0.3.1 with a Titan X GPU.
# Setup

1. Download the code: ```git clone https://github.com/piergiaj/tgm-icml19.git```

2. Extract features from your dataset. See [Pytorch-I3D](https://github.com/piergiaj/pytorch-i3d) for our code to extract I3D features. A sketch of loading such features follows this list.

3. [train_model.py](train_model.py) contains the code to train and evaluate models.
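For step 2, the exact on-disk layout depends on how you run the extraction; as a rough sketch, assuming each video's features were saved with `np.save` as a `(T, D)` array (the path and function names below are hypothetical):

```python
import numpy as np
import torch

def load_features(path):
    # hypothetical loader: assumes a (T, D) array saved with np.save,
    # e.g. D = 1024 per I3D stream
    feats = torch.from_numpy(np.load(path)).float()
    return feats.t().unsqueeze(0)  # (1, D, T), channels-first for temporal layers

rgb_feats = load_features('features/video0001_rgb.npy')  # hypothetical path
```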
# References
[1] G. A. Sigurdsson, S. Divvala, A. Farhadi, and A. Gupta. Asynchronous temporal fields for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[2] J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[3] S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos. International Journal of Computer Vision (IJCV), pages 1–15, 2015.

[4] A. Piergiovanni and M. S. Ryoo. Learning latent super-events to detect multiple activities in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [arXiv](https://arxiv.org/abs/1712.01938) [code](https://github.com/piergiaj/super-events-cvpr18)
**apmeter.py** (new file, 136 additions)
import math
import meter  # torchnet-style Meter base class
import numpy as np
import torch


class APMeter(meter.Meter):
    """
    The APMeter measures the average precision per class.

    The APMeter is designed to operate on `NxK` Tensors `output` and
    `target`, and optionally an `Nx1` Tensor `weight`, where (1) `output`
    contains model output scores for `N` examples and `K` classes that ought
    to be higher when the model is more convinced that the example should be
    positively labeled, and smaller when the model believes the example should
    be negatively labeled (for instance, the output of a sigmoid function);
    (2) `target` contains only values 0 (for negative examples) and 1
    (for positive examples); and (3) `weight` (each element > 0) represents
    the weight for each sample.
    """
    def __init__(self):
        super(APMeter, self).__init__()
        self.reset()

    def reset(self):
        """Resets the meter with empty member variables"""
        self.scores = torch.FloatTensor(torch.FloatStorage())
        self.targets = torch.LongTensor(torch.LongStorage())
        self.weights = torch.FloatTensor(torch.FloatStorage())
    def add(self, output, target, weight=None):
        """
        Args:
            output (Tensor): NxK tensor that for each of the N examples
                indicates the model's score for the example belonging to each
                of the K classes
            target (Tensor): binary NxK tensor that encodes which of the K
                classes are associated with each of the N inputs
                (e.g., a row [0, 1, 0, 1] indicates that the example is
                associated with classes 2 and 4)
            weight (optional, Tensor): Nx1 tensor representing the weight for
                each example (each weight > 0)
        """
        if not torch.is_tensor(output):
            output = torch.from_numpy(output)
        if not torch.is_tensor(target):
            target = torch.from_numpy(target)

        if weight is not None:
            if not torch.is_tensor(weight):
                weight = torch.from_numpy(weight)
            weight = weight.squeeze()
        if output.dim() == 1:
            output = output.view(-1, 1)
        else:
            assert output.dim() == 2, \
                'wrong output size (should be 1D or 2D with one column per class)'
        if target.dim() == 1:
            target = target.view(-1, 1)
        else:
            assert target.dim() == 2, \
                'wrong target size (should be 1D or 2D with one column per class)'
        if weight is not None:
            assert weight.dim() == 1, 'Weight dimension should be 1'
            assert weight.numel() == target.size(0), \
                'Weight dimension 1 should be the same as that of target'
            assert torch.min(weight) >= 0, 'Weight should be non-negative only'
        assert torch.equal(target**2, target), \
            'targets should be binary (0 or 1)'
        if self.scores.numel() > 0:
            assert target.size(1) == self.targets.size(1), \
                'dimensions for output should match previously added examples.'
        # make sure storage is of sufficient size (grow by ~1.5x when full)
        if self.scores.storage().size() < self.scores.numel() + output.numel():
            new_size = math.ceil(self.scores.storage().size() * 1.5)
            new_weight_size = math.ceil(self.weights.storage().size() * 1.5)
            self.scores.storage().resize_(int(new_size + output.numel()))
            self.targets.storage().resize_(int(new_size + output.numel()))
            if weight is not None:
                self.weights.storage().resize_(int(new_weight_size + output.size(0)))

        # store scores and targets
        offset = self.scores.size(0) if self.scores.dim() > 0 else 0
        self.scores.resize_(offset + output.size(0), output.size(1))
        self.targets.resize_(offset + target.size(0), target.size(1))
        self.scores.narrow(0, offset, output.size(0)).copy_(output)
        self.targets.narrow(0, offset, target.size(0)).copy_(target)

        if weight is not None:
            self.weights.resize_(offset + weight.size(0))
            self.weights.narrow(0, offset, weight.size(0)).copy_(weight)
    def value(self):
        """Returns the model's average precision for each class

        Return:
            ap (FloatTensor): 1D tensor of size K, with the average precision
                for each class k
        """
        if self.scores.numel() == 0:
            return 0
        ap = torch.zeros(self.scores.size(1))
        # ranks 1..N; torch.range is inclusive of its end point
        # (deprecated in newer PyTorch in favor of torch.arange(1, N + 1))
        rg = torch.range(1, self.scores.size(0)).float()
        if self.weights.numel() > 0:
            weight = self.weights.new(self.weights.size())
            weighted_truth = self.weights.new(self.weights.size())

        # compute average precision for each class
        for k in range(self.scores.size(1)):
            # sort scores in descending order
            scores = self.scores[:, k]
            targets = self.targets[:, k]
            _, sortind = torch.sort(scores, 0, True)
            truth = targets[sortind]
            if self.weights.numel() > 0:
                weight = self.weights[sortind]
                weighted_truth = truth.float() * weight
                rg = weight.cumsum(0)

            # compute true positive sums
            if self.weights.numel() > 0:
                tp = weighted_truth.cumsum(0)
            else:
                tp = truth.float().cumsum(0)

            # compute precision curve
            precision = tp.div(rg)

            # compute average precision
            ap[k] = precision[truth.byte()].sum() / max(truth.sum(), 1)
        return ap
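For reference, a minimal usage sketch of this meter (random placeholder inputs, not from the repository):

```python
import numpy as np

meter = APMeter()
scores = np.random.rand(8, 5).astype(np.float32)           # N=8 examples, K=5 classes
targets = (np.random.rand(8, 5) > 0.5).astype(np.int64)    # binary multi-label targets
meter.add(scores, targets)
print(meter.value())  # average precision per class, a size-K FloatTensor
```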