Commit b770887 (1 parent: cff2168): 70 changed files with 12,090 additions and 1 deletion.
@@ -0,0 +1,115 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
tests/.asv

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/d3m.rst
docs/d3m.*.rst

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mypy
.mypy_cache/

# site
public/

.idea/
tmp/

*.swp
*.swo
.DS_Store
pipeline.json
log.txt
fitted_pipeline
@@ -0,0 +1,32 @@
# Contributing Guide
Contributions to this project are greatly appreciated! If you find any bugs or have any feedback, please create an issue or send a pull request to fix the bug. If you want to contribute code for new features, please follow this guide. We currently have several plans; please create an issue or contact us through email if you have other suggestions.

## Roadmaps
* **Skeleton-based action recognition.** We plan to include more action recognition algorithms, such as skeleton-based methods.
* **Object detection.** We plan to include object detection primitives.

## How to Create a Pull Request

If this is your first time contributing to a project, kindly follow the instructions below. You may find [Creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) helpful. In short, you need to take the following steps to send a pull request:

* Click **Fork** in the upper-right corner of the project main page to create a fork under your GitHub account.
* Clone the forked repo from your GitHub account to your computer.
* Make your changes locally.
* Commit and push your local changes to your forked repo.
* Send a pull request to merge your branch into the AutoVideo project.

## How to Create a New Primitive

AutoVideo uses a generic pipeline language. Each module is wrapped as a primitive. Since most of the tasks are supervised, we provide a supervised base primitive in [autovideo/base/supervised_base.py](autovideo/base/supervised_base.py). Of course, we can also include unsupervised primitives, which are supported in D3M. Please don't hesitate to contact us if you need a guide. To create a supervised primitive, you generally need to follow the steps below (a minimal sketch follows the list):

* Create a new file and inherit the classes in [autovideo/base/supervised_base.py](autovideo/base/supervised_base.py).
* Define hyperparameters by subclassing `SupervisedHyperparamsBase`.
* Put the important variables that you want saved and reused into a subclass of `SupervisedParamsBase`. By default, we assume you will save the model class.
* Implement the functions in `SupervisedPrimitiveBase`, such as `fit`, `produce`, etc.
* Register the primitive in [entry_points.ini](entry_points.ini).
* Reinstall AutoVideo with `pip3 install -e .`.
* Run `python3 -m d3m index search` and make sure that your primitive is in the list.
* Then you can use your primitive! You can play with the [pipeline](autovideo/utils/d3m_utils.py#L61) and add your primitive to the pipeline.
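To make the steps concrete, here is a minimal, hypothetical skeleton of a new supervised primitive. The class and hyperparameter names (`MyParams`, `MyHyperparams`, `MyPrimitive`, `learning_rate`) are purely illustrative, and the D3M `metadata` block that every registered primitive needs is omitted; see [TSN](autovideo/recognition/tsn_primitive.py) for the real thing.

```
from d3m import container
from d3m.metadata import hyperparams
from d3m.primitive_interfaces.base import CallResult

from autovideo.base.supervised_base import (
    SupervisedParamsBase, SupervisedHyperparamsBase, SupervisedPrimitiveBase,
)


class MyParams(SupervisedParamsBase):
    # Add any extra state you want saved and restored via get_params/set_params.
    pass


class MyHyperparams(SupervisedHyperparamsBase):
    # Illustrative tunable hyperparameter; pick names and ranges for your model.
    learning_rate = hyperparams.Uniform(
        lower=0.0001, upper=0.01, default=0.001,
        semantic_types=['https://metadata.datadrivendiscovery.org/types/TuningParameter'],
        description="Learning rate used during training",
    )


class MyPrimitive(SupervisedPrimitiveBase):
    # A real primitive also parameterizes the base class with its input/output/
    # params/hyperparams types and defines the D3M `metadata` block, omitted here.

    def _init_model(self, pretrained):
        # Build self.model and optionally load pre-trained weights.
        self.model = ...

    def _fit(self, *, timeout=None, iterations=None):
        # Train self.model on self._inputs / self._outputs (videos under self._media_dir).
        ...

    def produce(self, *, inputs: container.DataFrame, timeout=None, iterations=None):
        # Run inference and return the predictions wrapped in a CallResult.
        return CallResult(container.DataFrame())
```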
Note that there are many more details that are not mentioned in the steps above. You may refer to the example of [TSN](autovideo/recognition/tsn_primitive.py) to learn more.
@@ -0,0 +1,7 @@
Copyright (c) 2021 DATA Lab at Texas A&M University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -1 +1,139 @@
# AutoVideo: An Automated Video Action Recognition System
<img width="500" src="docs/autovideo_logo.png" alt="Logo" />

AutoVideo is a system for automated video analysis. It is developed on top of the [D3M](https://gitlab.com/datadrivendiscovery/d3m) infrastructure, which describes machine learning with a generic pipeline language. Currently, it focuses on video action recognition and supports various state-of-the-art action recognition algorithms, along with automated model selection and hyperparameter tuning. AutoVideo is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University.

There are other video analysis libraries out there, but this one is designed to be highly modular. AutoVideo is highly extensible thanks to the pipeline language, in which each model is wrapped as a primitive with some hyperparameters. This makes it easy to support other algorithms and other video analysis tasks, which will be our future effort. The pipeline language also makes it convenient to search over models and hyperparameters.

<img src="docs/demo.gif" alt="Demo" />

An overview of the library is shown below. Each module in AutoVideo is wrapped as a primitive with some hyperparameters. A pipeline consists of a series of primitives, from pre-processing to action recognition. AutoVideo is equipped with tuners that search over models and hyperparameters. We welcome contributions to enrich AutoVideo with more primitives. You can find instructions in the [Contributing Guide](./CONTRIBUTING.md).

<img width="500" src="docs/overview.jpg" alt="Overview" />

## Cite this work
If you find this repo useful, you may cite:
## Installation
Make sure that you have **Python 3.6** and **pip** installed. First, install `torch` and `torchvision` with
```
pip3 install torch
pip3 install torchvision
```
To use automated searching, you also need to install Ray Tune with
```
pip install 'ray[tune]'
```
We recommend installing the stable version of `autovideo` with `pip`:
```
pip3 install autovideo
```
Alternatively, you can clone the latest version with
```
git clone https://github.com/datamllab/autovideo.git
```
and then install it with
```
cd autovideo
pip3 install -e .
```

## Toy Examples
To try the examples, you may download the `hmdb6` dataset, which is a subset of `hmdb51` with only 6 classes. All the datasets can be downloaded from [Google Drive](https://drive.google.com/drive/folders/13oVPMyoBgNwEAsE_Ad3XVI1W5cNqfvrq). Then, unzip a dataset and put it in [datasets](datasets/).
### Fitting and saving a pipeline
```
python3 examples/fit.py
```
Some important hyperparameters are as follows.
* `--alg`: the algorithm to use. Currently, we support `tsn`, `tsm`, `i3d`, `eco`, `eco_full`, `c3d`, `r2p1d`, and `r3d`.
* `--pretrained`: whether to load pre-trained weights and fine-tune.
* `--gpu`: which GPU device to use. An empty string means CPU.
* `--data_dir`: the directory of the dataset.
* `--log_dir`: the path for saving the log.
* `--save_dir`: the path for saving the fitted pipeline.
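For example, a hypothetical invocation that fits a TSN pipeline on the `hmdb6` dataset might look like the following; the values are placeholders, and the exact defaults are defined in `examples/fit.py`:
```
python3 examples/fit.py --alg tsn --data_dir datasets/hmdb6/ --gpu "0" --save_dir fitted_pipeline
```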

### Loading a fitted pipeline and producing predictions
After fitting a pipeline, you can load it and make predictions.
```
python3 examples/produce.py
```
Some important hyperparameters are as follows.
* `--gpu`: which GPU device to use. An empty string means CPU.
* `--data_dir`: the directory of the dataset.
* `--log_dir`: the path for saving the log.
* `--load_dir`: the path for loading the fitted pipeline.

### Loading a fitted pipeline and recognizing actions
After fitting a pipeline, you can also make predictions on a single video. As a demo, you may download the fitted pipeline and the demo video from [Google Drive](https://drive.google.com/drive/folders/1j4iGUQG3m_TXbQ8mQnaR_teg1w0I2x60). Then, you can use the following command to recognize the action in the video:
```
python3 examples/recogonize.py
```
Some important hyperparameters are as follows.
* `--gpu`: which GPU device to use. An empty string means CPU.
* `--video_path`: the path of the video file.
* `--log_dir`: the path for saving the log.
* `--load_dir`: the path for loading the fitted pipeline.
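Assuming the demo video was saved as `demo.avi` and the fitted pipeline was unzipped into `fitted_pipeline/` (both names are placeholders), a call could look like:
```
python3 examples/recogonize.py --gpu "" --video_path demo.avi --load_dir fitted_pipeline
```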

### Fitting and producing a pipeline
Alternatively, you can do `fit` and `produce` without saving the model with
```
python3 examples/fit_produce.py
```
Some important hyperparameters are as follows.
* `--alg`: the algorithm to use.
* `--pretrained`: whether to load pre-trained weights and fine-tune.
* `--gpu`: which GPU device to use. An empty string means CPU.
* `--data_dir`: the directory of the dataset.
* `--log_dir`: the path for saving the log.

### Automated searching
In addition to choosing the model and hyperparameters yourself, AutoVideo also supports automated model selection and hyperparameter tuning:
```
python3 examples/search.py
```
Some important hyperparameters are as follows.
* `--alg`: the search algorithm. Currently, we support `random` and `hyperopt`.
* `--num_samples`: the number of samples (trials) to run.
* `--gpu`: which GPU device to use. An empty string means CPU.
* `--data_dir`: the directory of the dataset.
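As an illustration (the values are placeholders), a small `hyperopt` search with 10 trials could be launched with:
```
python3 examples/search.py --alg hyperopt --num_samples 10 --gpu "0" --data_dir datasets/hmdb6/
```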

## Supported Algorithms

| Algorithms | Primitive Path | Paper |
| :--------: | :----------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------- |
| TSN | [autovideo/recognition/tsn_primitive.py](autovideo/recognition/tsn_primitive.py) | [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859) |
| TSM | [autovideo/recognition/tsm_primitive.py](autovideo/recognition/tsm_primitive.py) | [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383) |
| R2P1D | [autovideo/recognition/r2p1d_primitive.py](autovideo/recognition/r2p1d_primitive.py) | [A Closer Look at Spatiotemporal Convolutions for Action Recognition](https://arxiv.org/abs/1711.11248) |
| R3D | [autovideo/recognition/r3d_primitive.py](autovideo/recognition/r3d_primitive.py) | [Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition](https://arxiv.org/abs/1708.07632) |
| C3D | [autovideo/recognition/c3d_primitive.py](autovideo/recognition/c3d_primitive.py) | [Learning Spatiotemporal Features with 3D Convolutional Networks](https://arxiv.org/abs/1412.0767) |
| ECO-Lite | [autovideo/recognition/eco_primitive.py](autovideo/recognition/eco_primitive.py) | [ECO: Efficient Convolutional Network for Online Video Understanding](https://arxiv.org/abs/1804.09066) |
| ECO-Full | [autovideo/recognition/eco_full_primitive.py](autovideo/recognition/eco_full_primitive.py) | [ECO: Efficient Convolutional Network for Online Video Understanding](https://arxiv.org/abs/1804.09066) |
| I3D | [autovideo/recognition/i3d_primitive.py](autovideo/recognition/i3d_primitive.py) | [Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset](https://arxiv.org/abs/1705.07750) |

## Advanced Usage
Beyond the above examples, you can also customize the configurations.

### Configuring the hyperparameters
Each model in AutoVideo is wrapped as a primitive, which contains some hyperparameters. An example for TSN is [here](autovideo/recognition/tsn_primitive.py#L31-78). All the hyperparameters can be specified when building the pipeline by passing a `config` dictionary. See [examples/fit.py](examples/fit.py#L40-42).
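As a rough sketch, the `config` dictionary maps hyperparameter names to values; the keys below are illustrative, the valid names are those defined in each primitive, and the exact `build_pipeline` call is shown in [examples/fit.py](examples/fit.py#L40-42):
```
from autovideo import build_pipeline

# Hypothetical configuration: "algorithm" selects the primitive and the other
# entries override that primitive's default hyperparameters.
config = {
    "algorithm": "tsn",
    "load_pretrained": True,   # hyperparameter defined in SupervisedHyperparamsBase
    "epochs": 50,              # illustrative; use the names your primitive defines
}
pipeline = build_pipeline(config)
```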

### Configuring the search space
The tuner searches for the best hyperparameter combination within a search space to improve the performance. The search space can be defined with Ray Tune. See [examples/search.py](examples/search.py#L42-47).
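For reference, a Ray Tune search space is a dictionary of sampling primitives; the key names below are illustrative and should match the hyperparameters your pipeline exposes:
```
from ray import tune

# Illustrative search space: sample the learning rate log-uniformly and
# choose between two recognition algorithms.
search_space = {
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "algorithm": tune.choice(["tsn", "tsm"]),
}
```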

## Preparing datasets and benchmarking
The datasets must follow the D3M format, which consists of a CSV file and a media folder. The CSV file should contain three columns specifying the instance indices, video file names, and labels. An example is shown below:
```
d3mIndex,video,label
0,Aussie_Brunette_Brushing_Hair_II_brush_hair_u_nm_np1_ri_med_3.avi,0
1,brush_my_hair_without_wearing_the_glasses_brush_hair_u_nm_np1_fr_goo_2.avi,0
2,Brushing_my_waist_lenth_hair_brush_hair_u_nm_np1_ba_goo_0.avi,0
3,brushing_raychel_s_hair_brush_hair_u_cm_np2_ri_goo_2.avi,0
4,Brushing_Her_Hair__[_NEW_AUDIO_]_UPDATED!!!!_brush_hair_h_cm_np1_le_goo_1.avi,0
5,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_0.avi,0
6,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_1.avi,0
7,Prelinger_HabitPat1954_brush_hair_h_nm_np1_fr_med_26.avi,0
8,brushing_hair_2_brush_hair_h_nm_np1_ba_med_2.avi,0
```
The media folder should contain the video files. You may refer to our example hmdb6 dataset in [Google Drive](https://drive.google.com/drive/folders/13oVPMyoBgNwEAsE_Ad3XVI1W5cNqfvrq). We have also prepared hmdb51 and ucf101 on Google Drive for benchmarking. Please read [benchmark](docs/benchmark.md) for more details. For some of the algorithms (C3D, R2P1D and R3D), if you want to load the pre-trained weights and fine-tune, you need to download the weights from [Google Drive](https://drive.google.com/drive/folders/1fABdnH-l92q2RbA8hfQnUPZOYoTZCR-Q) and put them in [weights](weights).
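If you need to generate such a CSV for your own videos, a small script along these lines works; the folder layout, output file name, and label mapping are assumptions to adapt to your data:
```
import csv
import os

video_dir = "datasets/my_dataset/media"          # assumed location of the video files
label_map = {"brush_hair": 0, "ride_bike": 1}    # assumed mapping from class name to label

with open("datasets/my_dataset/data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["d3mIndex", "video", "label"])
    for idx, name in enumerate(sorted(os.listdir(video_dir))):
        # Assumes the class name appears in the file name, as in the example above.
        label = next(v for k, v in label_map.items() if k in name)
        writer.writerow([idx, name, label])
```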

## Acknowledgement
We gratefully acknowledge the Data Driven Discovery of Models (D3M) program of the Defense Advanced Research Projects Agency (DARPA).
@@ -0,0 +1,4 @@
name = "autovideo" | ||
__version__ = "0.0.1" | ||
|
||
from .utils import build_pipeline, extract_frames, fit, produce, fit_produce, produce_by_path, compute_accuracy_with_preds |
@@ -0,0 +1,85 @@
import os
import abc
import typing
from urllib.parse import urlparse

import torch

from d3m import container
from d3m.primitive_interfaces.base import *
from d3m.metadata import hyperparams, params

__all__ = ('SupervisedParamsBase', 'SupervisedHyperparamsBase', 'SupervisedPrimitiveBase',)

class SupervisedParamsBase(params.Params):
    model: typing.Optional[typing.Any]

class SupervisedHyperparamsBase(hyperparams.Hyperparams):
    load_pretrained = hyperparams.UniformBool(
        default=False,
        semantic_types=['https://metadata.datadrivendiscovery.org/types/ControlParameter'],
        description="Controls whether to load a pre-trained model"
    )

class SupervisedPrimitiveBase(PrimitiveBase[Inputs, Outputs, Params, Hyperparams]):
    """A base class for supervised video recognition primitives.
    """
    def __init__(self, *, hyperparams: Hyperparams) -> None:
        super().__init__(hyperparams=hyperparams)

        self.tmp_dir = 'tmp'
        if not os.path.exists(self.tmp_dir):
            os.makedirs(self.tmp_dir)

        # Use GPU if available
        if torch.cuda.is_available():
            print("--> Running on the GPU")
            self.device = torch.device("cuda:0")
        else:
            self.device = torch.device("cpu")
            print("--> Running on the CPU")

    def get_params(self) -> Params:
        return SupervisedParamsBase(
            model=self.model,
        )

    def set_params(self, *, params: Params) -> None:
        self.model = params['model']

    def fit(self, *, timeout: float = None, iterations: int = None) -> CallResult[None]:
        """
        Fit the model with the training data. Checks whether fine-tuning is needed.
        """
        self._init_model(pretrained=self.hyperparams['load_pretrained'])
        self._fit(timeout=timeout, iterations=iterations)

        return CallResult(None)

    def set_training_data(self, *, inputs: Inputs, outputs: Outputs) -> None:
        self._inputs = inputs
        self._outputs = outputs
        try:
            self._media_dir = urlparse(self._inputs.metadata.query_column(0)['location_base_uris'][0]).path
        except KeyError:
            pass

    @abc.abstractmethod
    def _fit(self, *, timeout: float = None, iterations: int = None):
        """
        Training
        """

    @abc.abstractmethod
    def _init_model(self, pretrained):
        """
        Initialize the model, loading the weights if ``pretrained`` is True.
        """

    @abc.abstractmethod
    def produce(self, *, inputs: container.DataFrame, timeout: float = None, iterations: int = None) -> CallResult[container.DataFrame]:
        """
        Make predictions.
        """