We compare our training speed with several popular frameworks and official releases. All experiments were run in the following environment:
- 8 NVIDIA Tesla V100 (16G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Python 3.7
- PaddlePaddle develop (to be determined)
- CUDA 10.1
- CUDNN 7.6.3
- NCCL 2.1.15
- gcc 8.2.0
The time we measure is the average training time per iteration, including both data processing and model training. Training speed is reported in ips (instances per second); higher is better. Note that we skip the timings of the first 50 iterations, as they may include device warm-up time.
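For reference, the sketch below shows one way such an ips number can be computed, discarding the warm-up iterations. It is a minimal illustration, not the actual PaddleVideo benchmarking code; `dataloader`, `train_step`, and `batch_size` are hypothetical placeholders.

```python
import time

def measure_ips(dataloader, train_step, batch_size, skip_iters=50):
    """Estimate training throughput in ips (instances per second).

    Hypothetical sketch: timings of the first `skip_iters` iterations
    are discarded to exclude device warm-up time.
    """
    iter_times = []
    start = time.time()
    for i, batch in enumerate(dataloader):
        train_step(batch)               # data processing + forward/backward
        end = time.time()
        if i >= skip_iters:             # skip warm-up iterations
            iter_times.append(end - start)
        start = end
    avg_time = sum(iter_times) / len(iter_times)
    return batch_size / avg_time        # instances processed per second
```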
Here we compare our PaddleVideo repo with other video understanding toolboxes under the same data and model settings.
To ensure a fair comparison, all experiments were conducted in the same hardware environment and on the same dataset, which is generated by the data preparation step. For each model setting, we kept the same data preprocessing methods to ensure identical feature input. As shown in the table below, significant improvements can be observed compared with other video understanding frameworks; in particular, the Slowfast model is nearly 2x faster than its counterparts.
| Model | batch size x GPUs | Paddle (ips) | Reference (ips) | MMAction2 (ips) | PySlowFast (ips) |
| :------: | :---------------: | :---------------: | :---------------: | :---------------: | :---------------: |
| TSM | 16x8 | 58.1 | 46.04 (temporal-shift-module) | To do | X |
| PPTSM | 16x8 | 57.6 | X | X | X |
| TSN | 16x8 | 841.1 | To do (tsn-pytorch) | To do | X |
| Slowfast | 16x8 | 99.5 | X | To do | 43.2 |
| Attention_LSTM | 128x8 | 112.6 | X | X | X |
| Model | Paddle (ips) | MMAction2 (ips) | BMN (boundary matching network) (ips) |
| :---: | :---: | :---: | :---: |
| BMN | To do | x | x |