Commit f0ef849

ConvNets update
1 parent 6a49d18 commit f0ef849

31 files changed: +1678 -898 lines

PyTorch/Classification/ConvNets/.gitmodules

Whitespace-only changes.
Lines changed: 4 additions & 1 deletion

@@ -1,5 +1,8 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.07-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:19.10-py3
 FROM ${FROM_IMAGE_NAME}
 
+ADD requirements.txt /workspace/
+WORKDIR /workspace/
+RUN pip install --no-cache-dir -r requirements.txt
 ADD . /workspace/rn50
 WORKDIR /workspace/rn50
Lines changed: 68 additions & 10 deletions

@@ -1,39 +1,88 @@
 # Convolutional Networks for Image Classification in PyTorch
 
-In this repository you will find implementations of various image classification models.
+In this repository you will find implementations of various image classification models.
 
-Detailed information on each model can be found here:
+## Table Of Contents
+
+* [Models](#models)
+* [Validation accuracy results](#validation-accuracy-results)
+* [Training performance results](#training-performance-results)
+  * [Training performance: NVIDIA DGX-1 (8x V100 16G)](#training-performance-nvidia-dgx-1-8x-v100-16g)
+  * [Training performance: NVIDIA DGX-2 (16x V100 32G)](#training-performance-nvidia-dgx-2-16x-v100-32g)
+* [Model comparison](#model-comparison)
+  * [Accuracy vs FLOPS](#accuracy-vs-flops)
+  * [Latency vs Throughput on different batch sizes](#latency-vs-throughput-on-different-batch-sizes)
+
+## Models
+
+The following table provides links to where you can find additional information on each model:
 
 | **Model** | **Link**|
 |:-:|:-:|
 | resnet50 | [README](./resnet50v1.5/README.md) |
 | resnext101-32x4d | [README](./resnext101-32x4d/README.md) |
 | se-resnext101-32x4d | [README](./se-resnext101-32x4d/README.md) |
 
-## Accuracy
+## Validation accuracy results
+
+Our results were obtained by running the applicable
+training scripts in the pytorch-19.10 NGC container
+on NVIDIA DGX-1 with (8x V100 16G) GPUs.
+The specific training script that was run is documented
+in the corresponding model's README.
+
 
+The following table shows the validation accuracy results of the
+three classification models side-by-side.
 
-| **Model** | **AMP Top1** | **AMP Top5** | **FP32 Top1** | **FP32 Top1** |
+
+| **arch** | **AMP Top1** | **AMP Top5** | **FP32 Top1** | **FP32 Top5** |
 |:-:|:-:|:-:|:-:|:-:|
 | resnet50 | 78.46 | 94.15 | 78.50 | 94.11 |
 | resnext101-32x4d | 80.08 | 94.89 | 80.14 | 95.02 |
 | se-resnext101-32x4d | 81.01 | 95.52 | 81.12 | 95.54 |
 
 
-## Training Performance
+## Training performance results
+
+
+### Training performance: NVIDIA DGX-1 (8x V100 16G)
+
 
+Our results were obtained by running the applicable
+training scripts in the pytorch-19.10 NGC container
+on NVIDIA DGX-1 with (8x V100 16G) GPUs.
+Performance numbers (in images per second)
+were averaged over an entire training epoch.
+The specific training script that was run is documented
+in the corresponding model's README.
 
-### NVIDIA DGX-1 (8x V100 16G)
+The following table shows the training performance results of the
+three classification models side-by-side.
 
-| **Model** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
+
+| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
 |:-:|:-:|:-:|:-:|
 | resnet50 | 6888.75 img/s | 2945.37 img/s | 2.34x |
 | resnext101-32x4d | 2384.85 img/s | 1116.58 img/s | 2.14x |
 | se-resnext101-32x4d | 2031.17 img/s | 977.45 img/s | 2.08x |
 
-### NVIDIA DGX-2 (16x V100 32G)
+### Training performance: NVIDIA DGX-2 (16x V100 32G)
+
+
+Our results were obtained by running the applicable
+training scripts in the pytorch-19.10 NGC container
+on NVIDIA DGX-2 with (16x V100 32G) GPUs.
+Performance numbers (in images per second)
+were averaged over an entire training epoch.
+The specific training script that was run is documented
+in the corresponding model's README.
+
+The following table shows the training performance results of the
+three classification models side-by-side.
 
-| **Model** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
+
+| **arch** | **Mixed Precision** | **FP32** | **Mixed Precision speedup** |
 |:-:|:-:|:-:|:-:|
 | resnet50 | 13443.82 img/s | 6263.41 img/s | 2.15x |
 | resnext101-32x4d | 4473.37 img/s | 2261.97 img/s | 1.98x |
@@ -45,7 +94,16 @@ Detailed information on each model can be found here:
 ### Accuracy vs FLOPS
 ![ACCvsFLOPS](./img/ACCvsFLOPS.png)
 
-Dot size indicates number of trainable parameters
+The plot shows the relationship between validation accuracy and the
+floating-point operations needed to compute a forward pass
+on a 224px x 224px image for the implemented models.
+Dot size indicates the number of trainable parameters.
 
 ### Latency vs Throughput on different batch sizes
 ![LATvsTHR](./img/LATvsTHR.png)
+
+The plot shows the relationship between
+inference latency, throughput, and batch size
+for the implemented models.
+
+
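The "Mixed Precision speedup" column in the two training performance tables above is simply the ratio of the mixed precision throughput to the FP32 throughput. A minimal sketch (not part of the commit) that recomputes the column from the numbers quoted in the tables:

```python
# Recompute the "Mixed Precision speedup" column from the throughput figures
# quoted in the DGX-1 and DGX-2 tables above (images per second).
throughput = {
    # (system, arch): (mixed precision img/s, FP32 img/s)
    ("DGX-1", "resnet50"): (6888.75, 2945.37),
    ("DGX-1", "resnext101-32x4d"): (2384.85, 1116.58),
    ("DGX-1", "se-resnext101-32x4d"): (2031.17, 977.45),
    ("DGX-2", "resnet50"): (13443.82, 6263.41),
    ("DGX-2", "resnext101-32x4d"): (4473.37, 2261.97),
}

for (system, arch), (amp, fp32) in throughput.items():
    # Prints 2.34x, 2.14x, 2.08x, 2.15x, 1.98x -- matching the tables.
    print(f"{system} {arch}: {amp / fp32:.2f}x")
```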
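The Accuracy vs FLOPS figure encodes three quantities: validation accuracy on the y-axis, forward-pass cost at 224x224 on the x-axis, and trainable-parameter count as dot size. A hedged matplotlib sketch of how such a figure could be produced; only the FP32 Top1 accuracies come from the validation table above, while the GFLOP and parameter values are illustrative placeholders rather than measurements from this repository:

```python
import matplotlib.pyplot as plt

models = ["resnet50", "resnext101-32x4d", "se-resnext101-32x4d"]
top1 = [78.50, 80.14, 81.12]     # FP32 Top1 from the validation table above
gflops = [4.1, 8.0, 8.0]         # placeholder forward-pass cost at 224x224
params_m = [25.6, 44.2, 49.0]    # placeholder trainable parameters (millions)

# Dot area scales with the parameter count, mirroring the README's description.
plt.scatter(gflops, top1, s=[p * 10 for p in params_m], alpha=0.6)
for name, x, y in zip(models, gflops, top1):
    plt.annotate(name, (x, y))
plt.xlabel("GFLOPs for a 224px x 224px forward pass")
plt.ylabel("Top-1 validation accuracy [%]")
plt.title("Accuracy vs FLOPS (dot size ~ trainable parameters)")
plt.savefig("ACCvsFLOPS.png")
```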

PyTorch/Classification/ConvNets/image_classification/dataloaders.py

Lines changed: 11 additions & 10 deletions

@@ -93,20 +93,21 @@ def __init__(self, batch_size, num_threads, device_id, data_dir, crop, dali_cpu=
 
         if dali_cpu:
             dali_device = "cpu"
-            self.decode = ops.HostDecoderRandomCrop(device=dali_device, output_type=types.RGB,
-                                                    random_aspect_ratio=[0.75, 4./3.],
-                                                    random_area=[0.08, 1.0],
-                                                    num_attempts=100)
+            self.decode = ops.ImageDecoder(device=dali_device, output_type=types.RGB)
         else:
             dali_device = "gpu"
             # This padding sets the size of the internal nvJPEG buffers to be able to handle all images from full-sized ImageNet
             # without additional reallocations
-            self.decode = ops.nvJPEGDecoderRandomCrop(device="mixed", output_type=types.RGB, device_memory_padding=211025920, host_memory_padding=140544512,
-                                                      random_aspect_ratio=[0.75, 4./3.],
-                                                      random_area=[0.08, 1.0],
-                                                      num_attempts=100)
+            self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB, device_memory_padding=211025920, host_memory_padding=140544512)
+
+        self.res = ops.RandomResizedCrop(
+            device=dali_device,
+            size=[crop, crop],
+            interp_type=types.INTERP_LINEAR,
+            random_aspect_ratio=[0.75, 4./3.],
+            random_area=[0.08, 1.0],
+            num_attempts=100)
 
-        self.res = ops.Resize(device=dali_device, resize_x=crop, resize_y=crop)
         self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                             output_dtype = types.FLOAT,
                                             output_layout = types.NCHW,
@@ -141,7 +142,7 @@ def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size):
                              num_shards = world_size,
                              random_shuffle = False)
 
-        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
+        self.decode = ops.ImageDecoder(device = "mixed", output_type = types.RGB)
         self.res = ops.Resize(device = "gpu", resize_shorter = size)
         self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                             output_dtype = types.FLOAT,
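The change above swaps the older fused decoder operators (ops.HostDecoderRandomCrop, ops.nvJPEGDecoderRandomCrop, ops.nvJPEGDecoder) for ops.ImageDecoder plus a separate ops.RandomResizedCrop. A minimal, self-contained sketch (not from this commit) of a DALI training pipeline built on that pattern, assuming the legacy nvidia.dali ops API of the DALI versions shipped in that era of NGC containers; data_dir is a placeholder path:

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types


class SimpleTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_dir, crop=224):
        super(SimpleTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=12)
        # Reads (image, label) pairs from an ImageNet-style directory tree.
        self.input = ops.FileReader(file_root=data_dir, random_shuffle=True)
        # "mixed" runs the JPEG decode partly on CPU and partly on GPU (nvJPEG).
        self.decode = ops.ImageDecoder(device="mixed", output_type=types.RGB)
        # Random area/aspect-ratio crop followed by a resize to crop x crop.
        self.res = ops.RandomResizedCrop(
            device="gpu",
            size=[crop, crop],
            interp_type=types.INTERP_LINEAR,
            random_aspect_ratio=[0.75, 4./3.],
            random_area=[0.08, 1.0],
            num_attempts=100)
        # Normalize with ImageNet statistics and emit NCHW float tensors.
        self.cmnp = ops.CropMirrorNormalize(
            device="gpu",
            output_dtype=types.FLOAT,
            output_layout=types.NCHW,
            mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
            std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
        self.coin = ops.CoinFlip(probability=0.5)

    def define_graph(self):
        jpegs, labels = self.input(name="Reader")
        images = self.decode(jpegs)
        images = self.res(images)
        images = self.cmnp(images, mirror=self.coin())
        return images, labels
```

To feed a PyTorch training loop, such a pipeline is typically wrapped in DALIClassificationIterator (from nvidia.dali.plugin.pytorch), which is the approach dataloaders.py takes.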
