[Feature] add model script, training recipe and trained weights of MixNet
The-truthh committed Mar 6, 2023
1 parent bc51cb3 commit 1273a2d
Showing 4 changed files with 563 additions and 0 deletions.
89 changes: 89 additions & 0 deletions configs/mixnet/README.MD
@@ -0,0 +1,89 @@
# MixNet
> [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595)
## Introduction

Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often
overlooked. In this paper, the authors systematically study the impact of different kernel sizes, and observe that
combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation,
the authors propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a
single convolution. As a simple drop-in replacement for vanilla depthwise convolution, MixConv improves the accuracy
and efficiency of existing MobileNets on both ImageNet classification and COCO object detection [[1](#references)].
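
To make the idea of mixing kernel sizes concrete, the following minimal sketch shows how the channels of one depthwise layer could be partitioned into groups, each assigned its own kernel size. The function names `split_channels` and `mixconv_groups` are hypothetical, and giving the remainder to the first group is one common convention assumed here; this is not taken from the MindCV model script.

```python
from typing import List, Tuple


def split_channels(total_channels: int, num_groups: int) -> List[int]:
    """Split channels as evenly as possible; the first group absorbs any remainder."""
    split = [total_channels // num_groups] * num_groups
    split[0] += total_channels - sum(split)
    return split


def mixconv_groups(total_channels: int, kernel_sizes: List[int]) -> List[Tuple[int, int]]:
    """Pair each kernel size with the number of channels it would process."""
    sizes = split_channels(total_channels, len(kernel_sizes))
    return list(zip(kernel_sizes, sizes))


# Example: 32 channels shared between 3x3, 5x5 and 7x7 depthwise kernels.
for kernel, channels in mixconv_groups(32, [3, 5, 7]):
    print(f"{channels} channels -> depthwise {kernel}x{kernel} kernel")
```

Each group would then be convolved with its own depthwise kernel and the outputs concatenated along the channel axis, so the layer keeps the same input and output shape as a vanilla depthwise convolution.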

<p align="center">
<img src="https://user-images.githubusercontent.com/53842165/219263295-75de649e-d38b-4b05-bd26-1c96896f7e83.png" width=800 />
</p>
<p align="center">
<em>Figure 1. Architecture of MixNet [<a href="#references">1</a>] </em>
</p>

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
|----------|----------|-----------|-----------|------------|-----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| MixNet_s | D910x8-G | 75.63 | 92.52 | 4.17 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/mixnet/mixnet_s_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/mixnet/mixnet_s-2a5ef3a3.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode is either G (graph mode) or F (pynative mode with ms function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
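
As an illustration of the context notation, the sketch below parses a string such as `D910x8-G` into its three parts. The helper `parse_context` and the `MODES` table are hypothetical names introduced only for this example; they are not part of MindCV.

```python
import re

# Mode letters as described in the notes above.
MODES = {"G": "graph", "F": "pynative with ms function"}

# {device}x{pieces}-{MS mode}, e.g. "D910x8-G".
CONTEXT_PATTERN = re.compile(r"(?P<device>[A-Za-z0-9]+?)x(?P<pieces>\d+)-(?P<mode>[GF])")


def parse_context(context: str) -> dict:
    """Split a training-context string into device, device count and mode."""
    match = CONTEXT_PATTERN.fullmatch(context)
    if match is None:
        raise ValueError(f"unrecognized training context: {context!r}")
    return {
        "device": match["device"],
        "pieces": int(match["pieces"]),
        "mode": MODES[match["mode"]],
    }


print(parse_context("D910x8-G"))
```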

## Quick Start

### Preparation

#### Installation
Please refer to the [installation instructions](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

The reported results can be reproduced with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run:

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For a detailed description of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
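
The linear adjustment mentioned in the note can be sketched as follows. The reference values (`lr: 0.2`, `batch_size: 128`, 8 devices) come from the recipe and the `mpirun -n 8` command; the helper name `scale_lr` is hypothetical and is not a MindCV function.

```python
def scale_lr(base_lr: float, base_global_batch: int, batch_size: int, num_devices: int) -> float:
    """Scale the learning rate linearly with the new global batch size."""
    new_global_batch = batch_size * num_devices
    return base_lr * new_global_batch / base_global_batch


# Reference setup from the recipe: 128 samples per device on 8 devices.
reference_global_batch = 128 * 8

# Training on 4 devices with the same per-device batch size halves the global batch,
# so the learning rate is halved as well.
print(scale_lr(0.2, reference_global_batch, batch_size=128, num_devices=4))
```

The resulting value would then replace `lr` in the YAML recipe; whether other settings such as the warmup length also need adjusting is not covered by this sketch.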

* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/mixnet/mixnet_s_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://github.com/mindspore-lab/mindcv/blob/main/tutorials/deployment.md) in MindCV.

## References

[1] Tan M, Le Q V. MixConv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019.
55 changes: 55 additions & 0 deletions configs/mixnet/mixnet_s_ascend.yaml
@@ -0,0 +1,55 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5"
re_prob: 0.25
crop_pct: 0.875
mixup: 0.2
cutmix: 1.0

# model
model: "mixnet_s"
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 600
dataset_sink_mode: True
amp_level: "O3"

# loss
loss: "CE"
label_smoothing: 0.1

# lr scheduler
scheduler: "warmup_cosine_decay"
lr: 0.2
min_lr: 0.00001
decay_epochs: 585
warmup_epochs: 15

# optimizer
opt: "momentum"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.00002
loss_scale: 256
use_nesterov: False
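
For readers unfamiliar with the `# lr scheduler` settings in this recipe, the sketch below shows a generic linear warmup followed by cosine decay, using the recipe's values as defaults (`warmup_epochs: 15` plus `decay_epochs: 585` covers the full `epoch_size: 600`). It works per epoch and is only an assumption about the general shape of such a schedule; MindCV's `warmup_cosine_decay` may differ in details such as per-step updates or the warmup start value. The function name `warmup_cosine_lr` is hypothetical.

```python
import math


def warmup_cosine_lr(epoch: int, lr: float = 0.2, min_lr: float = 1e-5,
                     warmup_epochs: int = 15, decay_epochs: int = 585) -> float:
    """Per-epoch learning rate: linear warmup, then cosine decay towards min_lr."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches `lr` at the last warmup epoch.
        return lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((epoch - warmup_epochs) / decay_epochs, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few points of the 600-epoch run.
for epoch in (0, 14, 15, 300, 599):
    print(epoch, round(warmup_cosine_lr(epoch), 6))
```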
3 changes: 3 additions & 0 deletions mindcv/models/__init__.py
@@ -13,6 +13,7 @@
inception_v3,
inception_v4,
layers,
mixnet,
mnasnet,
mobilenet_v1,
mobilenet_v2,
@@ -54,6 +55,7 @@
from .inception_v3 import *
from .inception_v4 import *
from .layers import *
from .mixnet import *
from .mnasnet import *
from .mobilenet_v1 import *
from .mobilenet_v2 import *
@@ -99,6 +101,7 @@
__all__.extend(["InceptionV3", "inception_v3"])
__all__.extend(["InceptionV4", "inception_v4"])
__all__.extend(layers.__all__)
__all__.extend(mixnet.__all__)
__all__.extend(mnasnet.__all__)
__all__.extend(mobilenet_v1.__all__)
__all__.extend(mobilenet_v2.__all__)