Merge branch 'main' into branch_6
Songyuanwei authored Dec 8, 2022
2 parents f937e6a + 1826034 commit 2855a55
Showing 97 changed files with 2,316 additions and 385 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -262,6 +262,7 @@ Please see [configs](./configs) for the details about model performance and pret
* Augmentation
* [AutoAugment](https://arxiv.org/abs/1805.09501)
* [RandAugment](https://arxiv.org/abs/1909.13719)
* [Repeated Augmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hoffer_Augment_Your_Batch_Improving_Generalization_Through_Instance_Repetition_CVPR_2020_paper.pdf)
* RandErasing (Cutout)
* CutMix
* Mixup
@@ -287,10 +288,20 @@ Please see [configs](./configs) for the details about model performance and pret
* Label Smoothing
* Stochastic Depth (depends on networks)
* Dropout (depends on networks)
* Loss
  * Cross Entropy (w/ class weight and auxiliary logit support)
  * Binary Cross Entropy (w/ class weight and auxiliary logit support)
</details>

## Notes
### What is New
- 2022/12/07
  1. Support lr warmup for all lr scheduling algorithms, not only cosine decay.
  2. Add repeated augmentation, which can be enabled by setting `--aug_repeats` to a value larger than 1 (3 is a common choice); see the example below.
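For example, enabling repeated augmentation together with warmup on a non-cosine scheduler could look roughly like this (an illustrative sketch, not a command from this commit; the model, dataset path, and scheduler values are placeholders):

```shell
# Sketch: repeat each sampled image 3 times per batch (repeated augmentation)
# and use a step_decay scheduler with the newly supported lr warmup.
python train.py --model=resnet50 --dataset=imagenet --data_dir /path/to/imagenet \
    --aug_repeats=3 --scheduler=step_decay --warmup_epochs=3
```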

- 2022/11/21
  1. Add visualization for loss and accuracy curves.
  2. Support epoch-wise lr warmup cosine decay (previously step-wise).
- 2022/11/09
1. Add 7 pretrained ViT models.
2. Add RandAugment augmentation.
15 changes: 11 additions & 4 deletions config.py
@@ -63,6 +63,8 @@ def create_parser():
group.add_argument('--drop_remainder', type=str2bool, nargs='?', const=True, default=True,
help='Determines whether or not to drop the last block whose data '
'row number is less than batch size (default=True)')
group.add_argument('--aug_repeats', type=int, default=0,
help='Number of dataset repetitions for repeated augmentation. If 0 or 1, repeated augmentation is disabled. Otherwise, repeated augmentation is enabled and 3 is a common choice. (default=0)')

# Augmentation parameters
group = parser.add_argument_group('Augmentation parameters')
@@ -157,25 +159,30 @@ def create_parser():
help='Enables the Nesterov momentum (default=False)')
group.add_argument('--filter_bias_and_bn', type=str2bool, nargs='?', const=True, default=True,
help='Filter Bias and BatchNorm (default=True)')
group.add_argument('--eps', type=float, default=1e-10,
help='Term added to the denominator to improve numerical stability (default=1e-10)')

# Scheduler parameters
group = parser.add_argument_group('Scheduler parameters')
group.add_argument('--scheduler', type=str, default='warmup_cosine_decay',
choices=['constant', 'warmup_cosine_decay', 'exponential_decay', 'step_decay', 'multi_step_decay'],
group.add_argument('--scheduler', type=str, default='cosine_decay',
choices=['constant', 'cosine_decay', 'exponential_decay', 'step_decay', 'multi_step_decay'],
help='Type of scheduler (default="cosine_decay")')
group.add_argument('--lr', type=float, default=0.001,
help='learning rate (default=0.001)')
group.add_argument('--min_lr', type=float, default=1e-6,
help='The minimum value of learning rate if scheduler supports (default=1e-6)')
group.add_argument('--warmup_epochs', type=int, default=3,
help='Warmup epochs (default=3)')
group.add_argument('--warmup_factor', type=float, default=0.0,
help='Warmup factor of learning rate (default=0.0)')
group.add_argument('--decay_epochs', type=int, default=100,
help='Decay epochs (default=100)')
group.add_argument('--decay_rate', type=float, default=0.9,
help='LR decay rate if scheduler supports')
group.add_argument('--multi_step_decay_milestones', type=list, default=[30, 60, 90],
help='list of epoch milestones for MultStepDecayLR, decay LR by decay_rate at the milestone epoch.')
group.add_argument('--stepwise_lr_sched', type=str2bool, nargs='?', const=True, default=True, help='If False, LR will be updated in the begin of each new epoch. Otherwise, update learning rate in each step. (default=False)')
help='list of epoch milestones for lr decay, which is ONLY effective for the multi_step_decay scheduler. LR will be decayed by decay_rate at each milestone epoch.')
group.add_argument('--lr_epoch_stair', type=str2bool, nargs='?', const=True, default=False,
help='If True, LR is updated at the first step of each epoch and then kept constant for the remaining steps of that epoch. Otherwise, LR is updated dynamically at every step. (default=False)')

# Loss parameters
group = parser.add_argument_group('Loss parameters')
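Taken together, the scheduler options added in config.py above could be driven from the command line roughly as follows (an illustrative sketch, not a command from this commit; the config file, scheduler, and hyper-parameter values are placeholders, and warmup_factor is assumed to be the starting fraction of the base lr):

```shell
# Sketch: step_decay scheduler with a 3-epoch warmup starting at 1% of the base lr,
# updating the lr once per epoch (lr_epoch_stair) instead of at every step.
mpirun -n 8 python train.py --config configs/densenet/densenet_121_gpu.yaml \
    --scheduler=step_decay --warmup_epochs=3 --warmup_factor=0.01 \
    --lr_epoch_stair=True
```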
2 changes: 1 addition & 1 deletion configs/convit/README.md
@@ -26,7 +26,7 @@ ConViT combines the strengths of convolutional architectures and Vision Transf
| GPU | convit_tiny_plus | | | | | | | | |
| Ascend | convit_tiny_plus | 77.00 | 93.60 | | | 247 | | | |
| GPU | convit_small | | | | | | | | |
| Ascend | convit_small | | | | | | | | |
| Ascend | convit_small | 81.63 | 95.59 | | | 490 | | | |
| GPU | convit_small_plus | | | | | | | | |
| Ascend | convit_small_plus | | | | | | | | |
| GPU | convit_base | | | | | | | | |
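The newly reported convit_small numbers above could be checked with `validate.py` and the Ascend config added in this commit (a sketch; the dataset and checkpoint paths are placeholders):

```shell
# Sketch: evaluate a trained convit_small checkpoint on the ImageNet validation set.
python validate.py -c configs/convit/convit_small_ascend.yaml \
    --data_dir /path/to/imagenet --ckpt_path /path/to/convit_small.ckpt
```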
70 changes: 70 additions & 0 deletions configs/convit/convit_small_ascend.yaml
@@ -0,0 +1,70 @@
# Copyright 2022 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================

# system config
mode: 0
distribute: True
num_parallel_workers: 8

# dataset config
dataset: 'imagenet'
data_dir: ''
shuffle: True
dataset_download: False
batch_size: 192
drop_remainder: True

# Augmentation config
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bicubic'
auto_augment: 'autoaug-mstd0.5'
re_prob: 0.1
mixup: 0.2
cutmix: 1.0
cutmix_prob: 1.0
crop_pct: 0.915
color_jitter: 0.4

# model config
model: 'convit_small'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 10
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O2'

# loss config
loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
lr: 0.0007
min_lr: 0.000001
warmup_epochs: 40
decay_epochs: 260

# optimizer config
opt: 'adamw'
weight_decay: 0.05
loss_scale: 1024
filter_bias_and_bn: True
use_nesterov: False
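Following the launch pattern used in the other READMEs touched by this commit, this config would typically be run as follows (a sketch; the dataset path is a placeholder):

```shell
# Sketch: distributed training of convit_small on 8 Ascend devices with the config above.
mpirun -n 8 python train.py -c configs/convit/convit_small_ascend.yaml --data_dir /path/to/imagenet
```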
2 changes: 1 addition & 1 deletion configs/convit/convit_tiny_ascend.yaml
@@ -55,7 +55,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 5
2 changes: 1 addition & 1 deletion configs/convit/convit_tiny_gpu.yaml
@@ -53,7 +53,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
lr: 0.0005
min_lr: 0.00001
warmup_epochs: 10
2 changes: 1 addition & 1 deletion configs/convit/convit_tiny_plus_ascend.yaml
@@ -55,7 +55,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
lr: 0.00072
min_lr: 0.000001
warmup_epochs: 40
20 changes: 3 additions & 17 deletions configs/densenet/README.md
@@ -46,34 +46,20 @@ Please download the [ImageNet-1K](https://www.image-net.org/download.php) datase

```shell
# train densenet121 on 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
mpirun -n 8 python train.py -c configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet
mpirun -n 8 python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet
```

Note that the number of GPUs/Ascends and batch size will influence the training results. To reproduce the training result at most, it is recommended to use the **same number of GPUs/Ascneds** with the same batch size.

- **Finetuning.** Here is an example for finetuning a pretrained densenet121 on CIFAR10 dataset using Momentum optimizer.

```shell
python train.py --model=densenet121 --pretrained --opt=momentum --lr=0.001 dataset=cifar10 --num_classes=10 --dataset_download
```
Note that the number of GPUs/Ascends and batch size will influence the training results. To closely reproduce the reported results, it is recommended to use the **same number of GPUs/Ascends** with the same batch size.

Detailed adjustable parameters and their default values can be seen in [config.py](../../config.py).

### Validation

- To validate the trained model, you can use `validate.py`. Here is an example for densenet121 to verify the accuracy of
pretrained weights.

```shell
python validate.py --model=densenet121 --dataset=imagenet --val_split=val --pretrained
```

- To validate the model, you can use `validate.py`. Here is an example for densenet121 to verify the accuracy of your
training.

```shell
python validate.py --model=densenet121 --dataset=imagenet --val_split=val --ckpt_path='./ckpt/densenet121-best.ckpt'
python validate.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/densenet121.ckpt
```

### Deployment (optional)
17 changes: 2 additions & 15 deletions configs/densenet/README_CN.md
@@ -41,28 +41,15 @@
> The [configs folder](../../configs) lists the yaml configuration files for all variants of the models included in the mindcv suite (configurations for training and validation on ImageNet).
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
mpirun -n 8 python train.py -c configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet
```

- Below is an example of finetuning a densenet121 model pretrained on ImageNet on the CIFAR10 dataset with the Momentum optimizer.

```shell
python train.py --model=densenet121 --pretrained --opt=momentum --lr=0.001 dataset=cifar10 --num_classes=10 --dataset_download
mpirun -n 8 python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet
```

Detailed adjustable parameters and their default values can be viewed in [config.py](../../config.py).

### Validation

- Below is an example of using `validate.py` to verify the accuracy of the pretrained densenet121 model.

```shell
python validate.py --model=densenet121 --dataset=imagenet --val_split=val --pretrained
```

- Below is an example of using `validate.py` to verify the accuracy of a custom densenet121 checkpoint.

```shell
python validate.py --model=densenet121 --dataset=imagenet --val_split=val --ckpt_path='./ckpt/densenet121-best.ckpt'
python validate.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/densenet121.ckpt
```
2 changes: 1 addition & 1 deletion configs/densenet/densenet_121_ascend.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_121_gpu.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_161_ascend.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_161_gpu.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_169_ascend.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_169_gpu.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_201_ascend.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
2 changes: 1 addition & 1 deletion configs/densenet/densenet_201_gpu.yaml
@@ -51,7 +51,7 @@ loss: 'CE'
label_smoothing: 0.1

# lr scheduler config
scheduler: 'warmup_cosine_decay'
scheduler: 'cosine_decay'
min_lr: 0.0
lr: 0.1
warmup_epochs: 0
72 changes: 72 additions & 0 deletions configs/mnasnet/README.md
@@ -0,0 +1,72 @@
# MnasNet
> [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
## Introduction
***

Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.

![](mnasnet.png)

## Results
***

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Train T. | Infer T. | Download | Config | Log |
|-----------------|-----------|-------|-------|:----------:|-------|--------|---|--------|--------------|
| MnasNet-B1-0_75 | D910x8-G | 71.81 | 90.53 | 3.20 | 96s/epoch | | [model]() | [cfg]() | [log]() |
| MnasNet-B1-1_0 | D910x8-G | 74.28 | 91.70 | 4.42 | 96s/epoch | | [model]() | [cfg]() | [log]() |
| MnasNet-B1-1_4 | D910x8-G | 76.01 | 92.83 | 7.16 | 121s/epoch | | [model]() | [cfg]() | [log]() |

#### Notes
- All models are trained on the ImageNet-1K training set and the top-1 accuracy is reported on the validation set.
- Context: GPU_TYPE x pieces - G/F, G - graph mode, F - pynative mode with ms function.

## Quick Start
***
### Preparation

#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/download.php) dataset for model training and validation.

### Training

- **Hyper-parameters.** The hyper-parameter configurations for producing the reported results are stored in the yaml files in the `mindcv/configs/mnasnet` folder. For example, to train with one of these configurations, you can run:

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
mpirun -n 8 python train.py -c configs/mnasnet/mnasnet0.75_gpu.yaml --data_dir /path/to/imagenet
```

Note that the number of GPUs/Ascends and batch size will influence the training results. To closely reproduce the reported results, it is recommended to use the **same number of GPUs/Ascends** with the same batch size.

Detailed adjustable parameters and their default values can be seen in [config.py](../../config.py).

### Validation

- To validate the trained model, you can use `validate.py`. Here is an example for mnasnet0_75 to verify the accuracy of pretrained weights.

```shell
python validate.py \
    -c configs/mnasnet/mnasnet0.75_ascend.yaml \
    --data_dir=/path/to/imagenet \
    --ckpt_path=/path/to/ckpt
```

- To validate the model, you can use `validate.py`. Here is an example for mnasnet0_75 to verify the accuracy of your training.

```shell
python validate.py \
    -c configs/mnasnet/mnasnet0.75_ascend.yaml \
    --data_dir=/path/to/imagenet \
    --ckpt_path=/path/to/ckpt
```

### Deployment (optional)

Please refer to the deployment tutorial in MindCV.


