[TIPC-Benchmark]Support @to_static training for Benchmark #1756

Aurelius84 · 2022-03-14T08:24:57Z

What's New?

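This PR implements a monitoring mechanism for @to_static (dynamic-to-static) training that follows the new Benchmark specification. It is a backward-compatible upgrade of the existing functionality.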

1. Usage

To enable @to_static training on top of dynamic-graph training:

Parameter name: to_static (must be spelled exactly as shown; case-sensitive)

# Option 1: enable @to_static training for all configuration combinations of a model
bash test_tipc/benchmark_train.sh test_tipc/config/ResNet/ResNet50_train_infer_python.txt benchmark_train to_static

# Option 2: enable @to_static training only for a specified configuration combination of a model
bash test_tipc/benchmark_train.sh test_tipc/config/ResNet/ResNet50_train_infer_python.txt benchmark_train dynamic_bs8_fp32_DP_N1C2 to_static
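The PR does not show how benchmark_train.sh parses its arguments. As a hypothetical sketch only, assuming the keyword is recognized solely as the last argument, the two invocation forms above could be told apart as follows. The function name parse_benchmark_args and the placeholder value all are illustrative and not part of the script.

```shell
# Hypothetical sketch: the real benchmark_train.sh may parse arguments differently.
# Expected forms: <config_file> benchmark_train [config_str] [to_static]
parse_benchmark_args() {
    local args=("$@")
    local n=${#args[@]}
    local to_static="False"
    # Only an exact, case-sensitive match of the LAST argument enables @to_static.
    if (( n > 0 )) && [[ "${args[n-1]}" == "to_static" ]]; then
        to_static="True"
        unset 'args[n-1]'   # drop the keyword from the argument list
    fi
    # A remaining third argument selects one combination (e.g. dynamic_bs8_fp32_DP_N1C2);
    # "all" is an illustrative placeholder meaning "every combination".
    local config_str="${args[2]:-all}"
    echo "config_str=${config_str} to_static=${to_static}"
}
```

In this design a differently cased keyword such as To_Static is not treated as the switch, which matches the case-sensitivity rule above.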

2. Verification logs

This PR was verified on the ResNet and MobileNet models, with single-machine single-GPU and multi-GPU runs.

Whether @to_static took effect can be determined by the log line "Successfully to apply @to_static with specs XX", as in the following log:

[2022/03/14 08:13:16] root INFO: profiler_options : None
[2022/03/14 08:13:16] root INFO: train with paddle 0.0.0 and device Place(gpu:0)
W0314 08:13:24.551266 13696 gpu_context.cc:244] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0314 08:13:24.556710 13696 gpu_context.cc:272] device: 0, cuDNN Version: 8.1.
[2022/03/14 08:13:28] root INFO: Successfully to apply @to_static with specs: [InputSpec(shape=(-1, 3, 224, 224), dtype=paddle.float32, name=None)]
[2022/03/14 08:13:28] root WARNING: The training strategy in config files provided by PaddleClas is based on 4 gpus. But the number of gpus is 1 in current training. Please modify the stategy (learning rate, batch size and so on) if use config files in PaddleClas to train.
[2022/03/14 08:13:32] root INFO: [Train][Epoch 1/1][Iter: 0/160146]lr: 0.10000, top1: 0.00000, top5: 0.00000, CELoss: 7.45196, loss: 7.45196, batch_cost: 3.92930s, reader_cost: 0.99335, ips: 2.03599 samples/s, eta: 7 days, 6:47:41
[2022/03/14 08:13:32] root INFO: [Train][Epoch 1/1][Iter: 1/160146]lr: 0.10000, top1: 0.00000, top5: 0.00000, CELoss: 21.29071, loss: 21.29071, batch_cost: 2.01570s, reader_cost: 0.49713, ips: 3.96885 samples/s, eta: 3 days, 17:40:03
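The check above can be automated by searching the training log for the fixed marker string. This is a minimal sketch: the helper name check_to_static_log is hypothetical, and only the marker text is taken from the log shown.

```shell
# Hypothetical helper: succeeds only if the training log contains the marker
# line that is printed when @to_static has been applied.
check_to_static_log() {
    local log_file="$1"
    # -F: treat the marker as a fixed string, -q: no output, exit status only
    if grep -qF "Successfully to apply @to_static with specs" "$log_file"; then
        echo "to_static: enabled"
        return 0
    fi
    echo "to_static: NOT found in ${log_file}"
    return 1
}
```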

3. Design

The existing Benchmark scheme works by running the bash script test_train_inference_python.sh.

The script dispatches training configurations by parsing trainer:norm_train (line 15) in test_tipc/config/test_xxx.txt. This PR extends the configuration on line 20 with a new @to_static trainer:

to_static_train:-o Global.to_static=True

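This trainer reuses the trainer:norm_train configuration and appends -o Global.to_static=True to it to enable @to_static training. This keeps the basic configuration parameters of @to_static training aligned with those of dynamic-graph training.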
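Assuming each config line has the form key:value (split at the first colon), the reuse of norm_train can be sketched as below. This is a hypothetical illustration: get_config_value and build_to_static_cmd are not functions from test_train_inference_python.sh, and the actual script may assemble the command differently.

```shell
# Hypothetical sketch: look up the value of "key:value" in a TIPC-style config file.
get_config_value() {
    local config_file="$1" key="$2"
    local line
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "${line%%:*}" == "$key" ]]; then
            echo "${line#*:}"   # everything after the first ':'
            return 0
        fi
    done < "$config_file"
    return 1
}

# Reuse the norm_train command and append the to_static_train options,
# so both runs share the same base arguments.
build_to_static_cmd() {
    local config_file="$1"
    local norm_cmd extra
    norm_cmd=$(get_config_value "$config_file" norm_train) || return 1
    extra=$(get_config_value "$config_file" to_static_train) || return 1
    echo "${norm_cmd} ${extra}"
}
```

For example, if the norm_train line were norm_train:tools/train.py -c cfg.yaml, the sketch would produce tools/train.py -c cfg.yaml -o Global.to_static=True.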

test_tipc/benchmark_train.sh

test_tipc/config/MobileNetV1/MobileNetV1_train_infer_python.txt

weisy11

LGTM

[TIPC-Benchmark]Support @to_static traing for Benchmark

33a639d

LDOUBLEV reviewed Mar 14, 2022


test_tipc/benchmark_train.sh

add info in log_name

5d233f4

Aurelius84 requested a review from LDOUBLEV March 14, 2022 12:42

LDOUBLEV reviewed Mar 15, 2022


test_tipc/benchmark_train.sh (outdated)

weisy11 reviewed Mar 15, 2022


test_tipc/benchmark_train.sh (outdated)

test_tipc/config/MobileNetV1/MobileNetV1_train_infer_python.txt

fix dash

6eaed4f

Aurelius84 requested review from weisy11 and LDOUBLEV March 15, 2022 06:20

weisy11 approved these changes Mar 15, 2022


weisy11 merged commit 51c01cf into PaddlePaddle:develop Mar 15, 2022

This was referenced Sep 14, 2022

[TIPC]Support @to_static training for BMN PaddlePaddle/PaddleVideo#526

Merged

[TIPC]Support @to_static train for base-transformer PaddlePaddle/PaddleNLP#3277

Merged
