Dev paddle plsc arcface #130


Open · wants to merge 5 commits into base: master

Conversation

Flowingsun007 (Contributor)

No description provided.

@Flowingsun007 Flowingsun007 requested a review from guo-ran March 12, 2021 09:22
@Flowingsun007 Flowingsun007 marked this pull request as ready for review March 15, 2021 01:52
@Flowingsun007 Flowingsun007 requested a review from nlqq March 20, 2021 00:25
@yuanms2 commented Mar 29, 2021

What is the conclusion of the tests? Could you put the oneflow and plsc results side by side for comparison?

@Flowingsun007 (Contributor, Author) commented Mar 29, 2021

> What is the conclusion of the tests? Could you put the oneflow and plsc results side by side for comparison?

Sure, I will add a comparison section to the README.

@yuanms2 commented Mar 29, 2021

> > What is the conclusion of the tests? Could you put the oneflow and plsc results side by side for comparison?
>
> Sure, I will add a comparison section to the README.

You could also put it in the PR discussion. None of the other comparison experiments in this repository put other frameworks side by side with oneflow.

@Flowingsun007 (Contributor, Author) commented Mar 29, 2021

ArcFace-ResNet50 test results: comparison and notes

Using the same hardware environment, the same network, and the same dataset, we benchmarked two frameworks (paddle-plsc and oneflow) on clusters ranging from a single GPU on one machine up to 4 machines with 32 GPUs, comparing throughput, speedup, and other key performance metrics for large-scale face-recognition training with model parallelism.

Test environment

To measure the performance of the frameworks themselves as fairly as possible, all tests were run on the same physical cluster with the same software environment. The test cluster consists of 4 machines, each equipped with 8 V100 GPUs (each machine's configuration is close to an NVIDIA DGX-1). The hardware and software configuration of each machine is as follows:

  • Tesla V100-SXM2-16GB x 8
  • InfiniBand 100 Gb/sec (4X EDR), Mellanox Technologies MT27700 Family
  • Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
  • Memory 384G
  • Ubuntu 16.04.4 LTS (GNU/Linux 4.4.0-116-generic x86_64)
  • CUDA Version: 10.2, Driver Version: 440.33.01
```
nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  CPU Affinity
GPU0     X      NV1     NV1     NV2     NV2     SYS     SYS     SYS     NODE    0-11,24-35
GPU1    NV1      X      NV2     NV1     SYS     NV2     SYS     SYS     NODE    0-11,24-35
GPU2    NV1     NV2      X      NV2     SYS     SYS     NV1     SYS     PIX     0-11,24-35
GPU3    NV2     NV1     NV2      X      SYS     SYS     SYS     NV1     PIX     0-11,24-35
GPU4    NV2     SYS     SYS     SYS      X      NV1     NV1     NV2     SYS     12-23,36-47
GPU5    SYS     NV2     SYS     SYS     NV1      X      NV2     NV1     SYS     12-23,36-47
GPU6    SYS     SYS     NV1     SYS     NV1     NV2      X      NV2     SYS     12-23,36-47
GPU7    SYS     SYS     SYS     NV1     NV2     NV1     NV2      X      SYS     12-23,36-47
mlx5_0  NODE    NODE    PIX     PIX     SYS     SYS     SYS     SYS      X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
```

Test network and dataset

All tests use the same large-scale face-classification network with a ResNet50 backbone and an ArcFace loss, trained on the MS1M-ArcFace dataset. The parameters of the final fully connected (FC) layer are partitioned with model parallelism, i.e. the FC weights are split across the GPUs. In addition, all tests use regular FP32 precision and the same per-device batch size (128). For more details, see the paddle-plsc README and the oneflow README.
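The model-parallel FC split described above can be illustrated with a small NumPy sketch (a hypothetical illustration, not the actual OneFlow or Paddle API): the classification weight matrix is partitioned column-wise, so each device stores and computes logits for only its own slice of classes, and concatenating the per-device partial logits recovers the full-FC result.

```python
# Hypothetical NumPy sketch of a column-wise model-parallel FC layer.
# The sizes below are tiny stand-ins: MS1M-ArcFace has ~85K identities
# and the face embedding is typically 512-d.
import numpy as np

num_classes = 8   # stand-in for the real number of identities
emb_dim = 4       # stand-in for the embedding dimension
num_devices = 2   # stand-in for the GPUs that hold FC shards

rng = np.random.default_rng(0)
features = rng.standard_normal((3, emb_dim))          # a batch of embeddings
weight = rng.standard_normal((emb_dim, num_classes))  # the full FC weight

# Each "device" keeps only num_classes / num_devices columns of the weight.
shards = np.split(weight, num_devices, axis=1)

# Every device sees the full feature batch but computes logits
# only for its own shard of classes.
partial_logits = [features @ w for w in shards]

# Concatenating the per-device shards reproduces the full-FC logits.
full_logits = np.concatenate(partial_logits, axis=1)
assert np.allclose(full_logits, features @ weight)
```

In a real training run each shard lives on a different GPU, so no single device ever materializes the full weight matrix; that is what makes the very large FC layer fit in memory.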

Test result comparison

paddle-plsc

| node_num | gpu_num_per_node | batch_size_per_device | samples/s | speedup |
| -------- | ---------------- | --------------------- | --------- | ------- |
| 1        | 1                | 128                   | 397.78    | 1.00    |
| 1        | 4                | 128                   | 1539.66   | 3.87    |
| 1        | 8                | 128                   | 2545.30   | 6.40    |
| 2        | 8                | 128                   | 5953.84   | 14.97   |
| 4        | 8                | 128                   | 11084.53  | 27.87   |

oneflow

| node_num | gpu_num_per_node | batch_size_per_device | samples/s | speedup |
| -------- | ---------------- | --------------------- | --------- | ------- |
| 1        | 1                | 128                   | 424.75    | 1.00    |
| 1        | 4                | 128                   | 1652.16   | 3.89    |
| 1        | 8                | 128                   | 3278.55   | 7.72    |
| 2        | 8                | 128                   | 6343.74   | 14.94   |
| 4        | 8                | 128                   | 12320.24  | 29.01   |
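The speedup column in both tables is throughput relative to the single-GPU run. As a quick sanity check, the oneflow column can be reproduced in a few lines of Python (the throughput numbers are taken directly from the table above):

```python
# Speedup = throughput on N total GPUs / single-GPU throughput.
# Keys are total GPU counts; values are samples/s from the oneflow table.
oneflow_throughput = {1: 424.75, 4: 1652.16, 8: 3278.55, 16: 6343.74, 32: 12320.24}

base = oneflow_throughput[1]
speedup = {gpus: round(tput / base, 2) for gpus, tput in oneflow_throughput.items()}
print(speedup)  # {1: 1.0, 4: 3.89, 8: 7.72, 16: 14.94, 32: 29.01}
```

Perfectly linear scaling would give a speedup equal to the GPU count (e.g. 32 on 4 machines), so the 29.01 figure corresponds to roughly 91% scaling efficiency.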

Comparison summary

  • On a single GPU, oneflow's throughput is 424.75 samples/s versus paddle-plsc's 397.78, about 6.8% faster;
  • On a single machine with 8 GPUs, throughput is oneflow 3278.55 vs paddle-plsc 2545.30, making oneflow about 28.8% faster;
  • At 4 machines, the speedup is oneflow 29.01 vs paddle-plsc 27.87; oneflow is closer to linear speedup (32), so it trains faster in multi-machine settings.

Conclusion: overall, oneflow trains this large-scale face-classification model faster and with a higher speedup in both single-machine and multi-machine settings, showing better framework performance.
(In addition, it uses GPU memory more efficiently: under the same conditions its GPU memory footprint is lower.)

@yuanms2 commented Mar 29, 2021

OK, thanks for the work. Note that 韩广云's tests were run in an environment with 10 Gbps network bandwidth, which differs from our configuration.

@Flowingsun007 (Contributor, Author) commented Mar 29, 2021

> OK, thanks for the work. Note that 韩广云's tests were run in an environment with 10 Gbps network bandwidth, which differs from our configuration.

Right, all of the numbers posted above were measured in the same environment on the leinao cluster.
