Dev paddle plsc arcface #130
base: master
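What is the conclusion of the tests? Could you put the oneflow and plsc results side by side?
OK, I'll add a comparison section to the README.
It could also go in the PR discussion instead. None of the other benchmark comparisons in this repository put another framework side by side with oneflow.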
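Arcface-rn50 test results: comparison and notes
Using the same hardware, the same network, and the same dataset, we benchmarked two frameworks (paddle-plsc and oneflow) on configurations ranging from 1 machine with 1 GPU up to 4 machines with 32 GPUs. We compared the main performance metrics, such as throughput and speedup, when training a large-scale face recognition model with model parallelism.
Test environment
To measure the performance of the frameworks themselves as fairly as possible, all tests were run on the same physical cluster with the same software environment. The cluster has 4 machines, each with 8 V100 GPUs (each machine's configuration is close to an NVIDIA DGX-1). The hardware and software configuration of each machine is as follows: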
nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 mlx5_0 CPU Affinity
GPU0 X NV1 NV1 NV2 NV2 SYS SYS SYS NODE 0-11,24-35
GPU1 NV1 X NV2 NV1 SYS NV2 SYS SYS NODE 0-11,24-35
GPU2 NV1 NV2 X NV2 SYS SYS NV1 SYS PIX 0-11,24-35
GPU3 NV2 NV1 NV2 X SYS SYS SYS NV1 PIX 0-11,24-35
GPU4 NV2 SYS SYS SYS X NV1 NV1 NV2 SYS 12-23,36-47
GPU5 SYS NV2 SYS SYS NV1 X NV2 NV1 SYS 12-23,36-47
GPU6 SYS SYS NV1 SYS NV1 NV2 X NV2 SYS 12-23,36-47
GPU7 SYS SYS SYS NV1 NV2 NV1 NV2 X SYS 12-23,36-47
mlx5_0 NODE NODE PIX PIX SYS SYS SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
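NV# = Connection traversing a bonded set of # NVLinks
Test network and dataset
All tests use the same large-scale face classification network: a ResNet50 backbone with ArcFace loss, trained on the MS1M-ArcFace dataset. The fully connected (FC) layer uses model parallelism, meaning its parameters are split across different GPUs. In addition, all runs use standard FP32 precision and the same batch size (128). For more details, see the paddle-plsc README and the oneflow README.
Test results comparison
paddle-plsc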
oneflow
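Summary of the comparison
Conclusion: overall, oneflow trains the large-scale face classification model faster and achieves a higher speedup in both single-machine and multi-machine settings, showing better framework performance.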
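The speedup referred to in the conclusion is conventionally the throughput on N GPUs divided by the throughput on a single GPU; dividing that speedup by N gives the scaling efficiency (1.0 means perfectly linear scaling). A minimal Python sketch of that arithmetic follows, assuming throughput is reported in images/sec and that a 1-GPU run exists as the baseline. The function name `speedup_table` and the numbers in the usage example are hypothetical placeholders, not the measured paddle-plsc or oneflow results.

```python
from typing import Dict, Tuple


def speedup_table(throughput: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map GPU count -> (speedup, scaling efficiency), relative to 1 GPU.

    `throughput` maps the number of GPUs to measured images/sec
    (a hypothetical input format chosen for this sketch).
    """
    if 1 not in throughput:
        raise ValueError("a single-GPU baseline (key 1) is required")
    base = throughput[1]
    result: Dict[int, Tuple[float, float]] = {}
    for gpus, tput in sorted(throughput.items()):
        speedup = tput / base        # how many times faster than 1 GPU
        efficiency = speedup / gpus  # fraction of ideal linear scaling
        result[gpus] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; NOT the benchmark results.
    example = {1: 100.0, 8: 760.0, 32: 2880.0}
    for gpus, (s, e) in speedup_table(example).items():
        print(f"{gpus:>2} GPUs: speedup {s:.2f}x, efficiency {e:.0%}")
```

Reporting efficiency alongside speedup makes runs with different GPU counts (for example 8 vs 32 GPUs) directly comparable.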
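OK, thanks for the work. The environment Han Guangyun tested in had 10 Gbps of network bandwidth, which differs from our configuration.
Yes, all the data posted above was measured in the same environment on leinao.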