
update the performance page of MXNet. #10761

Merged 1 commit into apache:master on May 2, 2018

Conversation

zheng-da
Contributor

@zheng-da zheng-da commented May 1, 2018

Description

This updates the MXNet performance page with new CPU and GPU results.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@zheng-da zheng-da requested a review from szha as a code owner May 1, 2018 03:01
@zheng-da
Contributor Author

zheng-da commented May 1, 2018

@mli do you want to review and merge it?

@pengzhao-intel
Contributor

@zheng-da did you set KMP affinity when testing CPU performance?

@zheng-da
Contributor Author

zheng-da commented May 1, 2018

@pengzhao-intel Yes, I did. Did you find anything unexpected?

| Batch | Alexnet | VGG   | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
|-------|---------|-------|--------------|--------------|-----------|------------|
| 1     | 243.93  | 43.59 | 68.62        | 35.52        | 67.41     | 23.65      |
| 32    | 4883.77 | 854.4 | 1197.74      | 493.72       | 713.17    | 294.17     |

Contributor

Seems like we're having a few regressions :(

Contributor Author

First, @mli told me that M60 is supposed to be slower than M40.
VGG is slower because a different model was used. The original performance was measured a long time ago. Since then, the implementation of VGG has changed. The current version of VGG has many more layers. If you want to know more details, I think @TaoLv can tell you more.

Member

Right. benchmark_score.py was changed last December (commit) and the VGG test was updated from VGG-11 to VGG-16. The perf numbers in this PR are measured on VGG-16, while the previous MKLML perf numbers were measured on VGG-11.
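
For context, here is a minimal throughput sketch in the spirit of what benchmark_score.py measures (the helper below, its input shapes, and batch counts are illustrative assumptions, not the script's actual code); running it for both depths shows why VGG-11 and VGG-16 numbers are not directly comparable:

```python
# Illustrative sketch only; not the actual benchmark_score.py code.
# Assumes an MXNet build with the Gluon model zoo available.
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

def images_per_sec(net, batch_size=32, num_batches=20, ctx=mx.cpu()):
    net.initialize(ctx=ctx)
    x = mx.nd.random.uniform(shape=(batch_size, 3, 224, 224), ctx=ctx)
    net(x).wait_to_read()          # warm-up pass
    start = time.time()
    for _ in range(num_batches):
        net(x).wait_to_read()      # block until the forward pass finishes
    return batch_size * num_batches / (time.time() - start)

# VGG-16 is much deeper than VGG-11, so its images/sec is expected to be
# noticeably lower on the same hardware.
print("vgg11:", images_per_sec(vision.vgg11()))
print("vgg16:", images_per_sec(vision.vgg16()))
```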

Contributor Author

Actually, this change in VGG applies to all benchmark results, not just MKLML vs. MKLDNN.

Contributor

@marcoabreu marcoabreu left a comment

Thanks for updating the chart! It seems like we're having some regressions on the GPU version of MXNet. It would be great if somebody could follow up on those.

@pengzhao-intel
Contributor

That's good.
Maybe my data is a little out of date; we tested about a month ago with the master branch.
The data below is from an AWS EC2 C5.18xlarge.

For the large batch size (BS=32), the performance of Alexnet/VGG/Inception-xx has a slight drop.
The other results improved a lot, especially for small batch sizes.

Each cell shows this PR's number / our earlier measurement (images/sec):

| Batch | Alexnet | VGG | Inception-BN | Inception-v3 | Resnet 50 | Resnet 152 |
|-------|---------|-----|--------------|--------------|-----------|------------|
| 1 | 390.53 / 253.92 | 81.57 / 74.66 | 124.13 / 99.56 | 62.26 / 52.91 | 76.22 / 69.99 | 32.92 / 27.96 |
| 2 | 596.45 / 441.89 | 100.84 / 101.57 | 206.58 / 165.66 | 93.36 / 87.23 | 119.55 / 105.04 | 46.8 / 40.27 |
| 4 | 710.77 / 584.52 | 119.04 / 109.45 | 275.55 / 266.70 | 127.86 / 132.27 | 148.62 / 149.86 | 59.36 / 56.50 |
| 8 | 921.4 / 810.96 | 120.38 / 115.13 | 380.82 / 372.59 | 157.11 / 167.24 | 167.95 / 181.04 | 70.78 / 75.85 |
| 16 | 1018.43 / 1146.89 | 115.3 / 124.87 | 411.67 / 466.26 | 168.71 / 181.07 | 178.54 / 188.69 | 75.13 / 82.46 |
| 32 | 1290.31 / 1458.73 | 107.19 / 126.04 | 483.34 / 518.18 | 179.38 / 182.45 | 193.47 / 186.43 | 85.86 / 81.10 |

@zheng-da
Contributor Author

zheng-da commented May 1, 2018

@pengzhao-intel I'm not sure why the performance for large batch sizes gets worse. It seems to me that your performance was measured after the PRs that improved the performance of MKLDNN a while ago (otherwise, Alexnet should have much worse performance). However, the performance I just measured on C5.18x matches the performance I saw when I wrote the blog.
Can you find out which commit you used to measure the performance? Or which day?

@pengzhao-intel
Contributor

pengzhao-intel commented May 1, 2018

SW: the master branch of MXNet, commit id: 48749a5

@zheng-da
Contributor Author

zheng-da commented May 1, 2018

The commit was two months ago. It's surprising that the performance was better for large batch sizes. I'll try it again tomorrow.

@zheng-da
Contributor Author

zheng-da commented May 1, 2018

@pengzhao-intel Here is the performance result on C5.18x for commit id: 48749a5. It's different from yours. Before running the benchmark, I set the thread affinity and the number of OMP threads as below. Do you see anything wrong?

ubuntu@ip-172-31-14-124:~/incubator-mxnet$  export KMP_AFFINITY=granularity=fine,compact,1,0
ubuntu@ip-172-31-14-124:~/incubator-mxnet$ cat /proc/cpuinfo  | grep processor | wc -l
72
ubuntu@ip-172-31-14-124:~/incubator-mxnet$ export OMP_NUM_THREADS=36
INFO:root:network: alexnet
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 281.905581
INFO:root:batch size  2, image/sec: 476.257437
INFO:root:batch size  4, image/sec: 638.438876
INFO:root:batch size  8, image/sec: 909.938360
INFO:root:batch size 16, image/sec: 1072.400037
INFO:root:batch size 32, image/sec: 1439.211819
INFO:root:network: vgg-16
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 48.835769
INFO:root:batch size  2, image/sec: 107.220745
INFO:root:batch size  4, image/sec: 113.991915
INFO:root:batch size  8, image/sec: 122.755506
INFO:root:batch size 16, image/sec: 113.957188
INFO:root:batch size 32, image/sec: 108.641076
INFO:root:network: inception-bn
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 103.943418
INFO:root:batch size  2, image/sec: 175.506375
INFO:root:batch size  4, image/sec: 235.703251
INFO:root:batch size  8, image/sec: 372.001180
INFO:root:batch size 16, image/sec: 432.571657
INFO:root:batch size 32, image/sec: 510.245680
INFO:root:network: inception-v3
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 55.605913
INFO:root:batch size  2, image/sec: 89.754883
INFO:root:batch size  4, image/sec: 130.370849
INFO:root:batch size  8, image/sec: 160.991983
INFO:root:batch size 16, image/sec: 167.312594
INFO:root:batch size 32, image/sec: 183.668760
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 69.624827
INFO:root:batch size  2, image/sec: 104.606805
INFO:root:batch size  4, image/sec: 154.999145
INFO:root:batch size  8, image/sec: 171.728000
INFO:root:batch size 16, image/sec: 167.945690
INFO:root:batch size 32, image/sec: 169.422334
INFO:root:network: resnet-152
INFO:root:device: cpu(0)
INFO:root:batch size  1, image/sec: 30.524941
INFO:root:batch size  2, image/sec: 41.807083
INFO:root:batch size  4, image/sec: 59.314175
INFO:root:batch size  8, image/sec: 74.127132
INFO:root:batch size 16, image/sec: 73.972906
INFO:root:batch size 32, image/sec: 71.838468
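
In case it helps with reproducing the run above, here is a small Python sketch that applies the same thread settings and invokes the benchmark script (the example/image-classification path is my assumption about where benchmark_score.py lives in the source tree; adjust if it differs):

```python
# Sketch: reproduce the CPU run above with the same thread settings.
# Assumes it is run from the root of the incubator-mxnet source tree and that
# the benchmark script lives at example/image-classification/benchmark_score.py.
import multiprocessing
import os
import subprocess

env = dict(os.environ)
env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
# 36 OMP threads: half of the 72 hyperthreaded vCPUs on a C5.18xlarge.
env["OMP_NUM_THREADS"] = str(multiprocessing.cpu_count() // 2)

subprocess.run(
    ["python", "example/image-classification/benchmark_score.py"],
    env=env,
    check=True,
)
```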

@pengzhao-intel
Contributor

The settings are the same as ours.

From your log:
Alexnet BS=32 is 1439, which matches our 1458 (your current number is 1290).
VGG-16 BS=32 is 108, which doesn't match our 126 (your current number is 108).
Inception-BN BS=32 is 510, which matches our 518 (your current number is 483).
Inception-v3 BS=32 is 183, which matches our 182 (your current number is 179).

So the Alexnet and Inception data are similar but VGG is different, right?

@huangzhiyuan can provide more details.

@piiswrong piiswrong merged commit ebd8a6b into apache:master May 2, 2018
marcoabreu added a commit that referenced this pull request May 3, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 7, 2018
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da added a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
@zheng-da zheng-da deleted the update_perf branch September 29, 2018 21:34