Upgrade oneDNN to v2.5.2 #71546

yanbing-j · 2022-01-20T09:00:59Z

This PR upgrades oneDNN to v2.5.2 and adds build support for oneDNN v2.5.2.

v2.4 changes:

- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcast.
- Improved performance of the reduction, reorder, and shuffle primitives.
- Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
- Improved performance of the forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
- Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.
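In v2.4 the Sapphire Rapids code paths must be switched on through oneDNN's CPU dispatcher control. A minimal sketch, assuming the control meant here is oneDNN's `DNNL_MAX_CPU_ISA` environment variable and that `AVX512_CORE_AMX` is the value that unlocks the AMX kernels. `env_with_max_cpu_isa` and `run_with_amx` are hypothetical helpers, and the workload script is a placeholder. The variable is set in a child process's environment because oneDNN reads it once, when the library initializes.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional

# Assumed ISA cap value that lets oneDNN dispatch AMX (Sapphire Rapids) kernels.
AMX_ISA = "AVX512_CORE_AMX"


def env_with_max_cpu_isa(isa: str, base: Optional[Mapping[str, str]] = None,
                         verbose: bool = False) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with DNNL_MAX_CPU_ISA set.

    The input mapping itself is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["DNNL_MAX_CPU_ISA"] = isa
    if verbose:
        # DNNL_VERBOSE=1 makes oneDNN log primitive executions, including the
        # implementation that was dispatched, so the chosen ISA can be checked.
        env["DNNL_VERBOSE"] = "1"
    return env


def run_with_amx(script: str) -> int:
    """Run `script` (a hypothetical workload) in a child Python process.

    The cap must be in the environment before oneDNN is loaded, which is why
    it is passed to a fresh interpreter rather than set in this process.
    """
    env = env_with_max_cpu_isa(AMX_ISA, verbose=True)
    return subprocess.call([sys.executable, script], env=env)
```

The variable is a ceiling, not a requirement: on a CPU without AMX, oneDNN still falls back to the best ISA the hardware supports.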

v2.5 changes:

- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
- Improved performance of the matmul primitive for processors with Intel AVX-512 support.
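Because v2.5 enables the Sapphire Rapids paths by default but depends on Linux 5.16 (the release that added kernel support for the AMX register state), a deployment script may want a quick pre-check. A minimal sketch, assuming a plain `major.minor` comparison is good enough; it ignores vendor backports and does not check whether the CPU itself has AMX. `parse_kernel_version` and `kernel_supports_amx` are hypothetical helpers.

```python
import platform
import re
from typing import Optional, Tuple

MIN_KERNEL_FOR_AMX = (5, 16)


def parse_kernel_version(release: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a release string like '5.16.0-1-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_supports_amx(release: Optional[str] = None) -> bool:
    """Return True if the given (or running) Linux kernel is at least 5.16.

    When `release` is omitted, non-Linux systems always report False.
    """
    if release is None:
        if platform.system() != "Linux":
            return False
        release = platform.release()
    version = parse_kernel_version(release)
    # Tuple comparison orders versions correctly: (5, 15) < (5, 16) < (6, 1).
    return version is not None and version >= MIN_KERNEL_FOR_AMX
```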

v2.5.2 changes:

- Fixed a performance regression in the binary primitive with broadcast.
- Fixed a segmentation fault in the depthwise convolution primitive for shapes with huge spatial size on processors with Intel AVX-512 support.

pytorch-probot · 2022-01-20T09:01:02Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/yanbing-j/pytorch/blob/bcccd4520ba473ec2711f102bcf4eaeb5c94a236/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries/conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries/libtorch`, `ciflow/default`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries/libtorch`, `ciflow/default`	✅ triggered
linux-binary-manywheel	`ciflow/binaries`, `ciflow/binaries/wheel`, `ciflow/default`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`, `ciflow/xla`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2022-01-20T09:01:05Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/71546
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 65b6966 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 broken upstream at merge base 027c0d7 on Jan 26 from 3:33pm to 7:31pm

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Lint / quick-checks on Jan 26 from 3:33pm to 7:31pm (ef501e8 - 56511f8)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


jgong5 · 2022-01-21T12:03:46Z

Hi @VitalyFedyunin, this is the PR for upgrading oneDNN to v2.5.2 with all the compatibility updates needed for both building and running PyTorch (due to the name changes from MKLDNN to DNNL in oneDNN v2.5.2). It combines the build compatibility update from #69957 (which has been closed accordingly) and incorporates the runtime compatibility update in ideep (see intel/ideep@2e103b3). The ideep update is tagged at https://github.com/intel/ideep/tree/pytorch-rls-v2.5.2. Could you please review? Thanks!
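The commit "continue support MKLDNN_CPU_RUNTIME flag by user" suggests that legacy `MKLDNN_*` build options keep working after the switch to `DNNL_*` names. The actual change lives in PyTorch's CMake files, which are not shown here. The following is only a hypothetical Python illustration of one way such forwarding could work, assuming a plain prefix substitution (for example `MKLDNN_CPU_RUNTIME` to `DNNL_CPU_RUNTIME`) and assuming that an explicitly set `DNNL_*` value takes precedence. `forward_legacy_options` is a made-up name.

```python
from typing import Dict, Mapping

LEGACY_PREFIX = "MKLDNN_"
NEW_PREFIX = "DNNL_"


def forward_legacy_options(opts: Mapping[str, str]) -> Dict[str, str]:
    """Copy each MKLDNN_* option to its DNNL_* name unless that is already set.

    Legacy keys are kept as-is; an explicit DNNL_* value always wins.
    """
    result = dict(opts)
    for key, value in opts.items():
        if key.startswith(LEGACY_PREFIX):
            new_key = NEW_PREFIX + key[len(LEGACY_PREFIX):]
            # setdefault only writes when the new-style key is absent.
            result.setdefault(new_key, value)
    return result
```

For instance, `forward_legacy_options({"MKLDNN_CPU_RUNTIME": "TBB"})` yields both `MKLDNN_CPU_RUNTIME` and `DNNL_CPU_RUNTIME` set to `"TBB"`, so a user's existing build command would still select the TBB runtime.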

facebook-github-bot · 2022-01-27T21:08:50Z

@VitalyFedyunin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-02-01T04:37:46Z

@VitalyFedyunin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: This PR upgrades oneDNN to v2.5.2 and adds build support for oneDNN v2.5.2.

v2.4 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcast.
- Improved performance of the reduction, reorder, and shuffle primitives.
- Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
- Improved performance of the forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
- Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.

v2.5 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
- Improved performance of the matmul primitive for processors with Intel AVX-512 support.

v2.5.2 changes:
- Fixed a performance regression in the binary primitive with broadcast.
- Fixed a segmentation fault in the depthwise convolution primitive for shapes with huge spatial size on processors with Intel AVX-512 support.

Pull Request resolved: #71546
Reviewed By: george-qi
Differential Revision: D33827108
Pulled By: VitalyFedyunin
fbshipit-source-id: 8f5a19b331c82af5b0783f081e061e1034a93952

Summary: This PR upgrades oneDNN to v2.5.2 and adds build support for oneDNN v2.5.2.

v2.4 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcast.
- Improved performance of the reduction, reorder, and shuffle primitives.
- Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
- Improved performance of the forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
- Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.

v2.5 changes:
- Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
- Improved performance of the matmul primitive for processors with Intel AVX-512 support.

v2.5.2 changes:
- Fixed a performance regression in the binary primitive with broadcast.
- Fixed a segmentation fault in the depthwise convolution primitive for shapes with huge spatial size on processors with Intel AVX-512 support.

Pull Request resolved: pytorch/pytorch#71546
Reviewed By: george-qi
Differential Revision: D33827108
Pulled By: VitalyFedyunin
fbshipit-source-id: 8f5a19b331c82af5b0783f081e061e1034a93952
(cherry picked from commit 9705212)

pytorch-probot bot added the ciflow/default label Jan 20, 2022

facebook-github-bot added the cla signed label Jan 20, 2022

pytorchbot added the open source label Jan 20, 2022

zou3519 requested review from ngimel and VitalyFedyunin January 20, 2022 17:08

zou3519 added the triaged label Jan 20, 2022

ngimel removed their request for review January 20, 2022 18:28

yanbing-j mentioned this pull request Jan 21, 2022

Building support for oneDNN v2.5. #69957

Closed

XiaobingSuper added the intel priority label Jan 21, 2022


zhuhaozhe and others added 5 commits January 27, 2022 11:06

fix MKLDNN<-> DNNL compatibility

4859676

continue support MKLDNN_CPU_RUNTIME flag by user

f573fca

fix tbb include dir

abe0305

Upgrade oneDNN to v2.5.2

c7884ba

Fix CI failures

65b6966

yanbing-j force-pushed the yanbing/upgrade_onednn_2_5_2 branch from ff8794b to 65b6966 January 27, 2022 03:07

VitalyFedyunin approved these changes Jan 27, 2022


sanchitintel mentioned this pull request Jan 31, 2022

Add JIT graph fuser for oneDNN Graph API (Preview4) #68111

Closed

pytorchmergebot closed this in 4567d5d Feb 1, 2022

XiaobingSuper mentioned this pull request Feb 7, 2022

Weight gradient computation is incorrect with mkldnn #68868

Closed
