Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt #26103
Merged
hariharans29 merged 16 commits into main on Oct 16, 2025
…soft/onnxruntime into hari/mlas_conv_enhancement
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
edgchen1 reviewed on Oct 14, 2025
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
edgchen1 reviewed on Oct 14, 2025
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
edgchen1 reviewed on Oct 15, 2025
edgchen1 previously approved these changes on Oct 15, 2025
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
devang-ml approved these changes on Oct 16, 2025
apsonawane pushed a commit that referenced this pull request on Oct 17, 2025
…tion opt (#26103)

### Description
This is an internal branch dupe of #25255 + some minor cosmetic changes to account for Copilot feedback

### Motivation and Context
Improve performance of NCHW Conv - Both grouped convolutions and batched inputs should benefit from this change. For a detailed understanding of perf improvement, please refer to the numbers in #25255.

Credit to @zoeczy and team for this improvement and code change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
apsonawane pushed a commit that referenced this pull request on Oct 20, 2025
…tion opt (#26103)
apsonawane added a commit that referenced this pull request on Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [[Build] Lock torch, onnxscript and onnx-ir versions to latest]
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>
Contributor
Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.
JonathanC-ARM pushed a commit to JonathanC-ARM/onnxruntime that referenced this pull request on Oct 24, 2025
…ead partition opt (microsoft#26103)
fs-eire pushed a commit that referenced this pull request on Oct 24, 2025
…tion opt (#26103)
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request on Nov 2, 2025
…ead partition opt (microsoft#26103)
Description
This is an internal-branch duplicate of #25255, plus some minor cosmetic changes to address Copilot feedback.
Motivation and Context
Improves the performance of NCHW Conv: both grouped convolutions and batched inputs should benefit from this change. For detailed performance numbers, see #25255.
Credit to @zoeczy and team for this improvement and code change.
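
The description above does not spell out how the work is partitioned, so the following is only an illustrative sketch of one way a thread partition over batches and groups could look. It assumes the batch × group index space is flattened and split as evenly as possible across threads. `PartitionWork`, `WorkItem`, and `ItemsForThread` are hypothetical names introduced for this example; they are not the MLAS API and not the code from this PR.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not the MLAS API): split `total_work` items as evenly
// as possible across `thread_count` threads and return the half-open range
// [begin, end) owned by `thread_id`.
std::pair<std::size_t, std::size_t> PartitionWork(std::size_t thread_id,
                                                  std::size_t thread_count,
                                                  std::size_t total_work) {
    const std::size_t per_thread = total_work / thread_count;
    const std::size_t extra = total_work % thread_count;
    // The first `extra` threads each take one additional item.
    const std::size_t begin =
        thread_id * per_thread + (thread_id < extra ? thread_id : extra);
    const std::size_t count = per_thread + (thread_id < extra ? 1 : 0);
    return {begin, begin + count};
}

// One unit of work: a single group of a single batch element (assumption).
struct WorkItem {
    std::size_t batch;
    std::size_t group;
};

// Enumerate the (batch, group) pairs a given thread would process when the
// batch_count * group_count index space is flattened and partitioned.
std::vector<WorkItem> ItemsForThread(std::size_t thread_id,
                                     std::size_t thread_count,
                                     std::size_t batch_count,
                                     std::size_t group_count) {
    const std::pair<std::size_t, std::size_t> range =
        PartitionWork(thread_id, thread_count, batch_count * group_count);
    std::vector<WorkItem> items;
    for (std::size_t i = range.first; i < range.second; ++i) {
        // Recover the batch and group indices from the flat index.
        items.push_back({i / group_count, i % group_count});
    }
    return items;
}
```

Under these assumptions, with 2 batch elements, 3 groups, and 4 threads, thread 1 would handle (batch 0, group 2) and (batch 1, group 0). A scheme like this keeps every thread busy even when the batch count or the group count alone is smaller than the number of threads, which is one plausible reason both grouped and batched inputs could benefit.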