Copilot AI commented Aug 29, 2025

This PR updates the ONNX Runtime documentation to clearly reflect that WinML is the preferred and future-safe execution path for Windows developers, replacing DirectML which is being deprecated.

Changes Made

1. DirectML Deprecation Notice

  • Added a prominent deprecation notice at the top of the new docs/execution-providers/DirectML-ExecutionProvider.md page
  • Includes clear migration guidance directing users to WinML as the replacement
  • Provides links to WinML documentation and installation instructions

2. WinML Prioritization

  • Created docs/get-started/with-windows.md that positions WinML as the "preferred and future-safe execution path" for Windows developers
  • Highlights WinML's key advantages over DirectML, including automatic hardware optimization and simplified deployment
  • Provides clear quick-start instructions and next steps

3. Installation Documentation

  • Created comprehensive docs/install/index.md with dedicated WinML installation section
  • Includes multiple installation methods: NuGet, WinGet, and Microsoft Store
  • Clearly marks WinML as the "recommended installation method" for Windows developers

4. Developer Guidance

  • Explained WinML's relationship to ONNX Runtime across all documentation pages:
    • Same ONNX Runtime APIs: Ensures compatibility and easy migration
    • Dynamic EP Selection: Automatically chooses optimal execution provider based on hardware
    • Simplified Windows Deployment: Handles packaging and distribution complexities
  • Updated docs/WinML_principles.md to include the ONNX Runtime relationship explanation
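The "Dynamic EP Selection" point above can be pictured as a priority-ordered fallback over the execution providers present on a machine. This is a minimal illustrative sketch, not WinML's actual selection logic; the provider names are ONNX Runtime's standard EP identifiers, and the preference order here is an assumption.

```python
# Illustrative sketch (not the WinML implementation): dynamic EP selection
# picks the first preferred execution provider that the hardware supports.

PREFERRED_EPS = ["QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]

def select_execution_provider(available_eps):
    """Return the highest-priority provider present on this machine."""
    for ep in PREFERRED_EPS:
        if ep in available_eps:
            return ep
    raise RuntimeError("no execution provider available")

# Example: a machine with only DirectML and CPU falls back past QNN.
print(select_execution_provider({"DmlExecutionProvider", "CPUExecutionProvider"}))
# → DmlExecutionProvider
```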

5. Cross-linking and Navigation

Documentation Structure

The new documentation follows the URL structure mentioned in the issue:

  • /docs/execution-providers/DirectML-ExecutionProvider.html → docs/execution-providers/DirectML-ExecutionProvider.md
  • /docs/get-started/with-windows.html → docs/get-started/with-windows.md
  • /docs/install/#winml-installs → docs/install/index.md#winml-installs

All changes are minimal and focused, directly addressing the requirements to deprecate DirectML while promoting WinML as the preferred Windows solution for ONNX Runtime deployments.

Fixes #25899.



johnpaultaken and others added 30 commits July 16, 2025 10:31
### Description
- Adding test framework and initial batch of test cases for the QNN EP GPU backend.

### Motivation and Context
- To ensure the QNN EP GPU backend does not break as ongoing changes are committed, mainly to the other QNN backends.
### Description

Fix cuda build error when DEBUG_GENERATION is defined.

### Motivation and Context

In #24821, a dumping API
was removed:
`void Print(const char* name, int index, bool end_line)`
but the related code was not updated.

In MatMulNBits, a recent change added bfloat16 support, but the tensor
dumper only supports BFloat16, not __nv_bfloat16. This PR adds functions
to support __nv_bfloat16 in the CUDA tensor dumper.
### Description
This commit applies WGSL template to `MatMulNBitsWideTile` to improve
code readability and enables more flexible data handling.

As part of this change, support for 4-bit and 8-bit shaders has been
consolidated, and a common `CEIL_DIV` utility has been introduced. The
previous `ShaderUsage::UseUniform` and
`ShaderUsage::UseIndicesTypeAlias` flags are no longer necessary and
have been removed.
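The `CEIL_DIV` utility mentioned above is integer ceiling division, commonly used to compute how many fixed-size tiles or workgroups cover a dimension. A minimal sketch of the idea (in Python for illustration; the actual utility lives in the WGSL/C++ shader code):

```python
# Integer ceiling division: smallest n such that n * b >= a (for a, b > 0).
def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b

# e.g. a 100-row matrix processed in tiles of 32 rows needs 4 tiles.
print(ceil_div(100, 32))
```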

### Motivation and Context
See above
1. Update the docker images to install system updates(per vulnerability
management requirements)
2. Disable DNNL pipelines since 
     a. There was no active development.
     b. The code is incompatible with CMake 4.x. 
3. Disable migraphx pipeline due to license issues(conda is not free
unless you only use conda-forge packages).
4. Change all UBI8 based images to use AlmaLinux8.

I will make the base images public. They are under internal review.
Fixes error: the NuGet packaging pipeline fails with
`Could not find com.qualcomm.qti:qnn-runtime:2.36.1`


https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=866702&view=results
…evice id. (#25427)

### Description
Restore ability to handle "VEN_QCOM" from an ACPI entry.

### Motivation and Context
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
This PR adds three new options for the TRT execution provider:

- trt_load_user_initializer
- trt_external_data_bytestream
- trt_external_data_bytestream_size

The idea is to use these options to leverage new TRT 10.13 APIs, giving
the user more control over how weights are loaded in the ONNX parser.

When `trt_load_user_initializer` is set to true, the EP will own the
weights instead of serializing the weights to ModelProto. This reduces
overhead in having to serialize large weights.

When `trt_external_data_bytestream / trt_external_data_bytestream_size`
is provided, the refitEngine() function will be able to read from this
bytestream directly to extract weights for the refitter.

Also fixes graph_proto_serializer to keep information about external
weights.
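As a rough illustration (not code from this PR), the boolean option could be supplied through ONNX Runtime's provider-options form, where each EP is paired with a dict of string-valued settings; the bytestream/size pair is pointer-based and presumably more naturally set via the C API, so it is omitted here. This snippet only builds the configuration and does not create a session.

```python
# Sketch: passing the new TensorRT EP option as provider options.
# Provider-option values are strings in the ONNX Runtime Python API.
trt_options = {
    # EP owns the weights instead of serializing them into ModelProto.
    "trt_load_user_initializer": "1",
}

# Each entry is either an EP name or an (EP name, options dict) pair.
providers = [("TensorrtExecutionProvider", trt_options), "CPUExecutionProvider"]
```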

---------

Signed-off-by: Kevin Chen <kevinch@nvidia.com>
### Description
Changes from win-onnxruntime to onnxruntime; fixes the build break.

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Enable free dimension override even when the graph optimization level is
0.
If the optimization level is above 0, the override will be applied
during level 1 optimization.
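Conceptually, a free-dimension override pins a model's named symbolic dimensions to concrete values before the graph is optimized and executed. The sketch below is only an illustration of that idea, not ONNX Runtime's implementation (where overrides are registered on the session options):

```python
# Pin named free (symbolic) dimensions in an input shape to fixed values.
def apply_free_dim_overrides(shape, overrides):
    """shape: list of ints or symbolic-dim names; overrides: name -> int."""
    return [overrides.get(d, d) if isinstance(d, str) else d for d in shape]

print(apply_free_dim_overrides(["batch", 3, 224, 224], {"batch": 1}))
# → [1, 3, 224, 224]
```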
### Description

Update Flash Attention to support softmax sink in GQA.

Changes:
- [x] Update flash attention to support head_sink
- [x] Add test_gqa.py to test cuda, and remove test_gqa_cuda.py.

Note that the sink is treated as already scaled, while the elements in the QK GEMMs
are not. The sink value needs no scaling or softcap; it simply joins the
softmax alongside the scaled or soft-capped values. There are two
ways to add the sink to softmax:
* Patch
[normalize_softmax_lse](https://github.com/microsoft/onnxruntime/blob/1cf1aa786f6e7f7e6abd6fba8b8aea2e7a43092c/onnxruntime/contrib_ops/cuda/bert/flash_attention/softmax.h#L143-L178)
to use the sink when updating the max and sum. Pro: the main change is in one
function. Con: the logic is a little complex, since row_max is unscaled while
row_sum is scaled.
* Change softmax_rescale_o to handle the sink directly
in the first block of online softmax, using an unscaled sink value. Pro: it
keeps the core algorithm consistent. Cons: it requires changes in multiple
places and is harder to combine with softcap.

This PR uses the first approach for easy integration.
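The core idea can be sketched numerically: the sink enters the softmax normalization as one extra (already-scaled) logit, lowering every attention weight without contributing an output term of its own. A minimal reference sketch, not the flash-attention kernel:

```python
import math

# Softmax over scaled logits with an attention sink: the sink only joins
# the denominator, so the returned weights sum to less than 1.
def softmax_with_sink(scaled_logits, sink):
    m = max(scaled_logits + [sink])            # max over logits and sink
    exps = [math.exp(x - m) for x in scaled_logits]
    denom = sum(exps) + math.exp(sink - m)     # sink only enters the sum
    return [e / denom for e in exps]

probs = softmax_with_sink([1.0, 2.0, 3.0], sink=0.0)
```

With `sink = -inf` the extra term vanishes and this reduces to a plain softmax.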

Note: the memory-efficient attention change will be in a separate PR.

### Motivation and Context
#25269
…5344)

### Description
Add a QNN-EP-aware Reshape handler to optimize the Transpose-Reshape-Transpose pattern, falling back to the default handler when not applicable.
Test: add a new UT folder under QNN for Transpose optimization.


### Motivation and Context
One of the MEP models performed poorly due to the Transpose optimization after Layout Transform, which inserted many redundant Transposes.
Since directly modifying the algorithm could have a large impact, an EP-aware Reshape handler is introduced instead to control the risk.
### Description
Internal CI Upgraded to 2025.2



### Motivation and Context

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
### Description
- Add QDQ scale propagation pass
- Enable dynamic path for NPU when enable_causallm is enabled
- Allow zero-element tensors to get set
- Fix the model copies and redefinitions for CPU fallback
- Add support for 2025.2 and enable the SimplifiedLayerNormalization op

### Motivation and Context

---------

Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: n1harika <niharika.sathish@intel.com>
### Description

To support writing an Execution Provider (EP) using the new EP ABI introduced in #24887, this PR adds value info for EP Context nodes to prevent shape inference errors during `Graph::Resolve`.

### Motivation and Context

When creating a new EP Context node whose input is the output of another EP Context node, Graph::Resolve fails to set the type for the new node's arguments. This is because EP Context nodes do not have a TypeAndShapeInferenceFunction defined, as shown here:
https://github.com/microsoft/onnxruntime/blob/5fdd4e4f2a2b6705a9a49a378a3b3496805067ee/onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L3289-L3337

As a result, an exception is thrown during shape inference:
https://github.com/microsoft/onnxruntime/blob/5fdd4e4f2a2b6705a9a49a378a3b3496805067ee/onnxruntime/core/graph/graph.cc#L2964

Specifically:

EP Context nodes lack a TypeAndShapeInferenceFunction, so `onnx_inferred_type` is unavailable, and `existing_type` is nullptr due to the logic in:
https://github.com/microsoft/onnxruntime/blob/9de58ac7a3d18d6ae7f7ae502b3f91361067f1b5/onnxruntime/core/session/ep_plugin_provider_interfaces.cc#L279

https://github.com/microsoft/onnxruntime/blob/9de58ac7a3d18d6ae7f7ae502b3f91361067f1b5/onnxruntime/core/session/ep_plugin_provider_interfaces.cc#L285

### Implementation
This PR attempts to add type information to EP Context nodes with best effort, ensuring that Graph::Resolve can proceed without errors even when type inference is not explicitly defined.
… to process EPContext node for ep_cache_context with bytes stream (#25389)

The existing ReadOpAttr and CreateOpAttr APIs for the string type always assume there is a '\0' at the end. This blocks EPs from embedding/reading the context binary byte buffer in the EPContext node's ep_cache_context attribute.
Update the custom op API ReadOpAttr for the string type to avoid appending '\0' at the end.
Update the CreateOpAttr API to construct the string with a length.
Keep the strings-type processing as it is for now.
Noticing intermittent timeouts around the 4-hour mark, but the pipeline
shows no errors. Increases the timeout from 240 min (4 hr) to 270 min (4.5 hr).
Bumps [on-headers](https://github.com/jshttp/on-headers) and
[compression](https://github.com/expressjs/compression). These
dependencies needed to be updated together.
Updates `on-headers` from 1.0.2 to 1.1.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jshttp/on-headers/releases">on-headers's
releases</a>.</em></p>
<blockquote>
<h2>1.1.0</h2>
<h2>Important</h2>
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/jshttp/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Migrate CI pipeline to GitHub actions by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/12">jshttp/on-headers#12</a></li>
<li>fix README.md badges by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/13">jshttp/on-headers#13</a></li>
<li>add OSSF scorecard action by <a
href="https://github.com/carpasse"><code>@​carpasse</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/14">jshttp/on-headers#14</a></li>
<li>fix: use <code>ubuntu-latest</code> as ci runner by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://redirect.github.com/jshttp/on-headers/pull/19">jshttp/on-headers#19</a></li>
<li>ci: apply OSSF Scorecard security best practices by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://redirect.github.com/jshttp/on-headers/pull/20">jshttp/on-headers#20</a></li>
<li>👷 add upstream change detection by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/31">jshttp/on-headers#31</a></li>
<li>✨ add script to update known hashes by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/32">jshttp/on-headers#32</a></li>
<li>💚 update CI - add newer node versions by <a
href="https://github.com/ctcpip"><code>@​ctcpip</code></a> in <a
href="https://redirect.github.com/jshttp/on-headers/pull/33">jshttp/on-headers#33</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/carpasse"><code>@​carpasse</code></a>
made their first contribution in <a
href="https://redirect.github.com/jshttp/on-headers/pull/12">jshttp/on-headers#12</a></li>
<li><a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
made their first contribution in <a
href="https://redirect.github.com/jshttp/on-headers/pull/19">jshttp/on-headers#19</a></li>
<li><a href="https://github.com/ctcpip"><code>@​ctcpip</code></a> made
their first contribution in <a
href="https://redirect.github.com/jshttp/on-headers/pull/31">jshttp/on-headers#31</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0">https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/jshttp/on-headers/blob/master/HISTORY.md">on-headers's
changelog</a>.</em></p>
<blockquote>
<h1>1.1.0 / 2025-07-17</h1>
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/jshttp/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/jshttp/on-headers/commit/4b017af88f5375bbdf3ad2ee732d2c122e4f52b0"><code>4b017af</code></a>
1.1.0</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/b636f2d08e6c1e0a784b53a13cd61e05c09bb118"><code>b636f2d</code></a>
♻️ refactor header array code</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/3e2c2d46c3e9592f6a1c3a3a1dbe622401f95d39"><code>3e2c2d4</code></a>
✨ ignore falsy header keys, matching node behavior</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/172eb41b99a5a290b27a2c43fe602ca33aa1c8ce"><code>172eb41</code></a>
✨ support duplicate headers</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/c6e384908c9c6127d18831d16ab0bd96e1231867"><code>c6e3849</code></a>
🔒️ fix array handling</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/6893518341bb4e5363285df086b3158302d3b216"><code>6893518</code></a>
💚 update CI - add newer node versions</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/56a345d82b51a0dcb8d09f061f87b1fd1dc4c01e"><code>56a345d</code></a>
✨ add script to update known hashes</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/175ab217155d525371a5416ff059f895a3a532a6"><code>175ab21</code></a>
👷 add upstream change detection (<a
href="https://redirect.github.com/jshttp/on-headers/issues/31">#31</a>)</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/ce0b2c8fcd313d38d3534fb731050dc16e105bf6"><code>ce0b2c8</code></a>
ci: apply OSSF Scorecard security best practices (<a
href="https://redirect.github.com/jshttp/on-headers/issues/20">#20</a>)</li>
<li><a
href="https://github.com/jshttp/on-headers/commit/1a38c543e75cd06217b449531de10b1758e35299"><code>1a38c54</code></a>
fix: use <code>ubuntu-latest</code> as ci runner (<a
href="https://redirect.github.com/jshttp/on-headers/issues/19">#19</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/jshttp/on-headers/compare/v1.0.2...v1.1.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~ulisesgascon">ulisesgascon</a>, a new
releaser for on-headers since your current version.</p>
</details>
<br />

Updates `compression` from 1.8.0 to 1.8.1
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/compression/releases">compression's
releases</a>.</em></p>
<blockquote>
<h2>v1.8.1</h2>
<h2>What's Changed</h2>
<ul>
<li>fix(docs): update multiple links from http to https by <a
href="https://github.com/Phillip9587"><code>@​Phillip9587</code></a> in
<a
href="https://redirect.github.com/expressjs/compression/pull/222">expressjs/compression#222</a></li>
<li>ci: add dependabot for github actions by <a
href="https://github.com/bjohansebas"><code>@​bjohansebas</code></a> in
<a
href="https://redirect.github.com/expressjs/compression/pull/207">expressjs/compression#207</a></li>
<li>build(deps): bump github/codeql-action from 2.23.2 to 3.28.15 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/228">expressjs/compression#228</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.3.1 to 2.4.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/229">expressjs/compression#229</a></li>
<li>build(deps-dev): bump eslint-plugin-import from 2.26.0 to 2.31.0 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/230">expressjs/compression#230</a></li>
<li>build(deps-dev): bump supertest from 6.2.3 to 6.3.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/231">expressjs/compression#231</a></li>
<li>[StepSecurity] ci: Harden GitHub Actions by <a
href="https://github.com/step-security-bot"><code>@​step-security-bot</code></a>
in <a
href="https://redirect.github.com/expressjs/compression/pull/235">expressjs/compression#235</a></li>
<li>build(deps): bump github/codeql-action from 3.28.15 to 3.29.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/243">expressjs/compression#243</a></li>
<li>build(deps): bump actions/upload-artifact from 4.3.1 to 4.6.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/239">expressjs/compression#239</a></li>
<li>build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/240">expressjs/compression#240</a></li>
<li>build(deps): bump actions/checkout from 4.1.1 to 4.2.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/241">expressjs/compression#241</a></li>
<li>build(deps-dev): bump eslint-plugin-import from 2.31.0 to 2.32.0 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/expressjs/compression/pull/244">expressjs/compression#244</a></li>
<li>deps: on-headers@1.1.0 by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://redirect.github.com/expressjs/compression/pull/246">expressjs/compression#246</a></li>
<li>Release: 1.8.1 by <a
href="https://github.com/UlisesGascon"><code>@​UlisesGascon</code></a>
in <a
href="https://redirect.github.com/expressjs/compression/pull/247">expressjs/compression#247</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
made their first contribution in <a
href="https://redirect.github.com/expressjs/compression/pull/228">expressjs/compression#228</a></li>
<li><a
href="https://github.com/step-security-bot"><code>@​step-security-bot</code></a>
made their first contribution in <a
href="https://redirect.github.com/expressjs/compression/pull/235">expressjs/compression#235</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/expressjs/compression/compare/1.8.0...v1.8.1">https://github.com/expressjs/compression/compare/1.8.0...v1.8.1</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/expressjs/compression/blob/master/HISTORY.md">compression's
changelog</a>.</em></p>
<blockquote>
<h1>1.8.1 / 2025-07-17</h1>
<ul>
<li>deps: on-headers@~1.1.0
<ul>
<li>Fix <a
href="https://www.cve.org/CVERecord?id=CVE-2025-7339">CVE-2025-7339</a>
(<a
href="https://github.com/expressjs/on-headers/security/advisories/GHSA-76c9-3jph-rj3q">GHSA-76c9-3jph-rj3q</a>)</li>
</ul>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/expressjs/compression/commit/83a0c45fe190f4fcb8b515c18065db9cb9029dd1"><code>83a0c45</code></a>
1.8.1</li>
<li><a
href="https://github.com/expressjs/compression/commit/ce62713129f4b33eac4b833e1722410091646395"><code>ce62713</code></a>
deps: on-headers@1.1.0 (<a
href="https://redirect.github.com/expressjs/compression/issues/246">#246</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/f4acb23985fa345318d34d4a96acf555a883efeb"><code>f4acb23</code></a>
build(deps-dev): bump eslint-plugin-import from 2.31.0 to 2.32.0 (<a
href="https://redirect.github.com/expressjs/compression/issues/244">#244</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/6eaebe63f2ecac191d402c570bde140488435c4c"><code>6eaebe6</code></a>
build(deps): bump actions/checkout from 4.1.1 to 4.2.2 (<a
href="https://redirect.github.com/expressjs/compression/issues/241">#241</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/37e062312fd270f84b5f50f7c6f88312609633f5"><code>37e0623</code></a>
build(deps): bump ossf/scorecard-action from 2.4.1 to 2.4.2 (<a
href="https://redirect.github.com/expressjs/compression/issues/240">#240</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/bc436b26283c2f85a9711085dd0e4a580de50ba7"><code>bc436b2</code></a>
build(deps): bump actions/upload-artifact from 4.3.1 to 4.6.2 (<a
href="https://redirect.github.com/expressjs/compression/issues/239">#239</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/2f9f5726751ecf12f7c46a9d1493bcd1966e09a7"><code>2f9f572</code></a>
build(deps): bump github/codeql-action from 3.28.15 to 3.29.2 (<a
href="https://redirect.github.com/expressjs/compression/issues/243">#243</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/5f13b148d2a1a2daaa8647e03592214bb240bf18"><code>5f13b14</code></a>
[StepSecurity] ci: Harden GitHub Actions (<a
href="https://redirect.github.com/expressjs/compression/issues/235">#235</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/76e094548125afbf8089a482d5982dc96c7ce398"><code>76e0945</code></a>
build(deps-dev): bump supertest from 6.2.3 to 6.3.4 (<a
href="https://redirect.github.com/expressjs/compression/issues/231">#231</a>)</li>
<li><a
href="https://github.com/expressjs/compression/commit/ae6ee809dc0cb40febaf2a5bff298465bd5a207f"><code>ae6ee80</code></a>
build(deps-dev): bump eslint-plugin-import from 2.26.0 to 2.31.0 (<a
href="https://redirect.github.com/expressjs/compression/issues/230">#230</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/expressjs/compression/compare/1.8.0...v1.8.1">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ts/transformers-test (#25429)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.48.0 to 4.52.1.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>Patch release v4.51.3</h2>
<p>A mix of bugs were fixed in this patch; very exceptionally, we
diverge from semantic versioning to merge GLM-4 in this patch
release.</p>
<ul>
<li>Handle torch ver in flexattn (<a
href="https://redirect.github.com/huggingface/transformers/issues/37400">#37400</a>)</li>
<li>handle torch version edge cases (<a
href="https://redirect.github.com/huggingface/transformers/issues/37399">#37399</a>)</li>
<li>Add glm4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37388">#37388</a>)</li>
</ul>
<h1>Patch Release 4.51.2</h1>
<p>This is another round of bug fixes, but they are a lot more minor and
outputs were not really affected!</p>
<ul>
<li>Fix Llama4 offset (<a
href="https://redirect.github.com/huggingface/transformers/issues/37414">#37414</a>)
by <a
href="https://github.com/Cyrilvallez"><code>@​Cyrilvallez</code></a></li>
<li>Attention Quantization with FBGemm &amp; TP (<a
href="https://redirect.github.com/huggingface/transformers/issues/37384">#37384</a>)
by <a
href="https://github.com/MekkCyber"><code>@​MekkCyber</code></a></li>
<li>use rms_norm_eps for the L2Norm for Llama4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37418">#37418</a>)
by <a
href="https://github.com/danielhanchen"><code>@​danielhanchen</code></a></li>
<li>mark llama4 as not supported with fa2 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37416">#37416</a>)
by <a
href="https://github.com/winglian"><code>@​winglian</code></a></li>
</ul>
<h1>Patch release v4.51.1</h1>
<p>Since the release of Llama 4, we have fixed a few issues that we are
now releasing in patch v4.51.1</p>
<ul>
<li>Fixing flex attention for torch=2.6.0 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37285">#37285</a>)</li>
<li>more fixes for post-training llama4 (<a
href="https://redirect.github.com/huggingface/transformers/issues/37329">#37329</a>)</li>
<li>Remove HQQ from caching allocator warmup (<a
href="https://redirect.github.com/huggingface/transformers/issues/37347">#37347</a>)</li>
<li>fix derived berts _init_weights (<a
href="https://redirect.github.com/huggingface/transformers/issues/37341">#37341</a>)</li>
<li>Fix init empty weights without accelerate (<a
href="https://redirect.github.com/huggingface/transformers/issues/37337">#37337</a>)</li>
<li>Fix deepspeed with quantization (<a
href="https://redirect.github.com/huggingface/transformers/issues/37324">#37324</a>)</li>
<li>fix llama4 training (<a
href="https://redirect.github.com/huggingface/transformers/issues/37319">#37319</a>)</li>
<li>fix flex attn when optional args aren't passed (<a
href="https://redirect.github.com/huggingface/transformers/issues/37327">#37327</a>)</li>
<li>Multiple llama4 fixe (<a
href="https://redirect.github.com/huggingface/transformers/issues/37353">#37353</a>)</li>
</ul>
<p>Thanks all for your patience</p>
<h2>v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3</h2>
<h2>New Model Additions</h2>
<h3>Llama 4</h3>
<p><img
src="https://github.com/user-attachments/assets/d613b292-94b0-4902-9dc7-2d00693222e4"
alt="image" /></p>
<p>Llama 4, developed by Meta, introduces a new auto-regressive
Mixture-of-Experts (MoE) architecture.This generation includes two
models:</p>
<ul>
<li>The highly capable Llama 4 Maverick with 17B active parameters out
of ~400B total, with 128 experts.</li>
<li>The efficient Llama 4 Scout also has 17B active parameters out of
~109B total, using just 16 experts.</li>
</ul>
<p>Both models leverage early fusion for native multimodality, enabling
them to process text and image inputs. Maverick and Scout are both
trained on up to 40 trillion tokens on data encompassing 200 languages
(with specific fine-tuning support for 12 languages including Arabic,
Spanish, German, and Hindi).</p>
<p>For deployment, Llama 4 Scout is designed for accessibility, fitting
on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization,
while Maverick is available in BF16 and FP8 formats. These models are
released under the custom Llama 4 Community License Agreement, available
on the model repositories</p>
<p>Getting started with Llama 4 using transformers is straightforward.
Make sure you have transformers v4.51.0 or later installed:</p>
<pre><code>pip install -U transformers[hf_xet]
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/945727948c1143a10ac6f7d811aa58bb0d126b5b"><code>9457279</code></a>
Release: v4.52.1</li>
<li><a
href="https://github.com/huggingface/transformers/commit/eaa301673a0a7a1a8c5d3f11c046d1592a7ae16b"><code>eaa3016</code></a>
Revert parallelism temporarily (<a
href="https://redirect.github.com/huggingface/transformers/issues/38240">#38240</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/b5f494632c0fff2527dd3140423408644a9b0076"><code>b5f4946</code></a>
Protect ParallelInterface</li>
<li><a
href="https://github.com/huggingface/transformers/commit/113424bcd53b92600f77d82f48add0a60fb41556"><code>113424b</code></a>
Release: v4.52.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f834d368f6a21ed54188d9c96fbb9013b1d2c75f"><code>f834d36</code></a>
[gemma3] fix bidirectional attention mask (<a
href="https://redirect.github.com/huggingface/transformers/issues/38080">#38080</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2edb0e4b4dda8172d5628ca7497a4125f28bf6fc"><code>2edb0e4</code></a>
[mllama] fix loading and inference (<a
href="https://redirect.github.com/huggingface/transformers/issues/38223">#38223</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/390f153469dfdc793e7a9c7eb4822ea76f4f796a"><code>390f153</code></a>
Add padding-free to bamba (<a
href="https://redirect.github.com/huggingface/transformers/issues/35861">#35861</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2a79471318a9b7b16706f3bb5cd833c7e81919a6"><code>2a79471</code></a>
Fixing Bitnet after use_rms_norm introduction (<a
href="https://redirect.github.com/huggingface/transformers/issues/38229">#38229</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9661896083c9d983341afa45cc4b84af01706e72"><code>9661896</code></a>
Enable Quantize KV Cache for Mistral Model (<a
href="https://redirect.github.com/huggingface/transformers/issues/35042">#35042</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/1c2f36b480e02c9027d2523746d34e27b39e01a4"><code>1c2f36b</code></a>
parallelism goes brrr (<a
href="https://redirect.github.com/huggingface/transformers/issues/37877">#37877</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.48.0...v4.52.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.48.0&new-version=4.52.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
add webgpu support for GatherBlockQuantized
### Description
<!-- Describe your changes. -->
Plugin EP data transfer and Stream support.

Add the ability for a plugin EP to provide an IDataTransfer
implementation and an OrtSyncStream implementation to do async data copy
outside of an inference session.

Example usage added for CUDA EP.

Caveat: Support for providing the OrtSyncStream from the data copy to
Session.Run will be a follow-up PR. For the CUDA EP we can pass the
native cudaStream_t used for the data copy to Run via CUDA EP provider
options.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Set compute capability only on Turing arch


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Setting the native compute capability was causing a regression in
performance.

@gaugarg-nv @ishwar-raut1 @ankan-ban
…V EP Unit Tests (#25323)

### Description
Remove fast_gelu operator from the base model created in NV TRT RTX EP
unit tests.



### Motivation and Context
The operator was added to the model to partition it into subgraphs that
can be assigned to the NV TRT RTX EP and the CUDA EP, which supports
fast_gelu. But the CUDA EP is not built when building ORT with the NV
TRT RTX EP, so the unit tests fail with an unsupported-op error.

@ishwar-raut1 @ankan-ban
### Description
<!-- Describe your changes. -->

The error is:
```
..2025-07-17 11:21:36.861835596 [E:onnxruntime:, sequential_executor.cc:572 ExecuteKernel] Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0

 [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running main_graph_11957213504832792607_0 node. Name:'CANNExecutionProvider_main_graph_11957213504832792607_0_0' Status Message: ~/code/onnxruntime/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue* onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape&) status.IsOK() was false. tensor.cc:57 CalculateTensorStorageSize Tensor shape.Size() must be >= 0
```
### Description
<!-- Describe your changes. -->
Add default logger to CreateEpFactories so a plugin EP can log errors
outside of an inference session.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…25411)

### Description
- Adds documentation to state that the data pointer for an `OrtValue`
owned by an `OrtGraph` is stable during the lifetime of the `OrtSession`
that owns the `OrtGraph`.
- Adds documentation to the ort_graph_to_proto.h utils to show how to
create a `onnx::GraphProto` with external initializers that actually
point to in-memory data (same approach used internally within ORT).



### Motivation and Context
Clarification of usage of new graph apis.
Enable Conv Op and ConvTranspose Op with "auto_pad" param set as VALID

### Description
QNN EP rejects the Conv and ConvTranspose ops on HTP if "auto_pad" is
"VALID", even though this configuration is supported on HTP.



### Motivation and Context
To enable Conv and ConvTranspose op with auto_pad as "VALID" running on
NPU and prevent them from falling back to CPU.
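For reference, the output size for `auto_pad="VALID"` (no padding) follows directly from the ONNX Conv semantics; a minimal sketch in plain Python:

```python
def conv_output_len_valid(in_len: int, kernel: int, stride: int = 1) -> int:
    """Output length along one spatial axis for auto_pad="VALID":
    no padding is added, so out = floor((in_len - kernel) / stride) + 1."""
    return (in_len - kernel) // stride + 1
```

For example, a length-5 input with a size-3 kernel and stride 1 yields a length-3 output.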
yuslepukhin and others added 22 commits August 27, 2025 10:26
### Description
<!-- Describe your changes. -->
While memory-profiling some models I noticed multiple file-mapping
failures. `WindowsEnv::MapFileIntoMemory()` correctly checks that the
mapping offset is granularity-aligned, but it computes the offset as
page-aligned. Also, when saving external tensors we do not need to align
big tensors to the Windows allocation granularity or anything else that
is platform-dependent, so the alignment is now set to 4096 for all
platforms. Granularity matters only when calculating the mapping
address.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fixes multiple file-mapping failures for certain models and saves
hundreds of MB for some models.
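The page-alignment vs. granularity-alignment distinction the fix describes can be sketched in plain Python (the 64 KiB granularity and 4 KiB page size are the common Windows values, assumed here for illustration):

```python
def aligned_mapping_offset(offset: int, granularity: int = 64 * 1024):
    """Round a file offset down to the allocation granularity.

    Memory-mapping APIs (e.g. MapViewOfFile on Windows) require the view
    offset to be a multiple of the allocation granularity (commonly 64 KiB),
    not merely of the page size (4 KiB) -- aligning only to the page size is
    exactly the bug described above. Returns the aligned offset plus the
    number of extra bytes to skip inside the mapped view.
    """
    aligned = (offset // granularity) * granularity
    return aligned, offset - aligned
```

An offset that happens to be 4 KiB-aligned but not 64 KiB-aligned would pass a page-alignment check yet still fail the actual mapping call.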
### Description
<!-- Describe your changes. -->
Fix packaging pipelines


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
During CIs and local builds, `Ort::Status()` gets inherited from the
base class via using-directives; however, that does not work for the
packaging pipelines. Having a default ctor is also important for storing
Status in containers if needed.
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4
to 5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/setup-java/releases">actions/setup-java's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<h3>Breaking Changes</h3>
<ul>
<li>Upgrade to node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/888">actions/setup-java#888</a></li>
</ul>
<p>Make sure your runner is updated to this version or newer to use this
release. v2.327.1 <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<h3>Dependency Upgrades</h3>
<ul>
<li>Upgrade Publish Immutable Action by <a
href="https://github.com/HarithaVattikuti"><code>@​HarithaVattikuti</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/798">actions/setup-java#798</a></li>
<li>Upgrade eslint-plugin-jest from 27.9.0 to 28.11.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-java/pull/730">actions/setup-java#730</a></li>
<li>Upgrade undici from 5.28.5 to 5.29.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-java/pull/833">actions/setup-java#833</a></li>
<li>Upgrade form-data to bring in fix for critical vulnerability by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-java/pull/887">actions/setup-java#887</a></li>
<li>Upgrade actions/checkout from 4 to 5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/actions/setup-java/pull/896">actions/setup-java#896</a></li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Prevent default installation of JetBrains pre-releases by <a
href="https://github.com/priyagupta108"><code>@​priyagupta108</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/859">actions/setup-java#859</a></li>
<li>Improve Error Handling for Setup-Java Action to Help Debug
Intermittent Failures by <a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a> in
<a
href="https://redirect.github.com/actions/setup-java/pull/848">actions/setup-java#848</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/gowridurgad"><code>@​gowridurgad</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-java/pull/848">actions/setup-java#848</a></li>
<li><a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/setup-java/pull/888">actions/setup-java#888</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-java/compare/v4...v5.0.0">https://github.com/actions/setup-java/compare/v4...v5.0.0</a></p>
<h2>v4.7.1</h2>
<h2>What's Changed</h2>
<h3>Documentation changes</h3>
<ul>
<li>Add Documentation to Recommend Using GraalVM JDK 17 Version to
17.0.12 to Align with GFTC License Terms by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/704">actions/setup-java#704</a></li>
<li>Remove duplicated GraalVM section in documentation by <a
href="https://github.com/Marcono1234"><code>@​Marcono1234</code></a> in
<a
href="https://redirect.github.com/actions/setup-java/pull/716">actions/setup-java#716</a></li>
</ul>
<h3>Dependency updates:</h3>
<ul>
<li>Upgrade <code>@​action/cache</code> from 4.0.0 to 4.0.2 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/766">actions/setup-java#766</a></li>
<li>Upgrade <code>@​actions/glob</code> from 0.4.0 to 0.5.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/744">actions/setup-java#744</a></li>
<li>Upgrade ts-jest from 29.1.2 to 29.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/743">actions/setup-java#743</a></li>
<li>Upgrade <code>@​action/cache</code> to 4.0.3 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/773">actions/setup-java#773</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/setup-java/compare/v4...v4.7.1">https://github.com/actions/setup-java/compare/v4...v4.7.1</a></p>
<h2>v4.7.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Configure Dependabot settings by <a
href="https://github.com/HarithaVattikuti"><code>@​HarithaVattikuti</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/722">actions/setup-java#722</a></li>
<li>README Update: Added a permissions section by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/723">actions/setup-java#723</a></li>
<li>Upgrade <code>cache</code> from version 3.2.4 to 4.0.0 by <a
href="https://github.com/aparnajyothi-y"><code>@​aparnajyothi-y</code></a>
in <a
href="https://redirect.github.com/actions/setup-java/pull/724">actions/setup-java#724</a></li>
<li>Upgrade <code>@actions/http-client</code> from 2.2.1 to 2.2.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/728">actions/setup-java#728</a></li>
<li>Upgrade <code>actions/publish-immutable-action</code> from 0.0.3 to
0.0.4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/727">actions/setup-java#727</a></li>
<li>Upgrade <code>@types/jest</code> from 29.5.12 to 29.5.14 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/setup-java/pull/729">actions/setup-java#729</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/setup-java/commit/dded0888837ed1f317902acf8a20df0ad188d165"><code>dded088</code></a>
Bump actions/checkout from 4 to 5 (<a
href="https://redirect.github.com/actions/setup-java/issues/896">#896</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/0913e9a06eb8b69c62db76aa61f580c2b3a5b4e0"><code>0913e9a</code></a>
Upgrade to node 24 (<a
href="https://redirect.github.com/actions/setup-java/issues/888">#888</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/e9343db97e09d87a3c50e544105d99fe912c204b"><code>e9343db</code></a>
Bumps form-data (<a
href="https://redirect.github.com/actions/setup-java/issues/887">#887</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/ae2b61dbc685e60e4427b2e8ed4f0135c6ea8597"><code>ae2b61d</code></a>
Bump undici from 5.28.5 to 5.29.0 (<a
href="https://redirect.github.com/actions/setup-java/issues/833">#833</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/c190c18febcf6c040d80b10ea201a05a2c320263"><code>c190c18</code></a>
Bump eslint-plugin-jest from 27.9.0 to 29.0.1 (<a
href="https://redirect.github.com/actions/setup-java/issues/730">#730</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/67aec007b3fcabe15ca665bfccc1e255dd52e30d"><code>67aec00</code></a>
Fix: prevent default installation of JetBrains pre-releases (<a
href="https://redirect.github.com/actions/setup-java/issues/859">#859</a>)</li>
<li><a
href="https://github.com/actions/setup-java/commit/ebb356cc4e59bcf94f518203228485f5d40e4b58"><code>ebb356c</code></a>
Improve Error Handling for Setup-Java Action to Help Debug Intermittent
Failu...</li>
<li><a
href="https://github.com/actions/setup-java/commit/f4f1212c880fdec8162ea9a6493f4495191887b4"><code>f4f1212</code></a>
Update publish-immutable-actions.yml (<a
href="https://redirect.github.com/actions/setup-java/issues/798">#798</a>)</li>
<li>See full diff in <a
href="https://github.com/actions/setup-java/compare/v4...v5">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-java&package-manager=github_actions&previous-version=4&new-version=5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR with the
same command set listed in the earlier Dependabot PR above (`@dependabot
rebase`, `recreate`, `merge`, `squash and merge`, `cancel merge`,
`reopen`, `close`, `show <dependency name> ignore conditions`, and the
`ignore` variants).


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…at info (#25841)

### Description
This PR adds a new API that applications can use to verify compatibility
of a precompiled model with the underlying system, using only the
compatibility info string from the model's metadata.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- This is a feature to enable apps to check compatibility of a
precompiled model without necessarily having the model locally on the
device. This enables precompiled models to be stored remotely and
downloaded once the application has been able to confirm the validity of
a given model with EPs on the device.

### Testing
- New unit tests pass 
- For regression testing, built a private version of WinML + AMD NPU EP
with these changes. Ran the Cpp Selfcontained Desktop sample
successfully; ran with compilation and also re-ran using the
already-compiled model to verify that session initialization continued
to work as expected.

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
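The workflow this enables can be illustrated with a minimal sketch; note that the function name, the string format, and the inputs here are entirely invented — the real check is performed by the new ORT C API against the device's registered EPs:

```python
def check_model_compat(compat_info: str, device_supported: set) -> bool:
    """Hypothetical illustration only: an app fetches just the
    compatibility-info string from a remotely stored precompiled model's
    metadata and tests it against what the device reports as supported,
    before downloading the (possibly large) model itself."""
    return compat_info in device_supported
```

The point is that only the small metadata string needs to travel to the device for the decision, not the compiled model.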
### Description
<!-- Describe your changes. -->
According to the [WebNN
spec](https://www.w3.org/TR/webnn/#api-mlgraphbuilder-batchnorm), the
batchNorm should have input names "mean" and "variance" instead of
"input_mean" and "input_var".


### Motivation and Context
This issue causes any BatchNorm with mean/variance inputs to fall back
to wasm.
…dLayerNorm (#25850)

### Description
Use similar shaders as SkipSimplifiedLayerNorm in SimplifiedLayerNorm,
to fix the performance issues with SimplifiedLayerNorm.

### Motivation and Context
Prior to this change, generation in Bitnet was bottlenecked on
SimplifiedLayerNorm
<img width="332" height="378" alt="image"
src="https://github.com/user-attachments/assets/3bc16ac1-ef7d-46bf-b403-92fc9192a2df"
/>
with this change performance has now improved to match
SkipSimplifiedLayerNorm
<img width="699" height="179" alt="image"
src="https://github.com/user-attachments/assets/30009d85-d5d9-4585-987a-b39ecf52e0b5"
/>
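For context, SimplifiedLayerNorm is the RMSNorm-style variant of layer normalization; a reference sketch in plain Python (epsilon value assumed):

```python
import math

def simplified_layer_norm(x, gamma, eps=1e-6):
    """SimplifiedLayerNorm (RMSNorm): y = x / sqrt(mean(x^2) + eps) * gamma.
    Unlike full LayerNorm there is no mean subtraction and no bias, which
    is why it can share shader structure with SkipSimplifiedLayerNorm."""
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    return [v * inv * g for v, g in zip(x, gamma)]
```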
…s int32 (#25646)

### Description
This PR makes DequantizeLinear support non-zero zero_point when input
data type is int32.



### Motivation and Context
For WebNN use case, we have some scenarios that input data type is int32
and the zero_point is not zero for DequantizeLinear.
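The DequantizeLinear semantics involved are simple; a reference sketch in plain Python of the case the change enables (int32 input with a non-zero zero_point):

```python
def dequantize_linear(x, scale, zero_point=0):
    """Reference DequantizeLinear: y = (x - zero_point) * scale, applied
    elementwise. The change above allows zero_point != 0 even when x is
    int32, which previously had to be zero."""
    return [(xi - zero_point) * scale for xi in x]
```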
### Description

When using attention bias input for GQA op with FP16, on the platforms
that don't natively support FP16 math a cast to fp32 needs to be
performed, and thus a temporary buffer needs to be created to store the
fp32 values. The issue is that this temporary buffer was being allocated
/ deallocated inside of a loop for every token being processed.
Refactored the implementation so that the allocation takes place only
once.

Phi model throughput increased by 15%.
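The refactor pattern — hoisting a scratch buffer out of the per-token loop — can be sketched in plain Python (the buffer contents and loop body here are stand-ins, not the actual GQA kernel):

```python
def process_tokens_before(tokens, width):
    """Before the fix: a temporary buffer is allocated on every iteration."""
    outs = []
    for t in tokens:
        scratch = [0.0] * width      # allocated and discarded per token
        scratch[0] = float(t)
        outs.append(scratch[0])
    return outs

def process_tokens_after(tokens, width):
    """After the fix: one buffer is allocated up front and reused."""
    scratch = [0.0] * width          # allocated once
    outs = []
    for t in tokens:
        scratch[0] = float(t)
        outs.append(scratch[0])
    return outs
```

Both versions produce identical results; only the allocation count differs, which is where the reported 15% throughput gain comes from.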
### Description
This change fixes correctness issues in two areas that were causing
failures in onnxruntime_test_all:

- DynamicQuantizeMatMul.WithConstantBInputs
- AttentionTest.Attention3DDefault
- AttentionTest.Attention3DWithPastAndPresentQkMatmul

What was wrong and how it’s fixed
1) DynamicQuantizeMatMul.WithConstantBInputs
- Root cause: The Kleidi dynamic quantization GEMM path could be
selected even when the B scales contained invalid values (zero,
negative, or non-finite). That violates kernel assumptions and can lead
to incorrect results.
- Fix: In
`onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc`,
we now explicitly validate that all B scales are finite and strictly
positive before enabling the Kleidi/MLAS dynamic path. If any scale is
invalid, we disable that path.

2) Attention tests (Attention3DDefault,
Attention3DWithPastAndPresentQkMatmul)
- Root causes in
`onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp`:
- Incorrect handling of GEMM corner cases for alpha/beta and K==0 (e.g.,
not respecting C = beta*C when alpha==0 or K==0).
  - Unnecessary or premature fallbacks for small shapes.
- Fixes:
- Add early-outs for degenerate sizes: if M==0 or N==0, return handled.
  - Correctly implement alpha/beta semantics:

---------

Signed-off-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
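The alpha/beta and K==0 corner cases described above follow the standard GEMM contract; a reference sketch in plain Python:

```python
def sgemm(alpha, a, b, beta, c):
    """Reference GEMM: C = alpha * (A @ B) + beta * C.

    Honors the corner cases from the fix: when alpha == 0 or K == 0 the
    product term contributes nothing, so the result must still be
    beta * C (not C left untouched, and not zero)."""
    m = len(a)
    k = len(a[0]) if a else 0
    n = len(c[0]) if c else 0
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    if alpha != 0 and k != 0:
        for i in range(m):
            for j in range(n):
                out[i][j] += alpha * sum(a[i][p] * b[p][j] for p in range(k))
    return out
```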
This change adds a skip for the QMoE CPU tests when running on the
TensorRT or CUDA EP.
The QMoE kernel had a memory-overwrite bug in the accumulate step;
fixing it brought the Python tests back to passing.
…pi_ (#25741)

### Description

Delay the call to `OrtGetApiBase()` until the first call to
`Ort::GetApi()` so that `OrtGetApiBase()` is typically called after
dynamic library loading.

### Motivation and Context

When ORT_API_MANUAL_INIT is not defined (which is the default), the
static `Ort::Global<void>::api_` has a dynamic initializer that calls
`OrtGetApiBase()->GetApi(ORT_API_VERSION)` This dynamic initialization
can cause problems when it interacts with other global/static
initialization. On Windows in particular, it can also cause deadlocks
when used in a dynamic library if OrtGetApiBase()->GetApi() attempts to
load any other libraries.

* Replace the templated `Global<void>::api_` with an inline static
initialized to nullptr.
* `Ort::GetApi()` now calls `detail::Global::GetApi()` which calls
`detail::Global::DefaultInit()` if initialization is needed.
* When `ORT_API_MANUAL_INIT` is defined, `DefaultInit()` returns
nullptr, which will eventually cause the program to crash. The callers
have violated the initialization contract by not calling one of the
`Ort::InitApi` overloads.
* When `ORT_API_MANUAL_INIT` is not defined, `DefaultInit()` uses a
function-level static to compute the result of
`OrtGetApiBase()->GetApi(ORT_API_VERSION)` once and return it.
* `Ort::Global<void>` has been replaced with a non-templated type and
moved inside a `detail` namespace. Since the `Global<void>` object was
documented as being used internally, it is believed that these changes
here are non-breaking, as they do not impact a public API. The public
APIs, `Ort::InitApi()` and `Ort::InitApi(const OrtApi*)` remain
unchanged.
* Add `#pragma detect_mismatch` to surface issues with compilation units
that disagree on how ORT_API_MANUAL_INIT is defined. (MSVC only.)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Description
<!-- Describe your changes. -->
1. A small change to use the shared allocator in the Python binding.
2. Remove FP64 support from the EP.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

The Python GPU IO binding is necessary for performance; this change
enables the shared allocator for GPU allocation.
FP64 was actually running via FP32 inference, so removing it aligns
with what TRT RTX supports.

---------

Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
### Description

Add an `enable_cann_subgraph` feature parameter. This parameter controls
whether graph splitting is performed and can help quickly identify
issues in certain scenarios.
…ting Node_GetTensorAttributeAsOrtValue (#25886)

### Description
Replace `Node_GetTensorAttributeAsOrtValue` with
`OpAttr_GetTensorAttributeAsOrtValue`.
Change the API signature to make it one of the `OpAttr` interfaces
instead of the `OrtNode` interface.

The original API was added
[here](#25566).
### Description
This change builds on top of #25841 , and adds the scaffolding necessary
to call into this API from C++ / C# / Python.

### Motivation and Context
#25454 talks more about the broader notion of precompiled model
compatibility. This change is directed at app developers whose apps may
want to determine if a particular precompiled model (e.g. on a server
somewhere) is compatible with the device where the application is
running. There is functionality in `OrtEpFactory` for making this
determination, which was exposed as a C API in #25841, and this change
makes the API more broadly available in other languages.

### Testing and Validation
Introduced new unit test cases across each language, and verified that
the API was being called and returned the correct result for the default
CPU EP.

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
### Description
- Introduce Level1 Transformer into qnn.preprocess to support various optimizations.

### Motivation and Context
- This change brings in several useful optimizations such as `ConvBnFusion` and `ConstantFolding`, which are part of
`TransformerLevel::Level1` and can benefit QNNEP.
- The goal is to optimize the ONNX model before quantization by integrating these passes into the Python tooling workflow.
…25887)

### Description
Minor fix weight name missing when not valid QDQ node group

### Motivation and Context
Some quantized models fail QDQ node group validation, so their weights are not folded as initializers. QNN EP then failed to handle the dynamic weights due to the transpose op input-name lookup. This change makes sure we process the weights tensor before adding transposes.
## Summary
Adds EP metadata library path support to enable custom ops DLL
registration with proper path resolution.

## Changes
- Added `library_path` metadata key to EP metadata infrastructure
- Pass resolved library path directly to `EpLibraryProviderBridge`
constructor
- Simplified implementation per reviewer feedback (removed virtual
method complexity)
- Added `#include <utility>` for std::move compliance

## Purpose
Enables downstream applications (like onnxruntime-genai) to resolve
relative custom ops library paths using EP metadata, improving DLL
registration reliability.

## Files Modified
- `plugin_ep/ep_factory_provider_bridge.h`
- `plugin_ep/ep_library.h` 
- `plugin_ep/ep_library_plugin.h`
- `plugin_ep/ep_library_provider_bridge.cc`
- `plugin_ep/ep_library_provider_bridge.h`
- `utils.cc`
### Description
This update introduces multiple improvements, fixes, and feature enhancements to the OpenVINO Execution Provider (OVEP) and related components in ONNX Runtime:

#### Configuration & Properties

- Updated load_config mapping to act as a passthrough to OpenVINO properties.
- Added support for providing layout information to inputs/outputs in OpenVINO.

#### Inference & Tensor Handling

- Improved OVInferRequest::SetTensor to correctly handle cached binding shape mismatches.
- Added support for self-detecting on-the-fly bfloat16 → float16 conversion.
- Fixed issues with input ONNX models when used with shared execution contexts.

#### Model Handling & Operator Support

- Fixed model copying behavior for QDQ stripping.
- Updated operator support status for OpenVINO 2025.2.

#### Platform & Integration Fixes

- Applied multiple PSU Lora fixes and related updates.
- Resolved filename confusion issues with wrapped OVIRs in EPCtx.
- Enabled memory-mapped native binaries for OpenVINO 2025.3.

#### Quality & Maintenance

- Addressed linting issues.
- Fixed coverage gaps in OVEP.
- Added a new test script for OpenVINO with ORT ABI integration.

---------

Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Ryan Metcalfe <107415876+RyanMetcalfeInt8@users.noreply.github.com>
Co-authored-by: Klimenko, Mikhail <mikhail.klimenko@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Garth Long <garth.long@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: MayureshV1 <47039074+MayureshV1@users.noreply.github.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier Martinez <javier.e.martinez@intel.com>
### Description
Java API for compile model and EP discovery APIs. Roughly equivalent to
the C# version in #24604.

cc: @skottmckay.

I haven't quite got the CMake configured, so the Java tests for the EP
registration only run when the ONNX Runtime shared provider support is
built, but everything else works. I expect that to be a quick fix, but
I'm not sure under what conditions it should be built or how we should
handle it, so I don't know where/when to plumb it through.

### Motivation and Context
API parity for Java.
@MaanavD MaanavD changed the base branch from main to gh-pages August 29, 2025 20:05
@MaanavD MaanavD closed this Aug 29, 2025
Copilot AI changed the title [WIP] [Documentation] Update ONNX Runtime Documentation to Highlight WinML as the Preferred Windows Path [Documentation] Update ONNX Runtime Documentation to Highlight WinML as the Preferred Windows Path Aug 29, 2025
Copilot AI requested a review from MaanavD August 29, 2025 20:16
Development

Successfully merging this pull request may close these issues.

[Documentation] Update ONNX Runtime Documentation to Highlight WinML as the Preferred Windows Path