Backmerging with Msft commits #699

Merged: 58 commits merged on Jun 2, 2025
Conversation

jatinwadhwa921

Backmerging with Msft commits

fs-eire and others added 30 commits May 16, 2025 09:07
### Description
Enable f16 on Vulkan/NVIDIA GPUs.
### Description
This commit adds a telemetry field to indicate if a debugger is attached
to the process.

### Motivation and Context
This is useful for ignoring events coming from processes being debugged.
### Description
Mark Linux x64 as supporting WebGPU.
…24763)

### Description
Add the '--enable_generic_interface' build flag to the Node package for Windows (both x64 and arm64) builds.
### Description
Disable a test for QNN to unblock the build pipeline. The failure appears to be caused by a combination of PR changes.
### Description
Upgrade cutlass to 3.9.2



### Motivation and Context
To work on new features.
…24793)

### Description
Changes the namespace declaration from
```C#
namespace Microsoft.ML.OnnxRuntime.CompileApi;

// Code
```

to
```C#
namespace Microsoft.ML.OnnxRuntime.CompileApi {
    // Code
}
```

### Motivation and Context
File-scoped namespaces are not supported in C# 8.0, which results in an error in our documentation publishing:
https://github.com/microsoft/onnxruntime/actions/workflows/publish-csharp-apidocs.yml
…crosoft#24802)

### Description
Currently some required ADO pipelines fail because of a version mismatch between the vcpkg and non-vcpkg builds. This PR fixes the failing builds.
### Description
Validate the ep.context_file_path option; make sure it fails if the value is not a valid file path.
### Description
1. Re-enable the wasm CPU tests. They were originally enabled but were later disabled in a change that treats the wasm build as cross-compiling.
2. Use build.py to populate the environment variables.
WebNN doesn't provide a dedicated op for `ConvInteger`. This PR supports the `ConvInteger` op by decomposing it into `DequantizeLinear x, w -> Conv -> Cast (to int32)`, as sketched below.

Additionally, it adds `ConvInteger` to the layout-sensitive op list for layout transformation when the preferred layout is NHWC.
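
A minimal numpy sketch of why this decomposition is exact, assuming unit scales and a toy one-tap "conv" written as a dot product (illustrative only, not WebNN or ORT code):

```python
import numpy as np

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # DequantizeLinear semantics: real = scale * (q - zero_point)
    return (q.astype(np.int32) - zero_point).astype(np.float32) * scale

x = np.array([10, 20, 30], dtype=np.uint8)
w = np.array([1, 2, 3], dtype=np.uint8)
x_zp, w_zp = 5, 1

# ConvInteger semantics: integer conv on zero-point-shifted operands.
conv_integer = np.dot(x.astype(np.int32) - x_zp, w.astype(np.int32) - w_zp)

# Decomposed form: DequantizeLinear (scale=1.0) -> float "conv" -> Cast to int32.
decomposed = np.dot(dequantize(x, 1.0, x_zp), dequantize(w, 1.0, w_zp)).astype(np.int32)

assert conv_integer == decomposed  # both give the same int32 result
```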
### Description
Enable vcpkg for WebGPU.
### Description
Upgrade emsdk to 4.0.8.
…el` flow (microsoft#24799)

### Description

Convert all session configs (i.e., key-value pairs) into provider options, with each key prefixed with `ort_session_config.`. A minimal sketch of the prefixing follows below.

### Motivation and Context
microsoft#24445 has a bug when `Ort::CompileModel` is used: not all session configs are passed to the VITISAI EP backend.
This is because the `session_option` that holds a reference to `VitisiAIExectuionProviderFactory` is not the same as the `session_option` used for `Ort::CompileModel`; `Ort::CompileModel` creates another `session_option` behind the scenes.

The symptom of this bug is that only the session configs in the first `SessionOptions` object are passed to `VitisiAIExectuionProviderFactory`, while the session configs in the second `SessionOptions` are not. As a result, the VITISAI EP backend sometimes assumes that ep.cache_context is not enabled, and the EP context cache model is not created properly.
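
The change boils down to a simple key transformation; here is a minimal Python sketch of the idea (illustrative only; the actual change is in the VitisAI EP's C++ code):

```python
# Forward every session config entry to the EP as a provider option whose key
# carries the `ort_session_config.` prefix, so the EP can recover the configs
# even when Ort::CompileModel builds its own SessionOptions behind the scenes.
session_configs = {
    "ep.cache_context": "1",
    "ep.context_file_path": "model_ctx.onnx",
}

provider_options = {f"ort_session_config.{k}": v for k, v in session_configs.items()}

print(provider_options)
# {'ort_session_config.ep.cache_context': '1',
#  'ort_session_config.ep.context_file_path': 'model_ctx.onnx'}
```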
…rosoft#24772)

### Description
SkipSimplifiedLayerNorm + QuickGelu bfloat16 CUDA implementation (microsoft#24772).
…e compiled (microsoft#24695)

### Description
#### Original compile approach, where an EPContext model is generated as a side effect of creating a session:
- **Restores** previous behavior where:
  - compiling a model that generates no EPContext nodes is silently ignored (nothing is generated and no error is reported)
  - compiling a previously compiled model is silently ignored (nothing is generated and no error is reported)

#### Explicit compile API:
- **Retains** current behavior where compiling a model that does not generate EPContext nodes still generates a model by default.
- Adds a C/C++/C#/Python API called `setFlags` that allows the user to specify what is considered an error:
  - `OrtCompileApiFlags_ERROR_IF_NO_NODES_COMPILED`: CompileModel() returns `ORT_FAIL` if no nodes were compiled.
  - `OrtCompileApiFlags_ERROR_IF_OUTPUT_FILE_EXISTS`: CompileModel() returns `ORT_FAIL` if a file with the same filename as the output model exists.
- Adds logic to detect when the user is trying to compile a previously compiled model and return an `ORT_INVALID_GRAPH` error with a relevant error message.


### Motivation and Context
A previous [PR changed the default
behavior](microsoft@b4f7a90#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL809)
of the original "implicit" compilation approach. This PR was motivated
by restoring the original behavior that users currently depend on. At
the same time, we want to allow users of the new explicit compile API to
determine what is considered an error.
### Description
Support dumping tensors of int8, uint8, BFloat16, UInt4x2, and Int4x2 data types in the tensor dumper.

### Motivation and Context
Help debugging of operators using these data types.
### Description

This change adds support for serializing the QNN graph to the new Deep Learning Container (DLC) format. It is meant to supplement and perhaps eventually replace use of the QnnSaver backend, which emits C++ source
files when `qnn_saver_path` is set.

* Add support for serializing to .dlc via the QnnIr backend.
* Don't silently fall back to QnnCpu when QnnSaver was explicitly selected as the execution backend (see the provider-options sketch below).
* Minor fixes.

### Motivation and Context

QNN model libraries, produced by compiling the C++ files that QnnSaver emits, have a number of drawbacks. Most importantly, they are not cross-platform and cannot be visualized via Netron or other tools. For these reasons, we anticipate that they may eventually be deprecated in favor of DLC files. These containers typically include a platform-agnostic representation of the graph in QNN's internal representation.

---------

Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
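
For context, a hedged Python sketch of selecting a QNN backend explicitly; `backend_path` and `qnn_saver_path` are existing QNN EP provider options referenced around this change, while `model.onnx` is a placeholder (the new QnnIr/.dlc option name is not shown since this excerpt doesn't give it):

```python
import onnxruntime as ort

# Explicitly select the HTP backend; with this change, ORT no longer silently
# falls back to QnnCpu when the requested backend is QnnSaver.
qnn_options = {
    "backend_path": "QnnHtp.dll",
    # Setting qnn_saver_path switches to the QnnSaver backend, which emits C++
    # source files instead of executing the graph:
    # "qnn_saver_path": "QnnSaver.dll",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[("QNNExecutionProvider", qnn_options)],
)
```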
### Description
Fix Node.js linux/x64 cuda12 installation.
### Description
Fixed a TRT context memory sharing bug where the context memory was assigned to a unique_ptr that was immediately destructed upon leaving scope.

### Motivation and Context
The bug seems to have been introduced by a refactoring: microsoft#15833.

![image](https://github.com/user-attachments/assets/eec0e363-b6b1-4831-9ee4-a1b3ed45116c)
### Description
Add LSTM support for QNN EP

### Motivation and Context
Add LSTM support for QNN EP
- QNN's 16x16 Conv doesn't support asymmetric int16 weights
- Insert a Convert op to convert asymmetric uint16 weights to symmetric int16 weights

### Description
- QNN's Conv op doesn't support asymmetric INT16 weights.
- 16x16 Conv operators in ONNX models fall back to the CPU execution provider, reporting higher inference times.
- Insert a Convert op to convert asymmetric uint16 weights to symmetric int16 weights so that 16x16 Convs are scheduled on the QNN EP. A sketch of the requantization arithmetic follows below.

### Motivation and Context
- This fixes graph execution failures for models containing a 16x16 Conv op on the QNN execution provider.
- This also improves inference times of models containing a 16x16 Conv op.
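
A minimal numpy sketch of the requantization referenced above, assuming plain zero-point shifting with int16 clipping (not ORT's actual Convert implementation):

```python
# Map an asymmetric uint16 weight (scale s, zero point z) to a symmetric int16
# weight with the same scale and zero point 0:
#   real = s * (q_u16 - z)  ==  s * q_s16   =>   q_s16 = q_u16 - z
import numpy as np

def asym_u16_to_sym_i16(q_u16: np.ndarray, zero_point: int) -> np.ndarray:
    """Shift the asymmetric uint16 values so the zero point becomes 0."""
    shifted = q_u16.astype(np.int32) - zero_point
    return np.clip(shifted, -32768, 32767).astype(np.int16)

scale, zero_point = 0.002, 41000
w_u16 = np.array([0, 41000, 65535], dtype=np.uint16)
w_i16 = asym_u16_to_sym_i16(w_u16, zero_point)

# Both representations dequantize to (almost) the same real values; clipping
# at the int16 range is the only possible loss, visible in the first element.
print(scale * (w_u16.astype(np.int32) - zero_point))  # [-82.     0.    49.07]
print(scale * w_i16.astype(np.int32))                 # [-65.536  0.    49.07]
```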
### Description
Remove unused tensor dumper functions. 

Those functions are not needed any more since it is easy to make a
string with `::onnxruntime::MakeString` like in `DUMP_CPU_STRING`
macros.

### Motivation and Context
Follow up with
microsoft#24813 (comment).

Some functions were added but are no longer used. Remove them to avoid maintenance cost.
### Description

- Match the graph input correctly
- Add GetGraphInputNumber function

### Motivation and Context

- The number of graph inputs and the number of tensor wrappers may not match.
- For example, for the ResizeNearestNeighbor op, QNN only cares about the 1st input, so the rest of the inputs are not converted to tensor wrappers. However, these remaining inputs still appear in the graph inputs, resulting in a discrepancy in the input counts.
### Description
Small change to remove the MS domain check on ONNX model nodes.

### Motivation and Context
The check returns "unsupported" for some nodes having an MS domain. TRT RTX supports some MS domain ops; if they are reported as unsupported, these ops fall back to the CPU EP.

@ankan-ban @chilo-ms @gedoensmax @jywu-msft

Co-authored-by: iraut <iraut@nvidia.com>
- Previously, padding for rank-3 MaxPool was only computed for auto_pad="NOTSET", using the final output shape.
- Identified a broader issue during auto_pad="VALID" implementation: padding must be derived from the recalculated output shape.
- Added unit tests to cover all use cases of auto_pad.
- Enabled the failing unit test in the CPU pool tests.

### Description
This PR fixes an issue in the padding calculation logic for rank-3 MaxPool operations when using auto_pad. The bug stemmed from using the final output shape (rank-3) to compute padding, rather than the correct intermediate shape (rank-4) that MaxPool actually operates on. The logic has been updated to use the reshaped rank-4 output for accurate padding
computation. Unit tests have been added to validate behavior across all auto_pad modes.

### Motivation and Context
While implementing support for auto_pad="VALID" in MaxPool, we discovered that the padding for MaxPool rank-3 was being calculated using the final output shape, which is rank-3. However, MaxPool internally operates on a reshaped rank-4 tensor (via pre- and post-processing reshapes). As a result, the padding logic was misaligned with the actual shape used during pooling, leading to test failures.
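
As a concrete illustration, here is a small Python sketch with a hypothetical `same_upper_pads` helper (not ORT's implementation), showing padding computed against the rank-4 intermediate shape rather than the rank-3 output:

```python
# A rank-3 input [N, C, L] is pooled as the reshaped rank-4 tensor [N, C, 1, L],
# so SAME_* padding must be computed against that rank-4 spatial shape.
import math

def same_upper_pads(in_len: int, kernel: int, stride: int) -> tuple[int, int]:
    out_len = math.ceil(in_len / stride)
    total = max((out_len - 1) * stride + kernel - in_len, 0)
    return total // 2, total - total // 2  # (pad_begin, pad_end)

# Rank-3 MaxPool over length 7: operate on the [N, C, 1, 7] intermediate.
# The height axis (size 1, kernel 1) needs no padding; the width axis does.
print(same_upper_pads(1, 1, 1))  # (0, 0)
print(same_upper_pads(7, 3, 2))  # (1, 1) -> output width ceil(7/2) = 4
```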
### Description
Update Qnn default version to 2.34.0.250424
tianleiwu and others added 26 commits May 23, 2025 12:26
### Description

Major changes of spec:
* 2D scale shape: [N * n_blocks_per_col] => [N, n_blocks_per_col]
* 2D zero-point shape: [N * CeilDiv(n_blocks_per_col * bits, 8)] => [N, CeilDiv(n_blocks_per_col * bits, 8)]
* For B, drop the int32 type and only allow uint8.
* Allow bfloat16 as input/output type.
* Mark input g_idx as deprecated (since it has no benefit on model size and performance in inference).

Add a function CheckInputs to verify the input shapes.

The reason for the shape change is to make scales and zero points compatible with other operators like DequantizeLinear and GatherBlockQuantized. That will make graph fusion and model building easier. A sketch of the reshape is shown below.

Note that ORT can still handle the legacy 1D format for scales and zero points, and CUDA/CPU can still handle g_idx. However, they are deprecated, and our tools shall generate 2D scales and zeros, and avoid using g_idx going forward.

This change is backward compatible. Models from the old spec can run in the latest ORT (CheckInputs handles 1D scales and zero points), and models from the latest spec can still run in older ORT (since older ORT does not check the dimensions of scales and zero points).

### Motivation and Context

The CUDA and CPU providers do not check inputs for MatMulNBits, which could cause out-of-bounds access.

We are going to share the lm_head weights of MatMulNBits with GatherBlockQuantized. The 2D shape can be used in Gather directly, so we can avoid Reshape nodes.

Our latest models published for foundry use 2D scales and zero points, so I updated the spec to reflect that.
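
A small numpy sketch of the layout change (illustrative; N, bits, and n_blocks_per_col are arbitrary example values, and this is not ORT's CheckInputs):

```python
import math
import numpy as np

N, n_blocks_per_col, bits = 8, 6, 4

# Legacy 1D layout and the equivalent new 2D layout for scales:
scales_1d = np.random.rand(N * n_blocks_per_col).astype(np.float32)
scales_2d = scales_1d.reshape(N, n_blocks_per_col)  # [N, n_blocks_per_col]

# Packed zero points: CeilDiv(n_blocks_per_col * bits, 8) uint8 bytes per row.
zp_cols = math.ceil(n_blocks_per_col * bits / 8)
zeros_1d = np.random.randint(0, 256, N * zp_cols, dtype=np.uint8)
zeros_2d = zeros_1d.reshape(N, zp_cols)             # [N, CeilDiv(...)]

# Same bytes, different shape: the 2D layout matches what Gather expects,
# avoiding Reshape nodes when sharing weights with GatherBlockQuantized.
assert scales_2d.size == scales_1d.size and zeros_2d.shape == (N, zp_cols)
```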
### Description

Resolves the following issues starting in TensorRT 10.11:
- Version macros changed in `NvInferVersion.h`; update the build to look for the new macros.
- Updated deprecated APIs (`setShapeValues` -> `setShapeValuesV2`, to support INT64 shape values).



### Motivation and Context

- Resolves building TensorRT EP from source with latest 10.11 release.

Signed-off-by: Kevin Chen <kevinch@nvidia.com>
### Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NV TRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.
### Description
 - Add support for ScatterND reduction attribute
 - Gracefully handle the unsupported reduction values
 - Add unit tests to validate Reduction attribute support

…and WebNN (microsoft#24830)

### Description
Add `map_info.h` to centralize the mapping of operation types and inputs between ONNX and WebNN.

### Motivation and Context
To simplify the maintenance of operation types and inputs. The mapping between ONNX input names and WebNN input names will be used in the future to check the `rankRange`.



@Honry, @fdwr, @guschmue, PTAL, thanks!

---------

Co-authored-by: Wanming Lin <wanming.lin@intel.com>
### Description
Currently, the Xcode build with the Node.js binding (`--use_xcode`) always fails on Mac.
```
./build.sh --config Debug --use_xcode --use_webgpu --build_shared_lib --build_nodejs --parallel --compile_no_warning_as_error --skip_submodule_sync --cmake_extra_defines CMAKE_OSX_ARCHITECTURES=arm64 --skip_tests
```
The root cause is that the dylib is located in `/Debug/Debug`, not `/Debug`, when using the Xcode generator. For other generators (e.g., Make, Ninja), the dylib is located in `/Debug` as expected.

The Mac pipelines pass because they don't use the Xcode generator.

![image](https://github.com/user-attachments/assets/e1203fdb-d88a-4c06-abad-b641d502237c)
…/docker/scripts (microsoft#24810)

Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.3 to
78.1.1.
Changelog, sourced from [setuptools's changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst):

- **v78.1.1** (Bugfixes): More fully sanitized the filename in PackageIndex._download. ([#4946](https://redirect.github.com/pypa/setuptools/issues/4946))
- **v78.1.0** (Features): Restore access to _get_vc_env with a warning. ([#4874](https://redirect.github.com/pypa/setuptools/issues/4874))
- **v78.0.2** (Bugfixes): Postponed removals of deprecated dash-separated and uppercase fields in `setup.cfg`. All packages with deprecated configurations are advised to move before 2026. ([#4911](https://redirect.github.com/pypa/setuptools/issues/4911))
- **v78.0.1** (Misc): [#4909](https://redirect.github.com/pypa/setuptools/issues/4909)
- **v78.0.0** (Bugfixes): Reverted distutils changes that broke the monkey patching of command classes. ([#4902](https://redirect.github.com/pypa/setuptools/issues/4902)) (Deprecations and Removals): Setuptools no longer accepts options containing uppercase or dash characters in `setup.cfg`.

... (truncated)

Commits: viewable in the [compare view](https://github.com/pypa/setuptools/compare/v69.0.3...v78.1.1).
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=setuptools&package-manager=pip&previous-version=69.0.3&new-version=78.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… session run (microsoft#24672)

### Description
Add an option to enable tensor input and output bindings on CUDA before the perftest inference session runs.
Output binding is handled by changing the memory allocator type to CUDA.
Input binding is handled by creating a default ORT tensor on CPU, initializing it with data, then using cudaMemcpy to copy the data from the CPU to the CUDA-allocated GPU tensor via the raw pointers.

### Motivation and Context
With this change, the reported end-to-end inference time is more accurate, as the CPU<->GPU transfer overhead is moved out of the inference run.
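
For reference, a hedged sketch of the same binding idea via onnxruntime's Python `IOBinding` API (the perftest change itself is in C++); the model path and the tensor names `input`/`output` are placeholders:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

# Copy the input to GPU memory once, before timing starts.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
x_gpu = ort.OrtValue.ortvalue_from_numpy(x, "cuda", 0)

binding = session.io_binding()
binding.bind_ortvalue_input("input", x_gpu)
binding.bind_output("output", "cuda", 0)  # allocate the output on CUDA too

session.run_with_iobinding(binding)       # timed region: no CPU<->GPU copies
```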
### Description
- Add SpaceToDepth fusion for QNN preprocess.
- The pattern in YOLOv2 is uncommon, while the commonly seen one is left as future work.
- Add an entry point/API for non-quantization users to preprocess models for QNN execution.
- Revise cmake to package the newly introduced directory into the Python wheel.

### Motivation and Context
- While executing the YOLOv2 model on QNN-EP, a sequence of Reshape and Transpose ops with 6D shapes falls back to CPU due to an HTP limitation. Add a fusion to fuse this sequence of ops into a single SpaceToDepth, which can be directly executed on QNN-EP.
- Since the current QNN preprocess is provided in `onnxruntime/python/tools/quantization/execution_providers/qnn/preprocess.py`, which is under the quantization directory, the path may be confusing for non-quantization users. To allow non-quantization users to preprocess models for QNN, introduce `onnxruntime/python/tools/qnn/preprocess.py` to serve as the entry point and provide an API to preprocess models.
QNN's [Softmax op defines a pre-scale parameter (`beta`)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/MasterOpDef.html#softmax) into which we can fold a constant scalar multiply. A numpy sketch of the equivalence is below.
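
A quick numpy check of that equivalence, as a sketch rather than the actual fusion code:

```python
# Folding a constant scalar multiply into Softmax's pre-scale `beta` is
# value-preserving: softmax(c * x) == softmax_with_beta(x, beta=c).
import numpy as np

def softmax(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    z = beta * x
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.2, 3.0], dtype=np.float32)
c = 0.37  # the constant scalar multiply being folded

np.testing.assert_allclose(softmax(c * x), softmax(x, beta=c), rtol=1e-6)
```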
Windows on ARM supports AMD64 emulation, so we can use the win64 version of protoc.

### Description
Compilation on ARM64 machines fails due to a missing protoc dependency.

### Motivation and Context
With this change we can compile onnxruntime on Windows on Arm devices without setting up protobuf manually. CMake will download and set up the protoc dependency. VCPKG has removed this feature.
**Description:**

This pull request refactors the symbol publishing workflow that uses the
internal REST API. It addresses a breaking change introduced by the
`Az.Accounts` module update (v5.0.1+) where `Get-AzAccessToken` now
returns a `SecureString`. Additionally, it improves the structure and
robustness of the custom symbol publishing steps.

**Problem:**

1. The pipeline recently stopped working due to an update in the Azure
PowerShell `Az.Accounts` module. The `Get-AzAccessToken` cmdlet now
returns a `SecureString` by default, which was incompatible with the
previous script that expected a plain string token when setting a
pipeline variable.
2. The previous implementation used two separate tasks: one
`AzurePowerShell@5` task to generate the token and set it as a pipeline
variable, and a subsequent `pwsh` task to consume this variable and make
REST API calls. This separation required converting the `SecureString`
to plain text before setting the pipeline variable.

**Solution:**

To address these issues and improve the pipeline's design:

1. The "Generate an Azure Token" (`AzurePowerShell@5`) task and the
"Publish Symbols using internal REST API" (`pwsh`) task have been
**combined into a single `AzurePowerShell@5` task.**
2.  Within this unified task:
* `Get-AzAccessToken` is called, and its `SecureString` output is stored
in a local PowerShell variable.
* The `SecureString` token is converted to plain text *only within the
scope of this script* and immediately before it's used in the
`Authorization` header for `Invoke-RestMethod` calls.
* The token is no longer passed between tasks via a pipeline variable,
enhancing security by limiting the scope of the plain text token.

**Key Changes:**

* **Enhanced `SecureString` Management:** The token remains a
`SecureString` for most of its lifetime within the script, reducing
exposure.
* **Improved Error Handling:** `try-catch` blocks have been added around
the token retrieval and `Invoke-RestMethod` calls for better error
reporting and pipeline stability.
* **Robust Parameter Handling:** Explicit conversion for boolean
parameters (e.g., `includePublicSymbolServer`) to ensure correct
PowerShell boolean types before JSON serialization.
The following unit tests failed when building ONNX Runtime with Visual Studio 17.14 in Release or RelWithDebInfo configuration.

- SparseTensorConversionTests.TestDenseToSparseConversion
- MeanVarianceNormalizationTest.AllAxes
- MVNContribOpTest.MeanVarianceNormalizationCPUTest_Version1_TO_8

This PR provides a workaround for the two MVN tests.
…#24884)

Fix for microsoft#24861

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
Revise the WASM CI to run tests as a later step than publishing artifacts. This allows downloading the binary to diagnose test failures.
Handle NaN in softmax operator for WebGPU EP and JSEP.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…into `onnxruntime-web/wasm` build (microsoft#24836)

### Description
Fixes inference error from `ort-wasm-simd-threaded.mjs` not being
bundled into `ort.wasm.bundle.min.mjs` as it is for other
`bundle.min.mjs` builds.

### Motivation and Context
To decrease my app's bundle size, I followed the [conditional importing
guide](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/importing_onnxruntime-web#conditional-importing)
and imported the WASM-only build:
```diff
- import * as ort from 'onnxruntime-web';
+ import * as ort from 'onnxruntime-web/wasm';
```
After this change, creating an inference session would result in:
`TypeError: Failed to resolve module specifier
'./ort-wasm-simd-threaded.mjs'`.

This was because `ort-wasm-simd-threaded.mjs` was not bundled into the build at `onnxruntime-web/wasm`, which points to `ort.wasm.bundle.min.mjs`, despite what its name suggests. In other builds with `bundle` in their name, the module is bundled, yet it was not in the WASM one. This PR bundles the JavaScript WASM runtime in to match the other builds, fixing the error.
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.11.10 to 0.11.11.
Release notes for 0.11.11, sourced from [ruff's releases](https://github.com/astral-sh/ruff/releases) (the entry in [ruff's changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) is identical):

**Preview features**
- [`airflow`] Add autofixes for `AIR302` and `AIR312` ([#17942](https://redirect.github.com/astral-sh/ruff/pull/17942))
- [`airflow`] Move rules from `AIR312` to `AIR302` ([#17940](https://redirect.github.com/astral-sh/ruff/pull/17940))
- [`airflow`] Update `AIR301` and `AIR311` with the latest Airflow implementations ([#17985](https://redirect.github.com/astral-sh/ruff/pull/17985))
- [`flake8-simplify`] Enable fix in preview mode (`SIM117`) ([#18208](https://redirect.github.com/astral-sh/ruff/pull/18208))

**Bug fixes**
- Fix inconsistent formatting of match-case on `[]` and `_` ([#18147](https://redirect.github.com/astral-sh/ruff/pull/18147))
- [`pylint`] Fix `PLW1514` not recognizing the `encoding` positional argument of `codecs.open` ([#18109](https://redirect.github.com/astral-sh/ruff/pull/18109))

**CLI**
- Add full option name in formatter warning ([#18217](https://redirect.github.com/astral-sh/ruff/pull/18217))

**Documentation**
- Fix rendering of admonition in docs ([#18163](https://redirect.github.com/astral-sh/ruff/pull/18163))
- [`flake8-print`] Improve print/pprint docs for `T201` and `T203` ([#18130](https://redirect.github.com/astral-sh/ruff/pull/18130))
- [`flake8-simplify`] Add fix safety section (`SIM110`, `SIM210`) ([#18114](https://redirect.github.com/astral-sh/ruff/pull/18114), [#18100](https://redirect.github.com/astral-sh/ruff/pull/18100))
- [`pylint`] Fix docs example that produced different output (`PLW0603`) ([#18216](https://redirect.github.com/astral-sh/ruff/pull/18216))

Contributors: @AlexWaygood, @BradonZhang, @BurntSushi, @CodeMan62, @InSyncWithFoo, @LaBatata101, @Lee-W, @Mathemmagician, @MatthewMckee4, @MichaReiser, @TomerBin, @VascoSch92, @adamaaronson, @brainwane, @brandtbucher, @carljm, @dcreager, @dhruvmanila, @dragon-dxw, @felixscherz, @kiran-4444, @maxmynter, @ntBre

Commits: viewable in the [compare view](https://github.com/astral-sh/ruff/compare/0.11.10...0.11.11).
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.11.10&new-version=0.11.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
* Add fpA intB gemm kernel from WeightOnlyGroupwiseQuantGemmPlugin of
TensorRT-LLM.
* Add prepacking to convert weight/scales/zero_points to adapt
MatMulNBits to use the kernel.

Limitations:
* Only the fp16 kernel is enabled. BF16 support will be added later.
* Requires zero points. Support for scales-only might be added later.
* Bias is not enabled, since the previous MatMulNBits kernel does not support bias.

### Motivation and Context

To improve the performance of LLMs.

Initial results show 2.2x throughput on prompt processing and 1.25x throughput on token generation using onnxruntime-genai's benchmark_e2e.py on phi-4-mini-instruct on A100.
### Description
- Fix the onnxruntime-extensions include path.
- Add an option to onnxruntime_perf_test to register custom ops from a built-in onnxruntime-extensions.

### Motivation and Context
Fix the build.py `--use_extensions` option. Make it simple to use the built-in onnxruntime-extensions with onnxruntime_perf_test.
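
For comparison, a hedged sketch of registering the onnxruntime-extensions custom ops from Python (the PR adds an equivalent option to the C++ onnxruntime_perf_test); `model.onnx` is a placeholder:

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

opts = ort.SessionOptions()
# Load the extensions' shared library so its custom ops resolve at session load.
opts.register_custom_ops_library(get_library_path())

session = ort.InferenceSession("model.onnx", sess_options=opts)
```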
…t#24524)

### Description
This change introduces `TPAUSE` support in the `SpinPause()` function on Windows and Linux to reduce power consumption and improve efficiency during spin-wait periods. `TPAUSE` is a lightweight power/performance instruction that enters an optimized C0 power state while waiting on a delay event, compared to `_mm_pause()`, which is a NOP-like instruction that provides a small delay in the CPU pipeline. With this change, first-inference latency across certain models can also improve. Models tested internally have shown up to ~2x improvement in first-inference latency and up to ~20% lower overall power consumption.

Genuine Intel CPUID detection logic was also refactored into a shared utility (`CheckIntel()`), enabling consistent platform checks across the codebase. `TPAUSE` is enabled by default on architectures that support it.

[Intel Intrinsics Guide
(TPAUSE)](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=tpause&techs=MMX,SSE_ALL,AVX_ALL,AVX_512,AMX,SVML,Other&ig_expand=6888,6888)

### Motivation and Context
Performance and power efficiency gains. A previous PR initially introduced the TPAUSE instruction in `SpinPause()` with measured improvements in power (see the previous TPAUSE PR: [Add WAITPKG checks, add support for TPAUSE in ThreadPool spin microsoft#16935](microsoft#16935)). Additional performance testing and measurements were done across mobile, desktop, and server platforms, influencing enhancements to this PR such as a tweak to the `spin_delay_cycles`, Linux support, and the refactored Intel CPUID detection logic.
…re (microsoft#24910)

### Description
A recent change in abseil-cpp.cmake enables ABSL_ENABLE_INSTALL, which causes a compilation error on AIX. The same build worked before this enablement, so block it on AIX.

```
[ 83%] Linking CXX executable onnxruntime_perf_test
ld: 0706-006 Cannot find or open library file: -l absl_failure_signal_handler
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_examine_stack
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_parse
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_usage
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_usage_internal
	ld:open(): A file or directory in the path name does not exist.
.ibm-clang: error: linker command failed with exit code 255 (use -v to see invocation)
```


### Motivation and Context
To fix the compilation error, blocking the enablement of
ABSL_ENABLE_INSTALL under AIX.
### Description

This PR updates the attention fusions for Whisper to work with the
latest `transformers` package (`4.52.3`).

### Motivation and Context

Previously, the attention fusions were maintained for many older
`transformers` versions. The existing fusions do not work with the
latest `transformers` versions.
BF16 support is primarily available on NVIDIA GPUs with Ampere and later architectures (compute capability 8.0 or higher).
If `trt_bf16_enable = true` and the compute capability is < 8, the TRT EP will set `trt_bf16_enable = false`.
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k June 2, 2025 04:59
@ankitm3k ankitm3k merged commit be8fded into ovep-develop Jun 2, 2025
4 of 7 checks passed
@ankitm3k ankitm3k deleted the sync_msft_2_6_25 branch June 2, 2025 06:12