Backmerging with Msft commits #699

Merged: 58 commits merged on Jun 2, 2025
Conversation

jatinwadhwa921

Backmerging with Msft commits

fs-eire and others added 30 commits May 16, 2025 09:07
### Description
Enable f16 on Vulkan/NVIDIA GPUs.
### Description
This commit adds a telemetry field to indicate if a debugger is attached
to the process.

### Motivation and Context
This is useful for ignoring events coming from processes being debugged.
### Description
Mark Linux x64 as supporting WebGPU.
…24763)

### Description
Add the '--enable_generic_interface' build flag to the Node package for Windows (both x64 and arm64) builds.
### Description
Disable a test for QNN to unblock the build pipeline. The failure appears to be caused by a combination of PR changes.
### Description
Upgrade cutlass to 3.9.2



### Motivation and Context
To work on new features.
…24793)

### Description
Changes the namespace declaration from
```C#
namespace Microsoft.ML.OnnxRuntime.CompileApi;

// Code
```

to
```C#
namespace Microsoft.ML.OnnxRuntime.CompileApi {
    // Code
}
```

### Motivation and Context
File-scoped namespaces are not supported in C# 8.0, which results in an error in our documentation publishing:
https://github.com/microsoft/onnxruntime/actions/workflows/publish-csharp-apidocs.yml
…crosoft#24802)

### Description
Currently some required ADO pipelines fail because of a version mismatch between the vcpkg and non-vcpkg builds. This PR fixes the failing builds.
### Description
Validate the ep.context_file_path option; make sure it fails if the value is not a valid file path.
### Description
1. Re-enable the wasm CPU tests. They were originally enabled but were later disabled in a change that treats the wasm build as cross-compiling.
2. Use build.py to populate the environment variables.
WebNN doesn't provide a dedicated op for `ConvInteger`. This PR supports the `ConvInteger` op by decomposing it into `DequantizeLinear x, w -> Conv -> Cast (to int32)`, as sketched below.

Additionally, it adds `ConvInteger` to the layout-sensitive op list for layout transformation when the preferred layout is NHWC.
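
A minimal numpy sketch of why this decomposition is exact, assuming unit scales and a toy one-tap "conv" written as a dot product (illustrative only, not WebNN or ORT code):

```python
import numpy as np

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # DequantizeLinear semantics: real = scale * (q - zero_point)
    return (q.astype(np.int32) - zero_point).astype(np.float32) * scale

x = np.array([10, 20, 30], dtype=np.uint8)
w = np.array([1, 2, 3], dtype=np.uint8)
x_zp, w_zp = 5, 1

# ConvInteger semantics: integer conv on zero-point-shifted operands.
conv_integer = np.dot(x.astype(np.int32) - x_zp, w.astype(np.int32) - w_zp)

# Decomposed form: DequantizeLinear (scale=1.0) -> float "conv" -> Cast to int32.
decomposed = np.dot(dequantize(x, 1.0, x_zp), dequantize(w, 1.0, w_zp)).astype(np.int32)

assert conv_integer == decomposed  # both give the same int32 result
```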
### Description
Enable vcpkg for WebGPU.
### Description
Upgrade emsdk to 4.0.8.
…el` flow (microsoft#24799)

### Description

Convert all session configs (i.e., key-value pairs) into provider options, with each key prefixed with `ort_session_config.`. A minimal sketch of the prefixing follows below.

### Motivation and Context
microsoft#24445 has a bug when `Ort::CompileModel` is used: not all session configs are passed to the VITISAI EP backend.
This is because the `session_option` that holds a reference to `VitisiAIExectuionProviderFactory` is not the same as the `session_option` used for `Ort::CompileModel`; `Ort::CompileModel` creates another `session_option` behind the scenes.

The symptom of this bug is that only the session configs in the first `SessionOptions` object are passed to `VitisiAIExectuionProviderFactory`, while the session configs in the second `SessionOptions` are not. As a result, the VITISAI EP backend sometimes assumes that ep.cache_context is not enabled, and the EP context cache model is not created properly.
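
The change boils down to a simple key transformation; here is a minimal Python sketch of the idea (illustrative only; the actual change is in the VitisAI EP's C++ code):

```python
# Forward every session config entry to the EP as a provider option whose key
# carries the `ort_session_config.` prefix, so the EP can recover the configs
# even when Ort::CompileModel builds its own SessionOptions behind the scenes.
session_configs = {
    "ep.cache_context": "1",
    "ep.context_file_path": "model_ctx.onnx",
}

provider_options = {f"ort_session_config.{k}": v for k, v in session_configs.items()}

print(provider_options)
# {'ort_session_config.ep.cache_context': '1',
#  'ort_session_config.ep.context_file_path': 'model_ctx.onnx'}
```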
…rosoft#24772)

### Description
SkipSimplifiedLayerNorm + QuickGelu bfloat16 CUDA implementation (microsoft#24772).
…e compiled (microsoft#24695)

### Description
#### Original compile approach, where an EPContext model is generated as a side effect of creating a session:
- **Restores** previous behavior where:
  - compiling a model that generates no EPContext nodes is silently ignored (nothing is generated and no error is reported)
  - compiling a previously compiled model is silently ignored (nothing is generated and no error is reported)

#### Explicit compile API:
- **Retains** current behavior where compiling a model that does not generate EPContext nodes still generates a model by default.
- Adds a C/C++/C#/Python API called `setFlags` that allows the user to specify what is considered an error:
  - `OrtCompileApiFlags_ERROR_IF_NO_NODES_COMPILED`: CompileModel() returns `ORT_FAIL` if no nodes were compiled.
  - `OrtCompileApiFlags_ERROR_IF_OUTPUT_FILE_EXISTS`: CompileModel() returns `ORT_FAIL` if a file with the same filename as the output model exists.
- Adds logic to detect when the user is trying to compile a previously compiled model and return an `ORT_INVALID_GRAPH` error with a relevant error message.


### Motivation and Context
A previous [PR changed the default
behavior](microsoft@b4f7a90#diff-e2d3910ae7593ee7ba4fd74e53f738fa973ae2fc32c069f1088ba458b91f8d4bL809)
of the original "implicit" compilation approach. This PR was motivated
by restoring the original behavior that users currently depend on. At
the same time, we want to allow users of the new explicit compile API to
determine what is considered an error.
### Description
Support dumping tensors of int8, uint8, BFloat16, UInt4x2, and Int4x2 data types in the tensor dumper.

### Motivation and Context
Help debugging of operators using these data types.
### Description

This change adds support for serializing the QNN graph to the new Deep Learning Container (DLC) format. It is meant to supplement and perhaps eventually replace use of the QnnSaver backend, which emits C++ source
files when `qnn_saver_path` is set.

* Add support for serializing to .dlc via the QnnIr backend.
* Don't silently fall back to QnnCpu when QnnSaver was explicitly selected as the execution backend (see the provider-options sketch below).
* Minor fixes.

### Motivation and Context

QNN model libraries, produced by compiling the C++ files that QnnSaver emits, have a number of drawbacks. Most importantly, they are not cross-platform and cannot be visualized via Netron or other tools. For these reasons, we anticipate that they may eventually be deprecated in favor of DLC files. These containers typically include a platform-agnostic representation of the graph in QNN's internal representation.

---------

Co-authored-by: Jeff Kilpatrick <jkilpat@qti.qualcomm.com>
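
For context, a hedged Python sketch of selecting a QNN backend explicitly; `backend_path` and `qnn_saver_path` are existing QNN EP provider options referenced around this change, while `model.onnx` is a placeholder (the new QnnIr/.dlc option name is not shown since this excerpt doesn't give it):

```python
import onnxruntime as ort

# Explicitly select the HTP backend; with this change, ORT no longer silently
# falls back to QnnCpu when the requested backend is QnnSaver.
qnn_options = {
    "backend_path": "QnnHtp.dll",
    # Setting qnn_saver_path switches to the QnnSaver backend, which emits C++
    # source files instead of executing the graph:
    # "qnn_saver_path": "QnnSaver.dll",
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[("QNNExecutionProvider", qnn_options)],
)
```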
### Description
Fix Node.js linux/x64 cuda12 installation.
### Description
Fixed a TRT context memory sharing bug where the context memory was assigned to a unique_ptr that was immediately destructed upon leaving scope.

### Motivation and Context
The bug seems to have been introduced by a refactoring: microsoft#15833.

![image](https://github.com/user-attachments/assets/eec0e363-b6b1-4831-9ee4-a1b3ed45116c)
### Description
Add LSTM support for QNN EP

### Motivation and Context
Add LSTM support for QNN EP
- QNN's 16x16 Conv doesn't support asymmetric int16 weights
- Insert a Convert op to convert asymmetric uint16 weights to symmetric int16 weights

### Description
- QNN's Conv op doesn't support asymmetric INT16 weights.
- 16x16 Conv operators in ONNX models fall back to the CPU execution provider, reporting higher inference times.
- Insert a Convert op to convert asymmetric uint16 weights to symmetric int16 weights so that 16x16 Convs are scheduled on the QNN EP. A sketch of the requantization arithmetic follows below.

### Motivation and Context
- This fixes graph execution failures for models containing a 16x16 Conv op on the QNN execution provider.
- This also improves inference times of models containing a 16x16 Conv op.
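
A minimal numpy sketch of the requantization referenced above, assuming plain zero-point shifting with int16 clipping (not ORT's actual Convert implementation):

```python
# Map an asymmetric uint16 weight (scale s, zero point z) to a symmetric int16
# weight with the same scale and zero point 0:
#   real = s * (q_u16 - z)  ==  s * q_s16   =>   q_s16 = q_u16 - z
import numpy as np

def asym_u16_to_sym_i16(q_u16: np.ndarray, zero_point: int) -> np.ndarray:
    """Shift the asymmetric uint16 values so the zero point becomes 0."""
    shifted = q_u16.astype(np.int32) - zero_point
    return np.clip(shifted, -32768, 32767).astype(np.int16)

scale, zero_point = 0.002, 41000
w_u16 = np.array([0, 41000, 65535], dtype=np.uint16)
w_i16 = asym_u16_to_sym_i16(w_u16, zero_point)

# Both representations dequantize to (almost) the same real values; clipping
# at the int16 range is the only possible loss, visible in the first element.
print(scale * (w_u16.astype(np.int32) - zero_point))  # [-82.     0.    49.07]
print(scale * w_i16.astype(np.int32))                 # [-65.536  0.    49.07]
```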
### Description
Remove unused tensor dumper functions. 

Those functions are not needed any more since it is easy to make a
string with `::onnxruntime::MakeString` like in `DUMP_CPU_STRING`
macros.

### Motivation and Context
Follow up with
microsoft#24813 (comment).

Some functions were added but are no longer used. Remove them to avoid maintenance cost.
### Description

- Match the graph input correctly
- Add GetGraphInputNumber function

### Motivation and Context

- The number of graph inputs and the number of tensor wrappers may not match.
- For example, for the ResizeNearestNeighbor op, QNN only cares about the 1st input, so the rest of the inputs are not converted to tensor wrappers. However, these remaining inputs still appear in the graph inputs, resulting in a discrepancy in the input counts.
### Description
Small change to remove the MS domain check on ONNX model nodes.

### Motivation and Context
The check returns "unsupported" for some nodes having an MS domain. TRT RTX supports some MS domain ops; if they are reported as unsupported, these ops fall back to the CPU EP.

@ankan-ban @chilo-ms @gedoensmax @jywu-msft

Co-authored-by: iraut <iraut@nvidia.com>
- Previously, padding for rank-3 MaxPool was only computed for auto_pad="NOTSET", using the final output shape.
- Identified a broader issue during auto_pad="VALID" implementation: padding must be derived from the recalculated output shape.
- Added unit tests to cover all use cases of auto_pad.
- Enabled the failing unit test in the CPU pool tests.

### Description
This PR fixes an issue in the padding calculation logic for rank-3 MaxPool operations when using auto_pad. The bug stemmed from using the final output shape (rank-3) to compute padding, rather than the correct intermediate shape (rank-4) that MaxPool actually operates on. The logic has been updated to use the reshaped rank-4 output for accurate padding
computation. Unit tests have been added to validate behavior across all auto_pad modes.

### Motivation and Context
While implementing support for auto_pad="VALID" in MaxPool, we discovered that the padding for MaxPool rank-3 was being calculated using the final output shape, which is rank-3. However, MaxPool internally operates on a reshaped rank-4 tensor (via pre- and post-processing reshapes). As a result, the padding logic was misaligned with the actual shape used during pooling, leading to test failures.
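
As a concrete illustration, here is a small Python sketch with a hypothetical `same_upper_pads` helper (not ORT's implementation), showing padding computed against the rank-4 intermediate shape rather than the rank-3 output:

```python
# A rank-3 input [N, C, L] is pooled as the reshaped rank-4 tensor [N, C, 1, L],
# so SAME_* padding must be computed against that rank-4 spatial shape.
import math

def same_upper_pads(in_len: int, kernel: int, stride: int) -> tuple[int, int]:
    out_len = math.ceil(in_len / stride)
    total = max((out_len - 1) * stride + kernel - in_len, 0)
    return total // 2, total - total // 2  # (pad_begin, pad_end)

# Rank-3 MaxPool over length 7: operate on the [N, C, 1, 7] intermediate.
# The height axis (size 1, kernel 1) needs no padding; the width axis does.
print(same_upper_pads(1, 1, 1))  # (0, 0)
print(same_upper_pads(7, 3, 2))  # (1, 1) -> output width ceil(7/2) = 4
```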
### Description
Update Qnn default version to 2.34.0.250424
tianleiwu and others added 26 commits May 23, 2025 12:26
### Description

Major changes of spec:
* 2D scale shape: [N * n_blocks_per_col] => [N, n_blocks_per_col]
* 2D zero-point shape: [N * CeilDiv(n_blocks_per_col * bits, 8)] => [N, CeilDiv(n_blocks_per_col * bits, 8)]
* For B, drop the int32 type and only allow uint8.
* Allow bfloat16 as input/output type.
* Mark input g_idx as deprecated (since it has no benefit on model size and performance in inference).

Add a function CheckInputs to verify the input shapes.

The reason for the shape change is to make scales and zero points compatible with other operators like DequantizeLinear and GatherBlockQuantized. That will make graph fusion and model building easier. A sketch of the reshape is shown below.

Note that ORT can still handle the legacy 1D format for scales and zero points, and CUDA/CPU can still handle g_idx. However, they are deprecated, and our tools shall generate 2D scales and zeros, and avoid using g_idx going forward.

This change is backward compatible. Models from the old spec can run in the latest ORT (CheckInputs handles 1D scales and zero points), and models from the latest spec can still run in older ORT (since older ORT does not check the dimensions of scales and zero points).

### Motivation and Context

The CUDA and CPU providers do not check inputs for MatMulNBits, which could cause out-of-bounds access.

We are going to share the lm_head weights of MatMulNBits with GatherBlockQuantized. The 2D shape can be used in Gather directly, so we can avoid Reshape nodes.

Our latest models published for foundry use 2D scales and zero points, so I updated the spec to reflect that.
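
A small numpy sketch of the layout change (illustrative; N, bits, and n_blocks_per_col are arbitrary example values, and this is not ORT's CheckInputs):

```python
import math
import numpy as np

N, n_blocks_per_col, bits = 8, 6, 4

# Legacy 1D layout and the equivalent new 2D layout for scales:
scales_1d = np.random.rand(N * n_blocks_per_col).astype(np.float32)
scales_2d = scales_1d.reshape(N, n_blocks_per_col)  # [N, n_blocks_per_col]

# Packed zero points: CeilDiv(n_blocks_per_col * bits, 8) uint8 bytes per row.
zp_cols = math.ceil(n_blocks_per_col * bits / 8)
zeros_1d = np.random.randint(0, 256, N * zp_cols, dtype=np.uint8)
zeros_2d = zeros_1d.reshape(N, zp_cols)             # [N, CeilDiv(...)]

# Same bytes, different shape: the 2D layout matches what Gather expects,
# avoiding Reshape nodes when sharing weights with GatherBlockQuantized.
assert scales_2d.size == scales_1d.size and zeros_2d.shape == (N, zp_cols)
```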
### Description

Resolves the following issues starting in TensorRT 10.11:
- Version macros changed in `NvInferVersion.h`; update the build to look for the new macros.
- Updated deprecated APIs (`setShapeValues` -> `setShapeValuesV2`, to support INT64 shape values).



### Motivation and Context

- Resolves building TensorRT EP from source with latest 10.11 release.

Signed-off-by: Kevin Chen <kevinch@nvidia.com>
### Description

TRT supports BFloat16 and ORT does as well.
In addition, `setup.py` was missing a copy for the NV TRT EP, and the TRT EP can only be built against the packaged parser with TRT RTX.
### Description
 - Add support for ScatterND reduction attribute
 - Gracefully handle the unsupported reduction values
 - Add unit tests to validate Reduction attribute support

…and WebNN (microsoft#24830)

### Description
Add `map_info.h` to centralize the mapping of operation types and inputs between ONNX and WebNN.

### Motivation and Context
To simplify the maintenance of operation types and inputs. The mapping between ONNX input names and WebNN input names will be used in the future to check the `rankRange`.



@Honry, @fdwr, @guschmue, PTAL, thanks!

---------

Co-authored-by: Wanming Lin <wanming.lin@intel.com>
### Description
Currently, the Xcode build with the Node.js binding (`--use_xcode`) always fails on Mac.
```
./build.sh --config Debug --use_xcode --use_webgpu --build_shared_lib --build_nodejs --parallel --compile_no_warning_as_error --skip_submodule_sync --cmake_extra_defines CMAKE_OSX_ARCHITECTURES=arm64 --skip_tests
```
The root cause is that the dylib is located in `/Debug/Debug`, not `/Debug`, when using the Xcode generator. For other generators (e.g., Make, Ninja), the dylib is located in `/Debug` as expected.

The Mac pipelines pass because they don't use the Xcode generator.

![image](https://github.com/user-attachments/assets/e1203fdb-d88a-4c06-abad-b641d502237c)
…/docker/scripts (microsoft#24810)

Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.3 to
78.1.1.
Changelog, sourced from [setuptools's changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst):

- **v78.1.1** (Bugfixes): More fully sanitized the filename in PackageIndex._download. ([#4946](https://redirect.github.com/pypa/setuptools/issues/4946))
- **v78.1.0** (Features): Restore access to _get_vc_env with a warning. ([#4874](https://redirect.github.com/pypa/setuptools/issues/4874))
- **v78.0.2** (Bugfixes): Postponed removals of deprecated dash-separated and uppercase fields in `setup.cfg`. All packages with deprecated configurations are advised to move before 2026. ([#4911](https://redirect.github.com/pypa/setuptools/issues/4911))
- **v78.0.1** (Misc): [#4909](https://redirect.github.com/pypa/setuptools/issues/4909)
- **v78.0.0** (Bugfixes): Reverted distutils changes that broke the monkey patching of command classes. ([#4902](https://redirect.github.com/pypa/setuptools/issues/4902)) (Deprecations and Removals): Setuptools no longer accepts options containing uppercase or dash characters in `setup.cfg`.

... (truncated)

Commits: viewable in the [compare view](https://github.com/pypa/setuptools/compare/v69.0.3...v78.1.1).
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=setuptools&package-manager=pip&previous-version=69.0.3&new-version=78.1.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… session run (microsoft#24672)

### Description
Add an option to enable tensor input and output bindings on CUDA before the perftest inference session runs.
Output binding is handled by changing the memory allocator type to CUDA.
Input binding is handled by creating a default ORT tensor on CPU, initializing it with data, then using cudaMemcpy to copy the data from the CPU to the CUDA-allocated GPU tensor via the raw pointers.

### Motivation and Context
With this change, the reported end-to-end inference time is more accurate, as the CPU<->GPU transfer overhead is moved out of the inference run.
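
For reference, a hedged sketch of the same binding idea via onnxruntime's Python `IOBinding` API (the perftest change itself is in C++); the model path and the tensor names `input`/`output` are placeholders:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

# Copy the input to GPU memory once, before timing starts.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
x_gpu = ort.OrtValue.ortvalue_from_numpy(x, "cuda", 0)

binding = session.io_binding()
binding.bind_ortvalue_input("input", x_gpu)
binding.bind_output("output", "cuda", 0)  # allocate the output on CUDA too

session.run_with_iobinding(binding)       # timed region: no CPU<->GPU copies
```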
### Description
- Add SpaceToDepth fusion for QNN preprocess.
- The pattern in YOLOv2 is uncommon, while the commonly seen one is left as future work.
- Add an entry point/API for non-quantization users to preprocess models for QNN execution.
- Revise cmake to package the newly introduced directory into the Python wheel.

### Motivation and Context
- While executing the YOLOv2 model on QNN-EP, a sequence of Reshape and Transpose ops with 6D shapes falls back to CPU due to an HTP limitation. Add a fusion to fuse this sequence of ops into a single SpaceToDepth, which can be directly executed on QNN-EP.
- Since the current QNN preprocess is provided in `onnxruntime/python/tools/quantization/execution_providers/qnn/preprocess.py`, which is under the quantization directory, the path may be confusing for non-quantization users. To allow non-quantization users to preprocess models for QNN, introduce `onnxruntime/python/tools/qnn/preprocess.py` to serve as the entry point and provide an API to preprocess models.
QNN's [Softmax op defines a pre-scale parameter (`beta`)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/MasterOpDef.html#softmax) into which we can fold a constant scalar multiply. A numpy sketch of the equivalence is below.
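
A quick numpy check of that equivalence, as a sketch rather than the actual fusion code:

```python
# Folding a constant scalar multiply into Softmax's pre-scale `beta` is
# value-preserving: softmax(c * x) == softmax_with_beta(x, beta=c).
import numpy as np

def softmax(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    z = beta * x
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.2, 3.0], dtype=np.float32)
c = 0.37  # the constant scalar multiply being folded

np.testing.assert_allclose(softmax(c * x), softmax(x, beta=c), rtol=1e-6)
```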
Windows on ARM supports AMD64 emulation, so we can use the win64 version of protoc.

### Description
Compilation on ARM64 machines fails due to a missing protoc dependency.

### Motivation and Context
With this change we can compile onnxruntime on Windows on Arm devices without setting up protobuf manually. CMake will download and set up the protoc dependency. VCPKG has removed this feature.
**Description:**

This pull request refactors the symbol publishing workflow that uses the
internal REST API. It addresses a breaking change introduced by the
`Az.Accounts` module update (v5.0.1+) where `Get-AzAccessToken` now
returns a `SecureString`. Additionally, it improves the structure and
robustness of the custom symbol publishing steps.

**Problem:**

1. The pipeline recently stopped working due to an update in the Azure
PowerShell `Az.Accounts` module. The `Get-AzAccessToken` cmdlet now
returns a `SecureString` by default, which was incompatible with the
previous script that expected a plain string token when setting a
pipeline variable.
2. The previous implementation used two separate tasks: one
`AzurePowerShell@5` task to generate the token and set it as a pipeline
variable, and a subsequent `pwsh` task to consume this variable and make
REST API calls. This separation required converting the `SecureString`
to plain text before setting the pipeline variable.

**Solution:**

To address these issues and improve the pipeline's design:

1. The "Generate an Azure Token" (`AzurePowerShell@5`) task and the
"Publish Symbols using internal REST API" (`pwsh`) task have been
**combined into a single `AzurePowerShell@5` task.**
2.  Within this unified task:
* `Get-AzAccessToken` is called, and its `SecureString` output is stored
in a local PowerShell variable.
* The `SecureString` token is converted to plain text *only within the
scope of this script* and immediately before it's used in the
`Authorization` header for `Invoke-RestMethod` calls.
* The token is no longer passed between tasks via a pipeline variable,
enhancing security by limiting the scope of the plain text token.

**Key Changes:**

* **Enhanced `SecureString` Management:** The token remains a
`SecureString` for most of its lifetime within the script, reducing
exposure.
* **Improved Error Handling:** `try-catch` blocks have been added around
the token retrieval and `Invoke-RestMethod` calls for better error
reporting and pipeline stability.
* **Robust Parameter Handling:** Explicit conversion for boolean
parameters (e.g., `includePublicSymbolServer`) to ensure correct
PowerShell boolean types before JSON serialization.
The following unit tests failed when building ONNX Runtime with Visual Studio 17.14 in Release or RelWithDebInfo configuration.

- SparseTensorConversionTests.TestDenseToSparseConversion
- MeanVarianceNormalizationTest.AllAxes
- MVNContribOpTest.MeanVarianceNormalizationCPUTest_Version1_TO_8

This PR provides a workaround for the two MVN tests.
…#24884)

Fix for microsoft#24861

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
### Description
Revise the WASM CI to run tests as a later step than publishing artifacts. This allows downloading the binary to diagnose test failures.
Handle NaN in softmax operator for WebGPU EP and JSEP.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…into `onnxruntime-web/wasm` build (microsoft#24836)

### Description
Fixes inference error from `ort-wasm-simd-threaded.mjs` not being
bundled into `ort.wasm.bundle.min.mjs` as it is for other
`bundle.min.mjs` builds.

### Motivation and Context
To decrease my app's bundle size, I followed the [conditional importing
guide](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/importing_onnxruntime-web#conditional-importing)
and imported the WASM-only build:
```diff
- import * as ort from 'onnxruntime-web';
+ import * as ort from 'onnxruntime-web/wasm';
```
After this change, creating an inference session would result in:
`TypeError: Failed to resolve module specifier
'./ort-wasm-simd-threaded.mjs'`.

This was because `ort-wasm-simd-threaded.mjs` was not bundled into the build at `onnxruntime-web/wasm`, which points to `ort.wasm.bundle.min.mjs`, despite what its name suggests. In other builds with `bundle` in their name, the module is bundled, yet it was not in the WASM one. This PR bundles the JavaScript WASM runtime in to match the other builds, fixing the error.
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.11.10 to 0.11.11.
Release notes for 0.11.11, sourced from [ruff's releases](https://github.com/astral-sh/ruff/releases) (the entry in [ruff's changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) is identical):

**Preview features**
- [`airflow`] Add autofixes for `AIR302` and `AIR312` ([#17942](https://redirect.github.com/astral-sh/ruff/pull/17942))
- [`airflow`] Move rules from `AIR312` to `AIR302` ([#17940](https://redirect.github.com/astral-sh/ruff/pull/17940))
- [`airflow`] Update `AIR301` and `AIR311` with the latest Airflow implementations ([#17985](https://redirect.github.com/astral-sh/ruff/pull/17985))
- [`flake8-simplify`] Enable fix in preview mode (`SIM117`) ([#18208](https://redirect.github.com/astral-sh/ruff/pull/18208))

**Bug fixes**
- Fix inconsistent formatting of match-case on `[]` and `_` ([#18147](https://redirect.github.com/astral-sh/ruff/pull/18147))
- [`pylint`] Fix `PLW1514` not recognizing the `encoding` positional argument of `codecs.open` ([#18109](https://redirect.github.com/astral-sh/ruff/pull/18109))

**CLI**
- Add full option name in formatter warning ([#18217](https://redirect.github.com/astral-sh/ruff/pull/18217))

**Documentation**
- Fix rendering of admonition in docs ([#18163](https://redirect.github.com/astral-sh/ruff/pull/18163))
- [`flake8-print`] Improve print/pprint docs for `T201` and `T203` ([#18130](https://redirect.github.com/astral-sh/ruff/pull/18130))
- [`flake8-simplify`] Add fix safety section (`SIM110`, `SIM210`) ([#18114](https://redirect.github.com/astral-sh/ruff/pull/18114), [#18100](https://redirect.github.com/astral-sh/ruff/pull/18100))
- [`pylint`] Fix docs example that produced different output (`PLW0603`) ([#18216](https://redirect.github.com/astral-sh/ruff/pull/18216))

Contributors: @AlexWaygood, @BradonZhang, @BurntSushi, @CodeMan62, @InSyncWithFoo, @LaBatata101, @Lee-W, @Mathemmagician, @MatthewMckee4, @MichaReiser, @TomerBin, @VascoSch92, @adamaaronson, @brainwane, @brandtbucher, @carljm, @dcreager, @dhruvmanila, @dragon-dxw, @felixscherz, @kiran-4444, @maxmynter, @ntBre

Commits: viewable in the [compare view](https://github.com/astral-sh/ruff/compare/0.11.10...0.11.11).
[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.11.10&new-version=0.11.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
* Add fpA intB gemm kernel from WeightOnlyGroupwiseQuantGemmPlugin of
TensorRT-LLM.
* Add prepacking to convert weight/scales/zero_points to adapt
MatMulNBits to use the kernel.

Limitations:
* Only the fp16 kernel is enabled. BF16 support will be added later.
* Requires zero points. Support for scales-only might be added later.
* Bias is not enabled, since the previous MatMulNBits kernel does not support bias.

### Motivation and Context

To improve the performance of LLMs.

Initial results show 2.2x throughput on prompt processing and 1.25x throughput on token generation using onnxruntime-genai's benchmark_e2e.py on phi-4-mini-instruct on A100.
### Description
- Fix the onnxruntime-extensions include path.
- Add an option to onnxruntime_perf_test to register custom ops from a built-in onnxruntime-extensions.

### Motivation and Context
Fix the build.py `--use_extensions` option. Make it simple to use the built-in onnxruntime-extensions with onnxruntime_perf_test.
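
For comparison, a hedged sketch of registering the onnxruntime-extensions custom ops from Python (the PR adds an equivalent option to the C++ onnxruntime_perf_test); `model.onnx` is a placeholder:

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

opts = ort.SessionOptions()
# Load the extensions' shared library so its custom ops resolve at session load.
opts.register_custom_ops_library(get_library_path())

session = ort.InferenceSession("model.onnx", sess_options=opts)
```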
…t#24524)

### Description
This change introduces `TPAUSE` support in the `SpinPause()` function on Windows and Linux to reduce power consumption and improve efficiency during spin-wait periods. `TPAUSE` is a lightweight power/performance instruction that enters an optimized C0 power state while waiting on a delay event, compared to `_mm_pause()`, which is a NOP-like instruction that provides a small delay in the CPU pipeline. With this change, first-inference latency across certain models can also improve. Models tested internally have shown up to ~2x improvement in first-inference latency and up to ~20% lower overall power consumption.

Genuine Intel CPUID detection logic was also refactored into a shared utility (`CheckIntel()`), enabling consistent platform checks across the codebase. `TPAUSE` is enabled by default on architectures that support it.

[Intel Intrinsics Guide
(TPAUSE)](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=tpause&techs=MMX,SSE_ALL,AVX_ALL,AVX_512,AMX,SVML,Other&ig_expand=6888,6888)

### Motivation and Context
Performance and power efficiency gains. A previous PR initially introduced the TPAUSE instruction in `SpinPause()` with measured improvements in power (see the previous TPAUSE PR: [Add WAITPKG checks, add support for TPAUSE in ThreadPool spin microsoft#16935](microsoft#16935)). Additional performance testing and measurements were done across mobile, desktop, and server platforms, influencing enhancements to this PR such as a tweak to the `spin_delay_cycles`, Linux support, and the refactored Intel CPUID detection logic.
…re (microsoft#24910)

### Description
A recent change in abseil-cpp.cmake enables ABSL_ENABLE_INSTALL, which causes a compilation error on AIX. The same build worked before this enablement, so block it on AIX.

```
[ 83%] Linking CXX executable onnxruntime_perf_test
ld: 0706-006 Cannot find or open library file: -l absl_failure_signal_handler
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_examine_stack
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_parse
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_usage
	ld:open(): A file or directory in the path name does not exist.
ld: 0706-006 Cannot find or open library file: -l absl_flags_usage_internal
	ld:open(): A file or directory in the path name does not exist.
.ibm-clang: error: linker command failed with exit code 255 (use -v to see invocation)
```


### Motivation and Context
To fix the compilation error, blocking the enablement of
ABSL_ENABLE_INSTALL under AIX.
### Description

This PR updates the attention fusions for Whisper to work with the
latest `transformers` package (`4.52.3`).

### Motivation and Context

Previously, the attention fusions were maintained for many older
`transformers` versions. The existing fusions do not work with the
latest `transformers` versions.
BF16 support is primarily available on NVIDIA GPUs with Ampere and later architectures (compute capability 8.0 or higher).
If `trt_bf16_enable = true` and the compute capability is < 8, the TRT EP will set `trt_bf16_enable = false`.
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k June 2, 2025 04:59
@ankitm3k ankitm3k merged commit be8fded into ovep-develop Jun 2, 2025
4 of 7 checks passed
@ankitm3k ankitm3k deleted the sync_msft_2_6_25 branch June 2, 2025 06:12