
Conversation

mgleonard425

Description

Allows users of Calibrators to specify which data types they want to be quantized. Today, it's hardcoded to FLOAT in the ORT code.

Motivation and Context

Enables sdk-cli to add TensorProto.INT64 as an extra dtype to capture into tranges. This, in turn, helps CGC properly legalize int64 edges to int32 when their calibrated ranges fully fit within int32, which is much better for our architecture.
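The intent can be sketched roughly as follows. This is a minimal, hedged illustration of the idea only, not the actual ORT calibrator API: the `tensors_to_calibrate` and `fits_in_int32` helpers and the tensor names are hypothetical.

```python
# Sketch: let the caller choose which element types the calibrator collects
# ranges for, instead of hardcoding FLOAT. The constants mirror the
# onnx.TensorProto enum values for these types.
FLOAT = 1
INT64 = 7

def tensors_to_calibrate(value_infos, allowed_types=(FLOAT,)):
    """Pick tensors whose element type is in allowed_types.

    value_infos: iterable of (tensor_name, elem_type) pairs.
    """
    return [name for name, elem_type in value_infos if elem_type in allowed_types]

def fits_in_int32(lo, hi):
    """True if a calibrated [lo, hi] range fits in int32 (legalization check)."""
    return -2**31 <= lo and hi <= 2**31 - 1

graph_values = [("attn_scores", FLOAT), ("position_ids", INT64)]

# Old behavior: only FLOAT tensors get calibration ranges.
print(tensors_to_calibrate(graph_values))                  # ['attn_scores']
# With this change: INT64 tensors are captured too, so their ranges can
# later be checked against the int32 range before legalizing the edge.
print(tensors_to_calibrate(graph_values, (FLOAT, INT64)))  # ['attn_scores', 'position_ids']
print(fits_in_int32(0, 4096))                              # True
```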

Lafi7e and others added 30 commits August 28, 2023 12:34
Cherry-pick 1st round for rel-1.16.0 from
https://github.com/microsoft/onnxruntime/issues?q=label%3Arelease%3A1.16+label%3Atriage%3Aapproved+is%3Aclosed
except #17201, because it caused a UT failure that is not fixed yet.

PR list:
#16417
#16936
#17000
#17236
#17238
#17240
#17252
#17255
#17258
#17265
#17267
#17277
Cherry-pick 2nd round for 1.16.0 release.
PR List:

#17201
#17270
#17311
#17315
#17320
#17326
#17355
#17227
#17380
#17386
### Description

Use name of temporary provisioning profile.

### Motivation and Context

The old provisioning profile no longer works. Switched to a temporary
one that we can use before a new one is available. The temporary one has
a different name.

Alternative to #17454.
Disable QNN QDQ test for release branch

### Description
Disable QNN QDQ test for release branch to get rid of model test failure
caused by new model update in build image.
…(#17461)

### Description
Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size. 

### Motivation and Context
PR #17227 increased the binary size by 20%. Right now the package size is about
260 MB, but NuGet has a hard limit of 250 MB. Without this change we
cannot publish the package.
Cherry-pick #17507  for rel-1.16.0.

Note: PR #17507 contains part of the engine decryption refactor that
we don't want to include in the ORT 1.16 release. This cherry-pick PR
excludes that part.
### Description
1. Delete Prefast tasks (#17522)
2. Disable yum update (#17551)
3. Avoid calling patchelf (#17365 and #17562) so that we can validate
the above fixes

The main problem I'm trying to solve is that our GPU package depends on both
CUDA 11.x and CUDA 12.x. However, it's not easy to see this information
because ldd doesn't work with the shared libraries we generate (see issue
#9754). So the patchelf changes are useful for validating that
"disabling yum update" was successful. As you can see, we call "yum
update" from multiple places; without some kind of validation it's hard
to say whether I have covered all of them.
The Prefast change is needed because I'm going to update the VM images
in the next few weeks, in case we need to publish a patch release
after that.

### Motivation and Context
Without this fix we will mix CUDA 11.x and CUDA 12.x, and it will
crash every time we use TensorRT.
Cherry-pick the following PRs to the release branch:

Fix: Fail to skip disabledmodel in winml (#17728) 
Move dotnet build and test into docker in Linux CPU CI (#17417) 
Run Nuget_Test_Linux_GPU in container (#17452) 
Run Final_Jar_Testing_Linux_GPU in docker (#17533) 
TreeEnsemble speed up (#17449) 
Remove onnxruntime extensions from list of gitmodules (#17615) 
Include onnxruntime_float16.h in the package. (#17637) 
Fix static quantization for QDQ and Percentile distribution (#17649) 
[TensorRT EP] Back out the PerThreadContext (#17690) 
Update nodejs to 18.x (#17657) 
Update linux-wasm-ci.yml: remove the ln command (#17735)
Remove the condition, to allow an empty provider list.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
Fix session option access in the Node.js binding.


### Motivation and Context
This is a bug that affects transformer.js when using the ONNX Runtime Node.js
binding. Issue: #17377

This bug is already fixed in the main branch, but the fix was not picked
into the 1.16 release.
### Description

The Python package pipeline fails due to a "tokenizers" compilation error. Since
"tokenizers" is a dependency of "transformers", we update its version in the
hope that a fix is already available.

```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```



### Motivation and Context
Cherry-pick from #17823
1. Increase the version number in preparation for the 1.16.2 release (#18070)
2. Cherry-pick #18034. Those masks are used for MHA in LLaMA.
@fdwr This is part 2 of the pybind work that was started earlier.
This adds the following features to the python IO binding
implementation:

- Use a bucketized allocator in order to reduce the number of resource
allocations
- Implement the following functions: `ortvalue_from_numpy`,
`update_inplace`, `ortvalue_from_shape_and_type` and `numpy`
- Modify the `onnxruntime_test_python_iobinding` tests to also run on
DML
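The bucketized-allocator idea above can be sketched as follows. This is a hedged, pure-Python illustration of the concept only, not the actual DML EP implementation; the `BucketAllocator` class and its method names are hypothetical.

```python
# Sketch of a bucketized allocator: requested sizes are rounded up to
# power-of-two buckets, and freed blocks are reused for later requests in
# the same bucket, reducing the number of real resource allocations.
class BucketAllocator:
    def __init__(self):
        self._free = {}       # bucket size -> list of reusable blocks
        self.real_allocs = 0  # how many times we actually allocated

    @staticmethod
    def _bucket(n):
        size = 1
        while size < n:
            size *= 2
        return size

    def alloc(self, n):
        bucket = self._bucket(n)
        pool = self._free.get(bucket)
        if pool:
            return pool.pop()  # reuse a previously freed block
        self.real_allocs += 1
        return bytearray(bucket)

    def release(self, block):
        self._free.setdefault(len(block), []).append(block)

alloc = BucketAllocator()
a = alloc.alloc(100)      # real allocation: bucket size 128
alloc.release(a)
b = alloc.alloc(120)      # same 128-byte bucket -> the freed block is reused
print(alloc.real_allocs)  # 1
```

The trade-off is internal fragmentation (a 100-byte request consumes a 128-byte block) in exchange for far fewer allocator round trips when tensor sizes repeat across inference calls.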

Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
…4)" (#18150) (#18170)

This reverts commit 99b8dca.


### Motivation and Context
Restore the dml stage in windows GPU  pipeline.
Agent issue is solved by adding Feature.DisableGpuDriver in pool
properties.

---------

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
It is related to #18155.

The issue has been fixed in the main branch by @jchen351
Cherry-pick changes related to LLaMA and StableDiffusion XL to 1.16.2 release branch.

### Motivation and Context
---------

Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>
Co-authored-by: petermcaughan <peter.mcaughan@gmail.com>
Co-authored-by: Peter McAughan <petermca@microsoft.com>
Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com>
Co-authored-by: PeixuanZuo <94887879+PeixuanZuo@users.noreply.github.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: JiCheng <wejoncy@163.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Cherry-pick PRs: 
#18026 
#17912 
#17901 "2 lines added whitespace errors when cherry-picking"
#17293 
#17364 
#17505 
#17885

This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772 which has a merge conflict.

---------

Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Kaz Nishimura <kazssym@linuxfront.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2nd round of cherry-picking LLaMA-related changes to the 1.16.2 release.
---------

Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: Frank Dong <123416088+frank-dong-ms@users.noreply.github.com>
### Description
Cherry picking Resize Grad PR #17772 



### Motivation and Context

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
### Description
Pipeline changes for the 1.16.2 patch release. 
Cherry-pick
#17970 
#18069
### Description
Update eigen's URL because the old one doesn't point to a release tag.
See #18286 for the background.
Before the incident happened, the eigen git commit id we were using was
e7248b26a1ed53fa030c5c459f7ea095dfd276ac. This PR pins eigen to that
version, which is newer than Eigen's 3.4.0 tag.
Cherry-pick LLaMA GQA attention mask and script changes to 1.16.2 release branch.

---------
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
xdrBogdan22 and others added 28 commits November 9, 2023 16:28
- Enables other repos to call this workflow with parameterized options such as onnxruntime_branch
- To be accompanied by a corresponding sdk-cli change.
* Add QuadricCustomOp

* Update README_EPU.md with correct instructions
- Use python3.9
- Set --apple_deploy_target to 12