Add calibrator option to specify dtypes #19
Draft
mgleonard425 wants to merge 59 commits into main from mike-int64-tranges
Conversation
Cherry-pick 1st round for rel-1.16.0 from https://github.com/microsoft/onnxruntime/issues?q=label%3Arelease%3A1.16+label%3Atriage%3Aapproved+is%3Aclosed except #17201 because it caused UT failure and is not fixed yet. PR list: #16417 #16936 #17000 #17236 #17238 #17240 #17252 #17255 #17258 #17265 #17267 #17277
Cherry-pick 2nd round for 1.16.0 release. PR List: #17201 #17270 #17311 #17315 #17320 #17326 #17355 #17227 #17380 #17386
### Description <!-- Describe your changes. --> Use name of temporary provisioning profile. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The old provisioning profile no longer works. Switched to a temporary one that we can use before a new one is available. The temporary one has a different name. Alternative to #17454.
Disable QNN QDQ test for release branch ### Description Disable QNN QDQ test for release branch to get rid of model test failure caused by new model update in build image.
…(#17461) ### Description Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size. ### Motivation and Context PR #17227 increased binary size by 20%. Right now the package size is about 260MB. However, nuget has a hard limit of 250MB. Without this change we cannot publish the package.
Cherry-pick #17507 for rel-1.16.0. Note: PR #17507 contains part of the engine decryption refactor that we don't want to include in the ORT 1.16 release. This cherry-pick PR excludes that part.
### Description 1. Delete Prefast tasks (#17522) 2. Disable yum update (#17551) 3. Avoid calling patchelf (#17365 and #17562) so that we can validate the above fixes. The main problem I'm trying to solve is: our GPU package depends on both CUDA 11.x and CUDA 12.x. However, it's not easy to see that because ldd doesn't work with the shared libraries we generate (see issue #9754). So the patchelf change is useful for validating that "Disable yum update" was successful. As you can see, we call "yum update" from multiple places; without some kind of validation it's hard to say if I have covered all of them. The Prefast change is needed because I'm going to update the VM images in the next few weeks, in case we need to publish a patch release after that. ### Motivation and Context Without this fix we will mix CUDA 11.x and CUDA 12.x, and it will crash every time we use TensorRT.
Cherry-pick the following PRs to the release branch: Fix: Fail to skip disabledmodel in winml (#17728) Move dotnet build and test into docker in Linux CPU CI (#17417) Run Nuget_Test_Linux_GPU in container (#17452) Run Final_Jar_Testing_Linux_GPU in docker (#17533) TreeEnsemble speed up (#17449) Remove onnxruntime extensions from list of gitmodules (#17615) Include onnxruntime_float16.h in the package. (#17637) Fix static quantization for QDQ and Percentile distribution (#17649) [TensorRT EP] Back out the PerThreadContext (#17690) Update nodejs to 18.x (#17657) Update linux-wasm-ci.yml: remove the ln command (#17735)
Remove the condition to allow an empty provide list. Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description Fix session option access in Node.js binding. ### Motivation and Context This is a bug that affects transformer.js using the ONNX Runtime Node.js binding. Issue: #17377 This bug is already fixed in the main branch, but it was not picked into the 1.16 release.
### Description Python package pipeline fails due to "tokenizers" compilation. Since "tokenizers" is a dependency of "transformers", we update its version in the hope that a fix has already landed. ``` error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell` --> tokenizers-lib/src/models/bpe/trainer.rs:517:47 ``` ### Motivation and Context Cherry-pick from #17823
1. Increase version number for preparing the 1.16.2 release (#18070) 2. cherry-pick 18034
Those masks are used for MHA in LLaMA.
@fdwr This is the part 2 of the pybind work that was started earlier. This adds the following features to the python IO binding implementation: - Use a bucketized allocator in order to reduce the number of resource allocations - Implement the following functions: `ortvalue_from_numpy`, `update_inplace`, `ortvalue_from_shape_and_type` and `numpy` - Modify the `onnxruntime_test_python_iobinding` tests to also run on DML Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
…4)" (#18150) (#18170) This reverts commit 99b8dca. ### Motivation and Context Restore the dml stage in windows GPU pipeline. Agent issue is solved by adding Feature.DisableGpuDriver in pool properties. --------- Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
It is related to #18155 . The issue has been fixed in the main branch by @jchen351
Cherry-pick changes related to LLaMA and StableDiffusion XL to 1.16.2 release branch. ### Motivation and Context --------- Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com> Co-authored-by: petermcaughan <peter.mcaughan@gmail.com> Co-authored-by: Peter McAughan <petermca@microsoft.com> Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com> Co-authored-by: PeixuanZuo <94887879+PeixuanZuo@users.noreply.github.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com> Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: JiCheng <wejoncy@163.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Cherry-pick PRs: #18026 #17912 #17901 “2 lines added whitespace errors when cherry-picking" #17293 #17364 #17505 #17885 This PR contains all the cherry-picks for the patch release except: 1. The PRs marked with sdxl_llama 2. #17772 which has a merge conflict. --------- Co-authored-by: Chi Lo <Chi.Lo@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: Kaz Nishimura <kazssym@linuxfront.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>
2nd round of cherry pick LLaMA related changes to 1.16.2 release. --------- Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com> Co-authored-by: Frank Dong <123416088+frank-dong-ms@users.noreply.github.com>
### Description Cherry picking Resize Grad PR #17772 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
### Description Pipeline changes for the 1.16.2 patch release. Cherry-pick #17970 #18069
### Description Update eigen's URL because the old one doesn't point to a release tag.
See #18286 for the background. Before the incident happened, the eigen git commit id we were using was e7248b26a1ed53fa030c5c459f7ea095dfd276ac. This PR changes eigen back to that version, which is newer than Eigen's 3.4.0 tag.
Cherry-pick LLaMA GQA attention mask and script changes to 1.16.2 release branch. --------- Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
- Enables other repos to call this workflow with parameterized options such as onnxruntime_branch - To be accompanied by a corresponding sdk-cli change.
* Add QuadricCustomOp * Update README_EPU.md with correct instructions
- Use python3.9 - Set --apple_deploy_target to 12
Compare: 7d644cf to 190e428
### Description
Allows users of `Calibrator`s to specify which data types they want to be quantized. Today, this is hardcoded to `FLOAT` in the ORT code.
### Motivation and Context
Enables `sdk-cli` to add `TensorProto.INT64` as an extra dtype to capture into `tranges`. This, in turn, helps CGC properly legalize `int64` edges to `int32` when their calibrated ranges fully fit within `int32`, which is much better for our architecture.
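The idea above can be sketched as follows. This is a minimal illustration, not ORT's actual calibrator API: the class name `Calibrator`, the `observe` method, and the `fits_int32` helper are all invented for this example. Only the `TensorProto` enum values (`FLOAT = 1`, `INT64 = 7`) come from the ONNX spec.

```python
# Hypothetical sketch of a calibrator that records min/max ranges only for
# tensors whose dtype is in a caller-supplied allowed set, instead of the
# previous behavior of always (and only) calibrating FLOAT tensors.
FLOAT = 1   # TensorProto.FLOAT enum value (ONNX spec)
INT64 = 7   # TensorProto.INT64 enum value (ONNX spec)

class Calibrator:
    def __init__(self, allowed_dtypes=(FLOAT,)):
        # Default preserves the old hardcoded-FLOAT behavior; callers may
        # pass extra dtypes such as INT64 to capture their ranges too.
        self.allowed_dtypes = set(allowed_dtypes)
        self.ranges = {}  # tensor name -> (min, max) observed so far

    def observe(self, name, dtype, values):
        # Skip tensors whose dtype was not requested for calibration.
        if dtype not in self.allowed_dtypes or not values:
            return
        lo, hi = min(values), max(values)
        if name in self.ranges:
            # Merge with the range from earlier calibration batches.
            old_lo, old_hi = self.ranges[name]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self.ranges[name] = (lo, hi)

def fits_int32(trange):
    # True if every observed value of this tensor fits in a signed int32,
    # i.e. the int64 edge could be legalized to int32.
    lo, hi = trange
    return -2**31 <= lo and hi <= 2**31 - 1
```

With `INT64` added to the allowed set, an int64 tensor whose observed range stays within `[-2^31, 2^31 - 1]` can be flagged as safe to narrow to int32; the default constructor keeps the old FLOAT-only behavior.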