
Conversation

mgleonard425

Description

Allows users of Calibrators to specify which data types they want to be quantized. Today, it's hardcoded to FLOAT in the ORT code.

Motivation and Context

Enables sdk-cli to add TensorProto.INT64 as an extra dtype to capture into tranges. This, in turn, helps CGC properly legalize int64 edges to int32 when their calibrated ranges fully fit within int32, which is much better for our architecture.
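The intent can be sketched roughly as follows. This is a minimal, hedged illustration of the idea only, not the actual ORT calibrator API: the `tensors_to_calibrate` and `fits_in_int32` helpers and the tensor names are hypothetical.

```python
# Sketch: let the caller choose which element types the calibrator collects
# ranges for, instead of hardcoding FLOAT. The constants mirror the
# onnx.TensorProto enum values for these types.
FLOAT = 1
INT64 = 7

def tensors_to_calibrate(value_infos, allowed_types=(FLOAT,)):
    """Pick tensors whose element type is in allowed_types.

    value_infos: iterable of (tensor_name, elem_type) pairs.
    """
    return [name for name, elem_type in value_infos if elem_type in allowed_types]

def fits_in_int32(lo, hi):
    """True if a calibrated [lo, hi] range fits in int32 (legalization check)."""
    return -2**31 <= lo and hi <= 2**31 - 1

graph_values = [("attn_scores", FLOAT), ("position_ids", INT64)]

# Old behavior: only FLOAT tensors get calibration ranges.
print(tensors_to_calibrate(graph_values))                  # ['attn_scores']
# With this change: INT64 tensors are captured too, so their ranges can
# later be checked against the int32 range before legalizing the edge.
print(tensors_to_calibrate(graph_values, (FLOAT, INT64)))  # ['attn_scores', 'position_ids']
print(fits_in_int32(0, 4096))                              # True
```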

Lafi7e and others added 30 commits August 28, 2023 12:34
Cherry-pick 1st round for rel-1.16.0 from
https://github.com/microsoft/onnxruntime/issues?q=label%3Arelease%3A1.16+label%3Atriage%3Aapproved+is%3Aclosed
except #17201, because it caused a UT failure that is not fixed yet.

PR list:
#16417
#16936
#17000
#17236
#17238
#17240
#17252
#17255
#17258
#17265
#17267
#17277
Cherry-pick 2nd round for 1.16.0 release.
PR List:

#17201
#17270
#17311
#17315
#17320
#17326
#17355
#17227
#17380
#17386
### Description

Use name of temporary provisioning profile.

### Motivation and Context

The old provisioning profile no longer works. Switched to a temporary
one that we can use before a new one is available. The temporary one has
a different name.

Alternative to #17454.
Disable QNN QDQ test for release branch

### Description
Disable QNN QDQ test for release branch to get rid of model test failure
caused by new model update in build image.
…(#17461)

### Description
Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size. 

### Motivation and Context
PR #17227 increased the binary size by 20%. Right now the package size is about
260 MB, but NuGet has a hard limit of 250 MB. Without this change we
cannot publish the package.
Cherry-pick #17507  for rel-1.16.0.

Note: PR #17507 contains part of the engine decryption refactor that
we don't want to include in the ORT 1.16 release. This cherry-pick PR
excludes that part.
### Description
1. Delete Prefast tasks (#17522)
2. Disable yum update (#17551)
3. Avoid calling patchelf (#17365 and #17562) so that we can validate
the above fixes

The main problem I'm trying to solve is that our GPU package depends on both
CUDA 11.x and CUDA 12.x. However, it's not easy to see this information
because ldd doesn't work with the shared libraries we generate (see issue
#9754). So the patchelf changes are useful for validating that
"disabling yum update" was successful. As you can see, we call "yum
update" from multiple places; without some kind of validation it's hard
to say whether I have covered all of them.
The Prefast change is needed because I'm going to update the VM images
in the next few weeks, in case we need to publish a patch release
after that.

### Motivation and Context
Without this fix we will mix CUDA 11.x and CUDA 12.x, and it will
crash every time we use TensorRT.
Cherry-pick the following PRs to the release branch:

Fix: Fail to skip disabledmodel in winml (#17728) 
Move dotnet build and test into docker in Linux CPU CI (#17417) 
Run Nuget_Test_Linux_GPU in container (#17452) 
Run Final_Jar_Testing_Linux_GPU in docker (#17533) 
TreeEnsemble speed up (#17449) 
Remove onnxruntime extensions from list of gitmodules (#17615) 
Include onnxruntime_float16.h in the package. (#17637) 
Fix static quantization for QDQ and Percentile distribution (#17649) 
[TensorRT EP] Back out the PerThreadContext (#17690) 
Update nodejs to 18.x (#17657) 
Update linux-wasm-ci.yml: remove the ln command (#17735)
Remove the condition, to allow an empty provider list.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
### Description
Fix session option access in the Node.js binding.


### Motivation and Context
This is a bug that affects transformer.js when using the ONNX Runtime Node.js
binding. Issue: #17377

This bug is already fixed in the main branch, but the fix was not picked
into the 1.16 release.
### Description

The Python package pipeline fails due to a "tokenizers" compilation error. Since
"tokenizers" is a dependency of "transformers", we update its version in the
hope that a fix is already available.

```
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> tokenizers-lib/src/models/bpe/trainer.rs:517:47
```



### Motivation and Context
Cherry-pick from #17823
1. Increase the version number in preparation for the 1.16.2 release (#18070)
2. Cherry-pick #18034. Those masks are used for MHA in LLaMA.
@fdwr This is part 2 of the pybind work that was started earlier.
This adds the following features to the python IO binding
implementation:

- Use a bucketized allocator in order to reduce the number of resource
allocations
- Implement the following functions: `ortvalue_from_numpy`,
`update_inplace`, `ortvalue_from_shape_and_type` and `numpy`
- Modify the `onnxruntime_test_python_iobinding` tests to also run on
DML
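The bucketized-allocator idea above can be sketched as follows. This is a hedged, pure-Python illustration of the concept only, not the actual DML EP implementation; the `BucketAllocator` class and its method names are hypothetical.

```python
# Sketch of a bucketized allocator: requested sizes are rounded up to
# power-of-two buckets, and freed blocks are reused for later requests in
# the same bucket, reducing the number of real resource allocations.
class BucketAllocator:
    def __init__(self):
        self._free = {}       # bucket size -> list of reusable blocks
        self.real_allocs = 0  # how many times we actually allocated

    @staticmethod
    def _bucket(n):
        size = 1
        while size < n:
            size *= 2
        return size

    def alloc(self, n):
        bucket = self._bucket(n)
        pool = self._free.get(bucket)
        if pool:
            return pool.pop()  # reuse a previously freed block
        self.real_allocs += 1
        return bytearray(bucket)

    def release(self, block):
        self._free.setdefault(len(block), []).append(block)

alloc = BucketAllocator()
a = alloc.alloc(100)      # real allocation: bucket size 128
alloc.release(a)
b = alloc.alloc(120)      # same 128-byte bucket -> the freed block is reused
print(alloc.real_allocs)  # 1
```

The trade-off is internal fragmentation (a 100-byte request consumes a 128-byte block) in exchange for far fewer allocator round trips when tensor sizes repeat across inference calls.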

Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
…4)" (#18150) (#18170)

This reverts commit 99b8dca.


### Motivation and Context
Restore the dml stage in windows GPU  pipeline.
Agent issue is solved by adding Feature.DisableGpuDriver in pool
properties.

---------

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
It is related to #18155.

The issue has been fixed in the main branch by @jchen351
Cherry-pick changes related to LLaMA and StableDiffusion XL to 1.16.2 release branch.

### Motivation and Context
---------

Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>
Co-authored-by: petermcaughan <peter.mcaughan@gmail.com>
Co-authored-by: Peter McAughan <petermca@microsoft.com>
Co-authored-by: Jambay Kinley <jambaykinley@microsoft.com>
Co-authored-by: PeixuanZuo <94887879+PeixuanZuo@users.noreply.github.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: tlwu@microsoft.com <tlwu@a100.crj0ad2y1kku1j4yxl4sj10o4e.gx.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: JiCheng <wejoncy@163.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Cherry-pick PRs: 
#18026 
#17912 
#17901 "2 lines added whitespace errors when cherry-picking"
#17293 
#17364 
#17505 
#17885

This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772 which has a merge conflict.

---------

Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Kaz Nishimura <kazssym@linuxfront.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2nd round of cherry-picking LLaMA-related changes to the 1.16.2 release.
---------

Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: Frank Dong <123416088+frank-dong-ms@users.noreply.github.com>
### Description
Cherry picking Resize Grad PR #17772 



### Motivation and Context

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
### Description
Pipeline changes for the 1.16.2 patch release. 
Cherry-pick
#17970 
#18069
### Description
Update eigen's URL because the old one doesn't point to a release tag.
See #18286 for the background.
Before the incident happened, the eigen git commit id we were using was
e7248b26a1ed53fa030c5c459f7ea095dfd276ac. This PR pins eigen to that
version, which is newer than Eigen's 3.4.0 tag.
Cherry-pick LLaMA GQA attention mask and script changes to 1.16.2 release branch.

---------
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
xdrBogdan22 and others added 28 commits November 9, 2023 16:28
- Enables other repos to call this workflow with parameterized options such as onnxruntime_branch
- To be accompanied by a corresponding sdk-cli change.
* Add QuadricCustomOp

* Update README_EPU.md with correct instructions
- Use python3.9
- Set --apple_deploy_target to 12