Fix a bug in the implementation of dequantization for inference #3433
Conversation
Get rid of `hid_cnt` and simply set #blocks to output size / #groups
@loadams any update/ETA on this PR? FYI, I just rebased from master.
Getting the right folks to review; sorry for the delay on getting this reviewed/merged.
Hi @sakogan, thanks for the PR, and sorry for my delay in getting back to this. May I suggest that we revert these changes for this kernel and create another one that takes into account both the hidden and output dimensions when computing the right `group_index` for reading the scale? For instance, if I understand correctly, each 4K elements in the 768*768 matrix will have a different scale (considering 144 scales). Best,
@RezaYazdaniAminabadi, thanks for the review! My code indeed assumes that the output dimension is divisible by the #groups. It even includes an assert that would trigger if this is not the case. I think I made a typo in my description -- I meant to say
Thanks, this now makes sense 👍
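For the shapes discussed in this exchange, the per-scale group size is easy to check. The snippet below is only an illustration of that arithmetic; `group_index` is a hypothetical name for the scale lookup, not code from the kernel:

```python
output_size, hidden_dim, groups = 768, 768, 144
elems_per_scale = output_size * hidden_dim // groups
print(elems_per_scale)  # 4096 -- the "each 4K elements" figure noted above

# Hypothetical scale lookup for a flat element index, taking both the
# hidden and output dimensions into account as the review suggests:
def group_index(flat_idx):
    return flat_idx // elems_per_scale

print(group_index(0), group_index(589823))  # 0 143
```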
* origin/master: (48 commits)
  - Fix autotune to support Triton 2.1 (deepspeedai#4340)
  - Fix skipped inference tests (deepspeedai#4336)
  - Suppress noise (deepspeedai#4310)
  - Fix a bug in the implementation of dequantization for inference (deepspeedai#3433)
  - DS-Chat BLOOM: Fix Attention mask (deepspeedai#4338)
  - clear redundant timers (deepspeedai#4308)
  - Add release version checking (deepspeedai#4328)
  - Fix Zero3 contiguous grads, reduce scatter false accuracy issue (deepspeedai#4321)
  - Clean up modeling code (deepspeedai#4320)
  - Handle empty parameter groups (deepspeedai#4277)
  - Update README.md (deepspeedai#4316)
  - README update (deepspeedai#4303)
  - Update release and bump patch versioning flow (deepspeedai#4286)
  - added a bert-model check for triton (deepspeedai#4266)
  - ZeRO-Inference v2 release
  - bump to 0.10.4
  - Update index.md (deepspeedai#4297)
  - fix user args parsing of string with spaces on runner (deepspeedai#4265)
  - ZeRO-Inference refresh (deepspeedai#4197)
  - AMD Kernel Compatibility Fixes (deepspeedai#3180)
  - ...
The implementation of dequantization for inference appears to be incorrect. It sets the number of blocks to `ceil(output_size / (hid_cnt * groups))` and the size of each block (aka `merge_hidden` in `dequantize_kernel`) to `hid_cnt * hidden_dim`. However, for some combinations of arguments to `launch_dequantize` (e.g., with `output_size=768`, `hidden_dim=768` and `groups=144`, which corresponds to a BERT-base model) this leads to the creation of overlapping blocks in `dequantize_kernel`, and hence to an erroneous calculation of the dequantized input.
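To make the arithmetic concrete, here is a minimal sketch of the baseline block layout described above; the derivation of `hid_cnt` from a fixed thread count is an assumption for illustration, not the exact DeepSpeed source:

```python
import math

def baseline_layout(output_size, hidden_dim, groups, threads=1024):
    # Assumed derivation of hid_cnt (rows of size hidden_dim packed into
    # one thread block); illustrative only.
    hid_cnt = max(1, threads // hidden_dim)
    blocks = math.ceil(output_size / (hid_cnt * groups))  # blocks per group
    merge_hidden = hid_cnt * hidden_dim                   # elements per block
    covered = groups * blocks * merge_hidden              # elements the grid spans
    needed = output_size * hidden_dim                     # elements in the matrix
    return hid_cnt, blocks, merge_hidden, covered, needed

# BERT-base-like shapes from the description above:
print(baseline_layout(768, 768, 144))
# -> (1, 6, 768, 663552, 589824): the grid has 144 * 6 = 864 block-sized
#    slots for a matrix of only 768 rows, so block ranges must overlap.
```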
The proposed solution is to get rid of `hid_cnt` and simply set the number of blocks to `output_size / groups`, while the size of each block becomes `hidden_dim`.
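Under the same illustrative model, the proposed layout tiles the matrix exactly. A divisible configuration (e.g. `groups=48`) is used here, since the fix assumes `output_size % groups == 0` (the assert mentioned in the discussion above):

```python
def fixed_layout(output_size, hidden_dim, groups):
    # The fix assumes the output dimension is divisible by #groups.
    assert output_size % groups == 0
    blocks = output_size // groups  # blocks per group, no hid_cnt
    block_size = hidden_dim         # each block covers exactly one row
    # Exact tiling: the grid covers every element once, with no overlap.
    assert groups * blocks * block_size == output_size * hidden_dim
    return blocks, block_size

print(fixed_layout(768, 768, 48))  # -> (16, 768)
```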
The PR includes a simple unit test that demonstrates the problem in the baseline implementation and validates the proposed solution on a number of use cases.
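The actual test ships with the PR; the sketch below only illustrates the kind of per-group round-trip check such a test might perform, with a pure-Python reference standing in for the CUDA kernel (all names here are hypothetical):

```python
import numpy as np

def reference_dequantize(q, scales, groups):
    # Split the flattened int8 tensor into `groups` equal chunks and scale
    # each chunk by its own scale (hypothetical reference implementation).
    chunks = np.split(q.reshape(-1).astype(np.float32), groups)
    return np.concatenate([c * s for c, s in zip(chunks, scales)])

def test_dequantize_roundtrip():
    rng = np.random.default_rng(0)
    output_size, hidden_dim, groups = 768, 768, 48  # divisible configuration
    x = rng.standard_normal((output_size, hidden_dim)).astype(np.float32)

    # Symmetric per-group quantization to int8.
    flat_chunks = np.split(x.reshape(-1), groups)
    scales = np.array([np.abs(c).max() / 127.0 for c in flat_chunks], np.float32)
    q = np.concatenate([np.round(c / s) for c, s in zip(flat_chunks, scales)])
    q = q.astype(np.int8)

    # Dequantize and check the round trip; the real test would compare
    # against the output of launch_dequantize instead.
    out = reference_dequantize(q, scales, groups)
    np.testing.assert_allclose(out, x.reshape(-1), atol=float(scales.max()))

test_dequantize_roundtrip()
```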