
[Bug]: Bug in quantization/awq/gemm_kernels.cu gemm_forward_4bit_cuda_m16nXk32: more results are written back than are computed #7400

Open
mengsoso opened this issue Aug 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@mengsoso

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

(Screenshot: 2024-08-11, 10:05 PM)

When N=64, we do not have 4*8 = 32 c_warp results; in that case we only have (N/32)*8 = 2*8 = 16 c_warp results.
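The mismatch can be sketched numerically. This is a hypothetical simplification, not the real kernel: `computed_results` and `ASSUMED_WRITEBACK` are illustrative names that just model the per-warp accumulator count implied by the report (8 results per 32-wide tile slice, against a write-back hard-coded to 4 slices).

```python
# Sketch of the c_warp result-count mismatch (illustrative model, not kernel code).
# The write-back assumes 4 * 8 = 32 per-warp results, but the number of results
# actually accumulated scales with the tile width N as (N // 32) * 8.

ASSUMED_WRITEBACK = 4 * 8  # hard-coded 32 results written back


def computed_results(n: int) -> int:
    """Number of c_warp results actually accumulated for tile width n."""
    return (n // 32) * 8


for n in (128, 64):
    produced = computed_results(n)
    extra = ASSUMED_WRITEBACK - produced
    print(f"N={n}: {produced} results produced, {extra} extra writes")
# N=128: 32 results produced, 0 extra writes
# N=64:  16 results produced, 16 extra writes
```

For N=128 the hard-coded count happens to match, which is presumably why the bug only shows up at N=64.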

@mengsoso mengsoso added the bug Something isn't working label Aug 11, 2024
@youkaichao
Member

cc @mgoin

@mgoin
Collaborator

mgoin commented Aug 11, 2024

It looks like this was introduced during the refactor here #2723

Essentially no changes have been made to the kernel since then, only formatting changes and removal of warnings. @mengsoso, do you have a model/scenario where this bug occurs?

@mengsoso
Author

mengsoso commented Aug 12, 2024

@mgoin thank you for the response.
When we evaluate the output of the gemm_forward_4bit_cuda_m16nXk32 kernel alone, the result is wrong when N=64.

Reference: #2723. When N=64 or 128:

① Parameter assignment (screenshot: 2024-08-12, 10:20:12 AM)
② Calculation (screenshot: 2024-08-12, 10:20:48 AM)
③ Result write-back (screenshot: 2024-08-12, 10:20:58 AM)

When N=64, the kernel only ever operates on (N/32)*8 = 2*8 = 16 c_warp entries.
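The write-back step ③ above can be simulated to show the effect. This is a hedged sketch under the assumptions in this thread, not the actual CUDA code: `write_back`, `c_warp`, and the loop bounds are hypothetical stand-ins for a write-back loop whose outer bound is hard-coded to 4 instead of depending on N/32.

```python
# Hypothetical write-back simulation: with a hard-coded outer bound of 4,
# entries of c_warp that were never computed get written out when N == 64.

def write_back(c_warp, n, hardcoded=True):
    """Collect the values the write-back loop would emit."""
    out = []
    j_iters = 4 if hardcoded else n // 32  # buggy fixed bound vs. N-dependent bound
    for j in range(j_iters):
        for local_id in range(8):
            out.append(c_warp[j * 8 + local_id])
    return out


n = 64
valid = (n // 32) * 8  # only 16 entries were actually computed
# Model the accumulator: 16 computed values, the rest stale/uninitialized (NaN here).
c_warp = [1.0] * valid + [float("nan")] * (32 - valid)

buggy = write_back(c_warp, n, hardcoded=True)   # emits 32 values, 16 of them garbage
fixed = write_back(c_warp, n, hardcoded=False)  # emits only the 16 computed values
print(len(buggy), len(fixed))  # 32 16
```

Bounding the outer loop by N/32 (as sketched in the `hardcoded=False` branch) is one way the write-back could avoid emitting the uncomputed entries; the real fix in the kernel may of course look different.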

@casper-hansen
Contributor

Good catch. Here is a model with group size 64, for reference.

https://huggingface.co/TechxGenus/DeepSeek-Coder-V2-Lite-Instruct-AWQ


4 participants