[FIX] Fix kernel bug #1959
Conversation
I have completed all the kernel tests on the A800 GPU (x2), and all kernels pass correctly. Could you review this PR? @WoosukKwon
@WoosukKwon Are there any issues with this PR?
Hi @jeejeelee, thanks for submitting the PR. We hadn't noticed this bug since the device id is always 0 in vLLM. However, I agree that this change makes the kernel more portable.
csrc/activation_kernels.cu (outdated):

-#include <torch/extension.h>
-#include <ATen/cuda/CUDAContext.h>
+#include <torch/extension.h>
+#include <c10/cuda/CUDAGuard.h>
+#include "dispatch_utils.h"
style nit: keep <ATen/cuda/CUDAContext.h> alongside the new includes:

#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>
#include "dispatch_utils.h"
@WoosukKwon Thank you for your review. I have completed the requested modifications. Please review again.
@jeejeelee LGTM! Thanks for the PR and apologies for the delayed review. We got many PRs last month and didn't have enough bandwidth due to the holidays. 😅
While assessing the effectiveness of the RMSNorm operator, I observed that executing this operator on a non-zero GPU resulted in a "RuntimeError: CUDA error: an illegal memory access was encountered."
Upon further investigation through debugging, I found the cause to be the absence of device guards; most CUDA kernels have the same issue.
I have addressed the issue by adding device guards to all kernels. Additionally, I have augmented the kernel tests to include the device id, as in the test_activation provided.