[CUDA] Set GPU device ID in threads #6028
@@ -224,6 +223,7 @@ void CUDAColumnData::ResizeWhenCopySubrow(const data_size_t num_used_indices) {
  #pragma omp parallel for schedule(static) num_threads(num_threads_)
  for (int column_index = 0; column_index < num_columns_; ++column_index) {
    OMP_LOOP_EX_BEGIN();
    SetCUDADevice(gpu_device_id_, __FILE__, __LINE__);
Do we need to set this inside the loop?
Moved outside the loop in d4695ff.
Thanks for the thorough explanation! It's really helpful for my understanding 😊
I just merged latest. @shiyu1994, could you go into the repo settings and check the "Always suggest updating pull request branches" box? That would add a button that you can click on PRs to merge
Done.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
This fixes the issue reported in #6018. As described there, an illegal memory access may currently arise when using `gpu_device_id > 0` for the `cuda` tree learner. This is because some CUDA memory is allocated in threads. Although the device ID is set in the main thread, it is not set in the newly created threads. CUDA memory allocated in those threads may therefore reside on GPU 0 by default, which differs from `gpu_device_id`. Accessing such memory on `gpu_device_id` may cause an illegal memory access.

Here's one example:
LightGBM/src/io/cuda/cuda_column_data.cpp
Lines 120 to 137 in 20975ba
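The per-thread behaviour described above can be illustrated with a small host-only sketch. It does not call the CUDA runtime: `g_current_device`, `ModelSetDevice`, `ModelAllocate`, and `AllocateInWorkers` are hypothetical names that only model the rule that the current device is tracked per host thread and starts at device 0 in every new thread. It also uses fresh `std::thread` workers rather than an OpenMP thread pool, so it is a simplification of what happens in LightGBM.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the CUDA runtime's per-host-thread current device.
// Each new thread starts with device 0, mirroring the CUDA default.
thread_local int g_current_device = 0;

// Models a set-device call: it changes the device for the calling thread only.
void ModelSetDevice(int device_id) {
  assert(device_id >= 0);
  g_current_device = device_id;
}

// Models an allocation: returns the device the memory would be placed on.
int ModelAllocate() { return g_current_device; }

// Spawns num_threads workers and records where each worker's allocation lands.
// When set_device_in_worker is false, workers keep the default device 0.
std::vector<int> AllocateInWorkers(int num_threads, int gpu_device_id,
                                   bool set_device_in_worker) {
  std::vector<int> devices(num_threads, -1);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&devices, i, gpu_device_id, set_device_in_worker]() {
      if (set_device_in_worker) {
        ModelSetDevice(gpu_device_id);  // the fix: set the device in the worker
      }
      devices[i] = ModelAllocate();
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return devices;
}
```

Calling `ModelSetDevice(1)` in the main thread and then `AllocateInWorkers(4, 1, false)` leaves every worker on device 0, which is the mismatch behind the illegal memory access; passing `true` places every allocation on device 1.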