[CUDA] Set GPU device ID in threads #6028

shiyu1994 · 2023-08-10T05:15:43Z

This fixes the issue reported in #6018. As reported there, an illegal memory access can currently occur when using gpu_device_id > 0 with the CUDA tree learner. The cause is that some CUDA memory is allocated in worker threads. Although the device ID is set in the main thread, it is not set in the newly created threads, so memory allocated in those threads may reside on GPU 0 by default rather than on gpu_device_id. Accessing that memory from gpu_device_id may then cause an illegal memory access.

Here's one example:

LightGBM/src/io/cuda/cuda_column_data.cpp

Lines 120 to 137 in 20975ba

    #pragma omp parallel for schedule(static) num_threads(num_threads_)
    for (int column_index = 0; column_index < num_columns_; ++column_index) {
      OMP_LOOP_EX_BEGIN();
      const int8_t bit_type = column_bit_type[column_index];
      if (column_data[column_index] != nullptr) {
        // is dense column
        if (bit_type == 4) {
          column_bit_type_[column_index] = 8;
          InitOneColumnData<false, true, uint8_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
        } else if (bit_type == 8) {
          InitOneColumnData<false, false, uint8_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
        } else if (bit_type == 16) {
          InitOneColumnData<false, false, uint16_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
        } else if (bit_type == 32) {
          InitOneColumnData<false, false, uint32_t>(column_data[column_index], nullptr, &data_by_column_[column_index]);
        } else {
          Log::Fatal("Unknow column bit type %d", bit_type);
        }
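
To make the failure mode concrete, here is a minimal sketch that needs no GPU. It relies on the fact that the CUDA runtime tracks the current device per host thread and that a new thread starts on device 0. Everything in it is a hypothetical stand-in: `current_device`, `set_device`, `allocate_on_current_device` and `allocate_in_workers` are not LightGBM or CUDA functions, and plain `std::thread` is used instead of OpenMP so the sketch builds without extra flags.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Stand-in for the CUDA runtime's per-host-thread "current device".
// Every new thread starts at device 0, mirroring the CUDA default.
thread_local int current_device = 0;

// Hypothetical analogue of cudaSetDevice: affects the calling thread only.
void set_device(int device_id) { current_device = device_id; }

// Hypothetical analogue of an allocation: returns the device the memory
// would land on, i.e. the calling thread's current device.
int allocate_on_current_device() { return current_device; }

// Starts `num_workers` threads that each "allocate" once and reports where
// each allocation landed. When `set_in_worker` is true, every worker selects
// `gpu_device_id` first, which is the idea behind this PR.
std::vector<int> allocate_in_workers(int gpu_device_id, int num_workers,
                                     bool set_in_worker) {
  set_device(gpu_device_id);  // only changes the main thread's device
  std::vector<int> landed(num_workers, -1);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    workers.emplace_back([&landed, i, gpu_device_id, set_in_worker]() {
      if (set_in_worker) set_device(gpu_device_id);
      landed[i] = allocate_on_current_device();
    });
  }
  for (auto& w : workers) w.join();
  return landed;
}
```

Calling `allocate_in_workers(1, 4, false)` models the behaviour before the fix, where the workers never select the device themselves; passing `true` models the fix, where each worker selects `gpu_device_id` before allocating.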

guolinke · 2023-08-10T05:31:00Z

src/io/cuda/cuda_column_data.cpp

@@ -224,6 +223,7 @@ void CUDAColumnData::ResizeWhenCopySubrow(const data_size_t num_used_indices) {
  #pragma omp parallel for schedule(static) num_threads(num_threads_)
  for (int column_index = 0; column_index < num_columns_; ++column_index) {
    OMP_LOOP_EX_BEGIN();
+    SetCUDADevice(gpu_device_id_, __FILE__, __LINE__);


Do we need to set this inside the loop?

Moved outside. d4695ff
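
The point of the review comment is that the device only needs to be selected once per worker thread, not once per column. The sketch below illustrates that under stated assumptions: it is not the code from d4695ff, all names (`tl_device`, `select_device`, `set_device_calls`, `init_columns`) are hypothetical, and `std::thread` with contiguous chunks stands in for an OpenMP parallel region with a static schedule.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

thread_local int tl_device = 0;        // stand-in for CUDA's per-thread current device
std::atomic<int> set_device_calls{0};  // counts calls to the hypothetical setter

// Hypothetical analogue of selecting a CUDA device for the calling thread.
void select_device(int device_id) {
  tl_device = device_id;
  ++set_device_calls;
}

// Splits [0, num_columns) into contiguous chunks, one per worker thread
// (similar in spirit to schedule(static)), and selects the device once per
// worker before its loop rather than once per column.
// Assumes num_threads > 0.
std::vector<int> init_columns(int gpu_device_id, int num_columns, int num_threads) {
  std::vector<int> column_device(num_columns, -1);
  std::vector<std::thread> workers;
  const int chunk = (num_columns + num_threads - 1) / num_threads;
  for (int t = 0; t < num_threads; ++t) {
    const int begin = std::min(t * chunk, num_columns);
    const int end = std::min(begin + chunk, num_columns);
    workers.emplace_back([&column_device, gpu_device_id, begin, end]() {
      select_device(gpu_device_id);    // once per thread, outside the loop
      for (int c = begin; c < end; ++c) {
        column_device[c] = tl_device;  // "allocation" lands on this thread's device
      }
    });
  }
  for (auto& w : workers) w.join();
  return column_device;
}
```

With 10 columns and 4 threads, the setter runs 4 times instead of 10, and every column is still initialized on the requested device. In OpenMP terms, one way to get the same effect is to open a parallel region, select the device at the top of it, and then run the loop as a work-sharing `for` inside that region.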

jameslamb

Thanks for the thorough explanation! It's really helpful for my understanding 😊

jameslamb · 2023-08-11T19:59:24Z

I just merged latest master into this, to get the fixes from #6032.

@shiyu1994 , could you go into the repo settings and check this "Always suggest updating pull request branches" box?

That would add a button that you can click on PRs to merge master in them, so there wouldn't be a need to pull and push locally. I've found that saves a bit of work in other projects I work on.

shiyu1994 · 2023-08-11T22:16:26Z

> could you go into the repo settings and check this "Always suggest updating pull request branches" box?

Done.

github-actions · 2023-11-15T00:21:30Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

set gpu device id in open mp threads

7b9e92c

shiyu1994 requested review from StrikerRUS and btrotta August 10, 2023 05:15

shiyu1994 requested review from guolinke, jameslamb and jmoralez as code owners August 10, 2023 05:15

guolinke reviewed Aug 10, 2023

move SetCUDADevice outside for loop

d4695ff

shiyu1994 requested a review from guolinke August 10, 2023 11:32

shiyu1994 added the "fix" and "awaiting review" labels Aug 10, 2023

guolinke approved these changes Aug 10, 2023

jameslamb removed the "awaiting review" label Aug 10, 2023

shiyu1994 mentioned this pull request Aug 11, 2023

[RFC] [ci] Dask compatibility with Python 3.8 and Pandas 2.0 #6030

Closed

jameslamb changed the title ~~[CUDA][fix] Set GPU device ID in threads~~ [CUDA] Set GPU device ID in threads Aug 11, 2023

jameslamb approved these changes Aug 11, 2023

Merge branch 'master' into cuda/fix-gpu-device-id-bug

f1e29a4

shiyu1994 merged commit 5c9e61d into master Aug 13, 2023

shiyu1994 deleted the cuda/fix-gpu-device-id-bug branch August 13, 2023 15:14

shiyu1994 mentioned this pull request Aug 13, 2023

question: Nvidia A100 cuda 11.8 which implementation to use? #6018

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2023
