
remove torch.cuda.is_available() check when compiling ops #3085

Merged (12 commits, Apr 19, 2023)

Conversation

jinzhen-lin
Contributor

torch.cuda.is_available() is not necessary here, and it causes #2858 when compiling deepspeed >= 0.8.1 on a machine without a GPU (e.g. during a docker image build).

@tjruwase
Contributor

@jinzhen-lin, thanks for your contribution. But can you please provide some more details on the issue fixed by this PR? In my experience, the commented code works fine on machines without a GPU, including our CI. Thanks!

@jinzhen-lin
Contributor Author

jinzhen-lin commented Mar 24, 2023

@tjruwase To clarify, I mean compiling the CUDA ops on a machine without a GPU. The CI doesn't build the ops, so it never hits this path.

In the mentioned issue, we encountered an error because the quantizer op (introduced in v0.8.1) needs the CUDA half operators, but the compilation arguments -D__CUDA_NO_HALF_OPERATORS__, -D__CUDA_NO_HALF_CONVERSIONS__, -D__CUDA_NO_BFLOAT16_CONVERSIONS__, and -D__CUDA_NO_HALF2_OPERATORS__ are set. (You can search for those arguments on the mentioned issue page.)

So we need these nvcc arguments:

args += [
    '-allow-unsupported-compiler' if sys.platform == "win32" else '',
    '--use_fast_math',
    '-std=c++17' if sys.platform == "win32" and cuda_major > 10 else '-std=c++14',
    '-U__CUDA_NO_HALF_OPERATORS__',
    '-U__CUDA_NO_HALF_CONVERSIONS__',
    '-U__CUDA_NO_HALF2_OPERATORS__'
]

But those arguments are ignored because of the CUDA check here:

def builder(self):
    self.build_for_cpu = not assert_no_cuda_mismatch(self.name)
    if self.build_for_cpu:
        from torch.utils.cpp_extension import CppExtension as ExtensionBuilder
    else:
        from torch.utils.cpp_extension import CUDAExtension as ExtensionBuilder

    compile_args = {'cxx': self.strip_empty_entries(self.cxx_args())} if self.build_for_cpu else \
        {'cxx': self.strip_empty_entries(self.cxx_args()),
         'nvcc': self.strip_empty_entries(self.nvcc_args())}

The CUDA check fails because installed_cuda_version cannot determine the true CUDA version on such a machine. With this PR, we get the true CUDA version and the issue should be fixed.

I think installed_cuda_version should always return the CUDA toolkit version installed on the system; it should work even on a machine that has the CUDA toolkit but no GPU.
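For illustration, here is a minimal sketch of detecting the installed toolkit version directly from nvcc rather than through torch.cuda, which works with or without a GPU. This is an assumption about how such a check could look, not DeepSpeed's actual implementation; parse_nvcc_version is a hypothetical helper, and the sketch assumes nvcc is on PATH.

```python
import re
import subprocess

def parse_nvcc_version(output: str):
    """Extract (major, minor) from `nvcc --version` output.

    nvcc prints a line like: "Cuda compilation tools, release 11.2, V11.2.152".
    """
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise RuntimeError("could not parse nvcc version output")
    return int(match.group(1)), int(match.group(2))

def installed_cuda_version():
    """Return the CUDA toolkit version reported by nvcc.

    Queries the compiler, not the driver, so it does not require a GPU.
    """
    out = subprocess.check_output(["nvcc", "--version"], text=True)
    return parse_nvcc_version(out)
```

Because the version comes from the compiler binary itself, the same code path works inside a GPU-less docker build as long as the CUDA toolkit is installed.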

@tjruwase
Contributor

tjruwase commented Mar 24, 2023

@jinzhen-lin, thanks for your helpful explanation. It seems the problem is that we assume the build and target environments are the same. We recently started enabling DeepSpeed for CPU-only target environments, and we distinguish them from GPU target environments by testing for GPU availability with torch.cuda.is_available(). It is now clear that this approach does not work for your scenario, where you are building CUDA ops in an environment with CUDA libraries but no GPUs. The problem with this PR is that it will break builds for CPU-only environments. A more robust solution would be cross-compilation, and a key challenge there would be letting users conveniently specify the target environment, implicitly or explicitly.

Please share your thoughts on this. Thanks!

@jeffra, @mrwyattii FYI

@jinzhen-lin
Contributor Author

@tjruwase Sorry for not checking CPU builds before submitting this PR.

I notice that CPU-only target environments were introduced recently (after v0.8.0), and DeepSpeed mainly targets GPUs for now. So we should always assume the user wants a CUDA build, and fall back to a CPU build only when:

  • we cannot find CUDA in the build environment, or the CUDA version is incompatible with the torch CUDA version
  • the user specifies an environment variable (e.g. DS_BUILD_OPS_CPU)
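The fallback policy described above could be sketched as follows. This is only an illustration of the proposal, not DeepSpeed code: should_build_for_cpu is a hypothetical helper, and DS_BUILD_OPS_CPU is the example variable name from the comment, not an existing DeepSpeed flag.

```python
import os
import shutil

def should_build_for_cpu() -> bool:
    """Default to a CUDA build; fall back to CPU only in the two listed cases."""
    # Case 2: the user explicitly requested a CPU build via an env var
    # (DS_BUILD_OPS_CPU is a hypothetical name suggested in the comment).
    if os.environ.get("DS_BUILD_OPS_CPU", "0") == "1":
        return True
    # Case 1 (simplified): no CUDA toolkit found in the build environment.
    # A fuller check would also compare the toolkit version against torch's
    # CUDA version, as assert_no_cuda_mismatch does.
    if shutil.which("nvcc") is None:
        return True
    # Otherwise, assume the user wants a CUDA build.
    return False
```

The key design point is that GPU presence (torch.cuda.is_available()) never enters the decision; only the toolkit in the build environment and an explicit user override do.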

@jinzhen-lin
Contributor Author

@microsoft-github-policy-service agree

@tjruwase
Contributor

@jinzhen-lin, thanks for updating the PR. This is an improvement but not quite cross-compilation. Nevertheless, this will suffice for now.

jeffra added a commit that referenced this pull request Apr 18, 2023
@jeffra jeffra added the merge-queue PRs ready to merge label Apr 18, 2023
@loadams loadams linked an issue Apr 18, 2023 that may be closed by this pull request
@loadams loadams enabled auto-merge (squash) April 18, 2023 22:49
@jeffra jeffra disabled auto-merge April 19, 2023 00:45
@jeffra jeffra merged commit 036c5d6 into microsoft:master Apr 19, 2023
@conglongli conglongli added deepspeed-chat Related to DeepSpeed-Chat and removed deepspeed-chat Related to DeepSpeed-Chat labels Apr 30, 2023
Successfully merging this pull request may close these issues.

Compilation error for 0.8.1 with CUDA 11.2
5 participants