
[ROCm] temporary workaround till __double2half support enabled in HIP #3236

Merged Apr 18, 2023 (6 commits)

Conversation

bmedishe (Contributor)

This is a temporary fix for the following error, which is encountered when running stable_diffusion inference with DeepSpeed inference, until the __double2half intrinsic is supported in HIP on ROCm.

FAILED: gelu.cuda.o
/opt/rocm/bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THH -isystem /opt/rocm/include -isystem /opt/rocm/miopen/include -isystem /opt/rocm/hip/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -fPIC -D__HIP_PLATFORM_HCC__=1 -DUSE_ROCM=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++14 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=5 -DROCM_VERSION_MINOR=4 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx90a --amdgpu-target=gfx1030 -fno-gpu-rdc -c /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip -o gelu.cuda.o
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
In file included from /opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.hip:7:
/opt/conda/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/conversion_utils_hip.h:269:12: error: use of undeclared identifier '__double2half'; did you mean '__double2hiint'?
    return __double2half(val);
           ^~~~~~~~~~~~~
           __double2hiint
/opt/rocm-5.4.0/include/hip/amd_detail/amd_device_functions.h:440:30: note: '__double2hiint' declared here
__device__ static inline int __double2hiint(double x) {
                             ^
1 error generated when compiling for gfx1030.
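For context, this is a minimal sketch of the kind of workaround the PR title describes, assuming the double-to-half conversion is rerouted through a float intermediate on the HIP path until __double2half is available; the helper name double_to_half and the exact guard are illustrative, not the actual DeepSpeed diff.

#if defined(__HIP_PLATFORM_HCC__)
#include <hip/hip_fp16.h>
#else
#include <cuda_fp16.h>
#endif

// Illustrative helper only; name and placement are assumptions.
__device__ static inline __half double_to_half(double val)
{
#if defined(__HIP_PLATFORM_HCC__)
    // HIP does not yet expose __double2half, so narrow to float first
    // and use the available __float2half intrinsic.
    return __float2half(static_cast<float>(val));
#else
    // CUDA path keeps the native double -> half intrinsic.
    return __double2half(val);
#endif
}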


bmedishe marked this pull request as draft on Apr 14, 2023, 17:23
bmedishe marked this pull request as ready for review on Apr 14, 2023, 17:47
bmedishe changed the title from "temporary WAR workaround till __double2half support enabled in HIP" to "temporary workaround till __double2half support enabled in HIP" on Apr 14, 2023
bmedishe changed the title from "temporary workaround till __double2half support enabled in HIP" to "[ROCm] temporary workaround till __double2half support enabled in HIP" on Apr 14, 2023