RuntimeError: Error building extension 'cpu_adam' #2682

Closed
@arain60gb

Description

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:414] 2023-01-09 19:06:58,180 >> Using amp fp16 backend
[2023-01-09 19:06:58,187] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
[2023-01-09 19:06:58,191] [WARNING] [config_utils.py:67:process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2023-01-09 19:07:05,242] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py38_cu117/cpu_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/root/anaconda3/envs/bitten/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX256 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/cpu_adam.h:11,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:1:
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>
^~~~~~~~~
compilation terminated.
[2/3] /root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/root/anaconda3/envs/bitten/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/include/THC -isystem /root/anaconda3/envs/bitten/include -isystem /root/anaconda3/envs/bitten/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
In file included from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/context.h:11:0,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/custom_cuda_layers.h:16,
from /root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu:1:
/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/csrc/includes/gemm_test.h:6:10: fatal error: cuda_profiler_api.h: No such file or directory
#include <cuda_profiler_api.h>
^~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/root/anaconda3/envs/bitten/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_clm.py", line 478, in <module>
    main()
  File "run_clm.py", line 441, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/trainer.py", line 1112, in train
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/transformers/deepspeed.py", line 355, in deepspeed_init
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1195, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1266, in _configure_basic_optimizer
    optimizer = DeepSpeedCPUAdam(model_parameters,
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
    self.ds_opt_adam = CPUAdamBuilder().load()
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
    return self.jit_load(verbose)
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
    op_module = load(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7efc99bbd1f0>
Traceback (most recent call last):
  File "/root/anaconda3/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-01-09 19:07:12,824] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 7728
[2023-01-09 19:07:12,826] [ERROR] [launch.py:324:sigkill_handler]
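
Both failing steps in the log above (`cpu_adam.o` and `custom_cuda_kernel.cuda.o`) stop on the same line: `fatal error: cuda_profiler_api.h: No such file or directory`. The compiler is given `-I/root/anaconda3/envs/bitten/include` and `nvcc` comes from `/root/anaconda3/envs/bitten/bin/nvcc`, so one plausible reading is that the CUDA toolkit in that conda environment does not include this header; the log alone does not confirm that. A minimal sketch for checking where the header actually exists is below. The helper names `find_header` and `candidate_include_dirs` are hypothetical, and the choice of `CUDA_HOME`, `CUDA_PATH`, `CONDA_PREFIX` and `/usr/local/cuda/include` as places to look is an assumption, not something taken from the log.

```python
import os
from pathlib import Path


def find_header(header_name, include_dirs):
    """Return the full path of header_name in each include dir that contains it."""
    hits = []
    for directory in include_dirs:
        candidate = Path(directory) / header_name
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


def candidate_include_dirs(env=None):
    """Build an ordered, de-duplicated list of include dirs to check.

    Assumption: these variables and the /usr/local/cuda fallback are
    conventional locations only; adjust them for the machine at hand.
    """
    env = os.environ if env is None else env
    dirs = []
    for var in ("CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"):
        root = env.get(var)
        if root:
            dirs.append(os.path.join(root, "include"))
    dirs.append("/usr/local/cuda/include")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(dirs))


if __name__ == "__main__":
    searched = candidate_include_dirs()
    found = find_header("cuda_profiler_api.h", searched)
    if found:
        print("cuda_profiler_api.h found at:")
        for path in found:
            print("  " + path)
    else:
        print("cuda_profiler_api.h not found in:")
        for directory in searched:
            print("  " + directory)
```

If the header turns up only outside the directories passed to the compiler with `-I` and `-isystem`, the build is likely picking up a toolkit without the full set of development headers.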

Labels: bug, training
