[BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! #6643

Open
@RickoNoNo3

Description

Describe the bug
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Under DeepSpeed ZeRO-3, running 'accelerate launch my_sft.py' with a non-quantized model first loads the model onto the CPU and then shards that single model across the GPUs. With a quantized model, loading behaves differently: one model replica is loaded per GPU, each replica is placed on the GPU automatically at load time, and unexpected model sharding (model parallelism) can even occur. This leads to the error above once training starts.

In my case, when loading LLaMa3.1-8B on two 4090s with bnb 4bit quantization, one model was loaded onto card 0, while the other appeared to be partially on card 0 and partially on card 1.
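
The placement described above can be checked with a small diagnostic. The following is a minimal sketch, not part of the original script: `params_per_device` is a hypothetical helper name, and it only assumes that each parameter exposes a `.device` attribute, as PyTorch parameters do.

```python
from collections import Counter


def params_per_device(named_params):
    """Count how many parameters live on each device.

    `named_params` is any iterable of (name, parameter) pairs whose
    parameters have a `.device` attribute, for example the result of
    `model.named_parameters()` on a PyTorch module.
    """
    return Counter(str(param.device) for _, param in named_params)
```

Calling `print(dict(params_per_device(model.named_parameters())))` in each launched process right after loading should show whether a given replica sits entirely on one GPU or is split across `cuda:0` and `cuda:1`.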

To Reproduce
Steps to reproduce the behavior:

  1. prepare 2x4090 GPUs
  2. Load the LLaMa3.1-8B model using AutoModelForCausalLM.from_pretrained with quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, llm_int8_enable_fp32_cpu_offload=True, ...)
  3. Construct an SFTTrainer with any arguments, then call trainer.train()
  4. See error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
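
Step 2 above can be sketched as a standalone script. This is only an approximation of the original `my_sft.py`, which is not shown in this report: the `BNB_KWARGS` name and the `load_quantized_model` helper are hypothetical, the options elided by `...` in step 2 are left out rather than guessed, and the model id or local path must be supplied on the command line. The SFTTrainer step is not covered because its arguments were not given.

```python
import sys

# Keyword arguments copied from step 2 of the report. The "..." there is not
# filled in, so any further options the author passed are not reproduced.
BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "llm_int8_enable_fp32_cpu_offload": True,
}


def load_quantized_model(model_id):
    """Load a causal LM with the 4-bit bitsandbytes settings from the report."""
    # Imported lazily so the settings above can be inspected on a machine
    # without transformers or bitsandbytes installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quantization_config = BitsAndBytesConfig(**BNB_KWARGS)
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quantization_config
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the Hub id or local path of the LLaMa3.1-8B checkpoint.
    model = load_quantized_model(sys.argv[1])
    # hf_device_map is only set when the model was dispatched with a device
    # map, so fall back to None instead of assuming it exists.
    print(getattr(model, "hf_device_map", None))
```

Launched the same way as in the report (for example `accelerate launch` followed by the script name and the model path), each process prints where its copy of the model ended up, which may help confirm the uneven placement before the trainer is involved.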

Expected behavior
Training under DeepSpeed ZeRO-3 should be compatible with quantized models.

ds_report output

[2024-10-20 17:13:48,941] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: warning: libstdc++.so.6, needed by /usr/local/cuda-11.8/lib64/libcufile.so, not found (try using -rpath or -rpath-link)
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: warning: libm.so.6, needed by /usr/local/cuda-11.8/lib64/libcufile.so, not found (try using -rpath or -rpath-link)
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::runtime_error::~runtime_error()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__gxx_personality_v0@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream::tellp()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::substr(unsigned long, unsigned long) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_M_replace_aux(unsigned long, unsigned long, unsigned long, char)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `dlopen'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for bool@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_logic_error(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::locale::~locale()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_end_catch@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_ofstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::_M_stringbuf_init(std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `operator new[](unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_M_leak_hard()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_ifstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::append(char const*, unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for unsigned short@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::resize(unsigned long, char)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str(std::string const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for char const*@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ctype<char>::_M_widen_init() const@GLIBCXX_3.4.11'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_invalid_argument(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_free_exception@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ios_base::Init::~Init()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_pure_virtual@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream::flush()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for __cxxabiv1::__class_type_info@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_rethrow@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_Rep::_M_dispose(std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_fstream<char, std::char_traits<char> >::~basic_fstream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::compare(char const*) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::locale::locale()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::chrono::_V2::system_clock::now()@GLIBCXX_3.4.19'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_ifstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<long long>(long long)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for char*@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `dlclose'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ios_base::~ios_base()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__basic_file<char>::~__basic_file()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_guard_acquire@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<bool>(bool)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_fstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_ios<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_filebuf<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `operator delete[](void*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::assign(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(unsigned long, char, std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_List_node_base::_M_transfer(std::__detail::_List_node_base*, std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for std::exception@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::istream& std::istream::_M_extract<double>(double&)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_filebuf<char, std::char_traits<char> >::close()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_fstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ifstream<char, std::char_traits<char> >::basic_ifstream(char const*, std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::append(std::string const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `operator new(unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for unsigned int@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::append(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::find(char, unsigned long) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream::put(char)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for int@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_bad_alloc()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_thread_atexit@CXXABI_1.3.7'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ifstream<char, std::char_traits<char> >::~basic_ifstream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ios_base::Init::Init()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_List_node_base::swap(std::__detail::_List_node_base&, std::__detail::_List_node_base&)@GLIBCXX_3.4.15'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::istream::getline(char*, long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_filebuf<char, std::char_traits<char> >::basic_filebuf()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `VTT for std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::cerr@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::find(char const*, unsigned long, unsigned long) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::str() const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for void*@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::assign(std::string const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_rebalance_for_erase(std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for unsigned long@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)@GLIBCXX_3.4.15'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_List_node_base::_M_unhook()@GLIBCXX_3.4.15'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::_M_sync(char*, unsigned long, unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_iostream<char, std::char_traits<char> >::~basic_iostream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `log2f@GLIBC_2.2.5'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream::operator<<(std::basic_streambuf<char, std::char_traits<char> >*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::exception::~exception()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__basic_file<char>::is_open() const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_M_mutate(unsigned long, unsigned long, unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `dlerror'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_istringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_istringstream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::swap(std::string&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `dlsym'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_bad_cast()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ios<char, std::char_traits<char> >::clear(std::_Ios_Iostate)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `operator delete(void*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream::operator<<(int)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_Rep::_S_empty_rep_storage@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::_Rep::_M_destroy(std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_ofstream<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::end()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<long>(long)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::istream::get()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for unsigned long long@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::flush<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::cout@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<unsigned long long>(unsigned long long)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::insert(unsigned long, char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream(std::string const&, std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::runtime_error::runtime_error(std::string const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<void const*>(void const*)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `vtable for std::basic_streambuf<char, std::char_traits<char> >@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_allocate_exception@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for void const*@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::reserve(unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_begin_catch@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for long@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::find(char const*, unsigned long) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::compare(std::string const&) const@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::istream::getline(char*, long, char)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::insert(unsigned long, char const*, unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::string::assign(char const*, unsigned long)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for unsigned char@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ios_base::ios_base()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_out_of_range(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_length_error(char const*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::__throw_system_error(int)@GLIBCXX_3.4.11'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::ostream& std::ostream::_M_insert<double>(double)@GLIBCXX_3.4.9'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `typeinfo for long long@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_ifstream<char, std::char_traits<char> >::close()@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_guard_release@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `__cxa_throw@CXXABI_1.3'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4'
/home/user/.conda/envs/unsloth_env/compiler_compat/ld: /usr/local/cuda-11.8/lib64/libcufile.so: undefined reference to `std::basic_filebuf<char, std::char_traits<char> >::~basic_filebuf()@GLIBCXX_3.4'
collect2: error: ld returned 1 exit status
gds .................... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
 [WARNING]  using untested triton version (3.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/user/.conda/envs/unsloth_env/lib/python3.10/site-packages/torch']
torch version .................... 2.4.0+cu118
deepspeed install path ........... ['/home/user/.conda/envs/unsloth_env/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.15.2, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.4, cuda 11.8
shared memory (/dev/shm) size .... 100.00 GB

System info (please complete the following information):

  • Ubuntu 18.04
  • 2x4090
  • CUDA 11.8

Labels: bug, training