Rework CUDA/native-library setup and diagnostics #1041

akx · 2024-02-06T09:22:08Z

This PR reworks the native-library setup code that was partially enmeshed with the diagnostics code. I'm sorry it's all in one big commit, this was a bit too hard to pull into smaller ones so it still made sense 😓

The main thing is that cuda_setup/ is gone; in its place are

cuda_specs, which is a simple dataclass containing information about the current cuda environment, and the native-library loading code is now in cextension.py. (This would be further reworked when more backends are introduced, of course; see Enable common device abstraction for 8bits/4bits #898.)
- cextension introduces a minimal proxy object for the ctypes.CDLL (a precursor to a Backend), so a global CUDA-specific constant COMPILED_WITH_CUDA is not needed.
diagnostics/, which is a separate thing that's being run by python -m bitsandbytes.
- There was overlapping DLL/.so discovering code in both cuda_setup and __main__.py; I got rid of the latter implementation in favor of the stuff that had already been there in cuda_setup. As far as I could tell, though, that code was only ever used to sniff around for CUDA runtime libraries, but that information was only shown to the user as a diagnostic.
  - Finding the libraries was made a bit looser; anything nvcuda or libcudart now smells like CUDA.
- I think the new diagnostics are pretty much on par with the old ones – definitely not bit-by-bit exact though. I don't presently have a machine with broken CUDA, so I can't really tell. (I think the formatting should be made more GitHub friendly though; the ######## headers turn into monstrous Markdown headings. That's for another PR, though.)

Additionally, this PR trivially enables the Mac libbitsandbytes_cpu.dylib native library to be found by adding it to the DYNAMIC_LIBRARY_SUFFIX mapping.

I tested locally that:

on my Mac, no tests fail and the library gets loaded as expected
on my WSL2 machine, libbitsandbytes_cuda121_nocublaslt.so gets loaded as expected and tests pass as before.

This is still in no way perfect, but it's better IMO :)

Refs #918

github-actions · 2024-02-06T09:25:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

rickardp · 2024-02-06T15:33:42Z

Really nice refactoring, much easier to follow the code. I would argue the setup code should embrace further device abstraction, but I think this is being worked on already and this looks like a really nice improvement!

younesbelkada

Thanks a lot for this huge work and refactor ! 🙏
I quickly tried this branch out, and I managed to compile bnb on your branch, but when calling python -m bitsandbytes I get:

Could not find the bitsandbytes CUDA binary at PosixPath('/bitsandbytes/bitsandbytes/libbitsandbytes_cuda121.so')
Could not load bitsandbytes native library: /bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/bitsandbytes/bitsandbytes/cextension.py", line 110, in <module>
    lib = get_native_library()
  File "/bitsandbytes/bitsandbytes/cextension.py", line 97, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/opt/conda/envs/peft/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/opt/conda/envs/peft/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 0), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (8, 0).
Could not find the bitsandbytes CUDA binary at PosixPath('/bitsandbytes/bitsandbytes/libbitsandbytes_cuda121.so')
Traceback (most recent call last):
  File "/opt/conda/envs/peft/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/peft/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/bitsandbytes/bitsandbytes/__main__.py", line 4, in <module>
    main()
  File "/bitsandbytes/bitsandbytes/diagnostics/main.py", line 44, in main
    print_cuda_diagnostics(cuda_specs)
  File "/bitsandbytes/bitsandbytes/diagnostics/cuda.py", line 112, in print_cuda_diagnostics
    if not binary_path.exists():
AttributeError: 'NoneType' object has no attribute 'exists'

Any idea what I did wrong and how to fix this?

younesbelkada

Below will solve the error but I am still not able to compile and use bnb on a docker image

bitsandbytes/diagnostics/cuda.py

younesbelkada · 2024-02-06T18:44:11Z

@akx managed to make it work following instructions from :#1042

akx · 2024-02-07T09:31:40Z

@younesbelkada Rebased and fixed your excellent catch :)

Funnily enough this still works fine on my machine even if the diagnostics say

CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.

since what really matters for linkage (on Linux) is the ldconfig cache:

(venv) akx@AEON ~/bitsandbytes (cuda-wagh)> ldconfig -p | grep libcudart
	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so

– I think the env path diagnostics should probably be reworked too.

Titus-von-Koeller · 2024-02-14T19:16:31Z

Ok, this is looking really good so far. We're planning to merge #898 after a final review from Tim and me this end of week. After that I would like to merge this one here and #1060.

Also, I see a bunch of PRs from @rickardp: Any particular order you want these merged/ looked at? Please make me aware of any dependencies/ conflicts between all mentioned PRs, if you're aware of them or if you think anything should come first.

Thanks again all for the valuable work your contributing :)

rickardp · 2024-02-14T20:41:59Z

The most important one of my PRs right now is #1050 but no one should block this one. I try to stay clear of the Python code as I know there's a lot going on there right now

akx · 2024-03-04T13:29:37Z

Rebased, incorporating #1064.

akx · 2024-03-05T18:22:34Z

So yeah, @Titus-von-Koeller, this is again mergeable 😅

akx · 2024-03-08T09:48:01Z

Works on my Windows machine too (though the output really does need some cleaning, as noted in the PR description).

(a) F:\temp\a>uv pip list
Package           Version
----------------- ------------
bitsandbytes      0.44.0.dev0
filelock          3.9.0
fsspec            2023.4.0
jinja2            3.1.2
markupsafe        2.1.3
mpmath            1.3.0
networkx          3.2.1
numpy             1.26.3
pillow            10.2.0
sympy             1.12
torch             2.2.1+cu121
torchaudio        2.2.1+cu121
torchvision       0.17.1+cu121
typing-extensions 4.8.0

(a) F:\temp\a>python -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(6, 1), cuda_version_string='121', cuda_version_tuple=(12, 1))
PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: (6, 1).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
If you run into issues with 8-bit matmul, you can try 4-bit quantization:
https://huggingface.co/blog/4bit-transformers-bitsandbytes
The directory listed in your path is found to be non-existent: ...
Found duplicate CUDA runtime files (see below).

We select the PyTorch default CUDA runtime, which is 12.1,
but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the `BNB_CUDA_VERSION=<version string, e.g. 122>` environmental variable.

For example, if you want to use the CUDA version 122,
    BNB_CUDA_VERSION=122 python ...

OR set the environmental variable in your .bashrc:
    export BNB_CUDA_VERSION=122

In the case of a manual override, make sure you set LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2,
* Found CUDA runtime at: c:\windows\system32\nvcuda.dll
* Found CUDA runtime at: c:\windows\system32\nvcudadebugger.dll
* Found CUDA runtime at: C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcuda.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcudadebugger.dll
* Found CUDA runtime at: C:\Windows\System32\nvcuda.dll
* Found CUDA runtime at: C:\Windows\System32\nvcudadebugger.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcuda.dll
* Found CUDA runtime at: C:\WINDOWS\system32\nvcudadebugger.dll
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

Titus-von-Koeller · 2024-03-13T10:25:59Z

bitsandbytes/consts.py

+from pathlib import Path
+import platform
+
+DYNAMIC_LIBRARY_SUFFIX = {
+    "Darwin": ".dylib",
+    "Linux": ".so",
+    "Windows": ".dll",
+}.get(platform.system(), ".so")
+
+PACKAGE_DIR = Path(__file__).parent
+PACKAGE_GITHUB_URL = "https://github.com/TimDettmers/bitsandbytes"
+NONPYTORCH_DOC_URL = "https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx"


much cleaner like this, nice! also especially like the .get(platform.system(), ".so")

Titus-von-Koeller · 2024-03-13T10:29:06Z

bitsandbytes/cuda_specs.py

+import torch
+
+
+@dataclasses.dataclass(frozen=True)


also, super nice and clean, the whole module! Really like how you factor out everything, much easier to follow and maintain now

Titus-von-Koeller · 2024-03-13T10:37:51Z

bitsandbytes/cextension.py

+    return PACKAGE_DIR / library_name
+
+
+class BNBNativeLibrary:


this refactor here, I also find very useful, much more pythonic and maintainable. thanks!

of course there's still a lot of work, but this is a significant step forward

Titus-von-Koeller · 2024-03-13T11:04:51Z

Ok, I just went through everything one more time and it's looking great, as always.

Thanks for providing tweaks for the things that came up in yesterdays review and that we discussed in Slack this morning.

I ran the test suite and despite the usual flakiness, there's only one issue:

――――――――――――――――――――――――――――――――――― test_manual_override ――――――――――――――――――――――――――――――――――――

requires_cuda = True

    def test_manual_override(requires_cuda):
        manual_cuda_path = str(Path('/mmfs1/home/dettmers/data/local/cuda-12.2'))
    
        pytorch_version = torch.version.cuda.replace('.', '')
    
        assert pytorch_version != 122  # TODO: this will never be true...
    
        os.environ['CUDA_HOME']='{manual_cuda_path}'
        os.environ['BNB_CUDA_VERSION']='122'
        #assert str(manual_cuda_path) in os.environ['LD_LIBRARY_PATH']
        import bitsandbytes as bnb
>       loaded_lib = bnb.cuda_setup.main.CUDASetup.get_instance().binary_name
E       AttributeError: module 'bitsandbytes' has no attribute 'cuda_setup'

tests/test_cuda_setup_evaluator.py:20: AttributeError

The Transformer integration tests also came through clean, not that I was expecting an issue, but just ran them to be sure.

I'm not sure to what extent this test is actually useful. Wdyt? We could also delete it, in case it doesn't.

Anyways, maybe that's another thing to discuss in dev corner, to what extent we would like test coverage for this setup stuff and to what extent that is even practically testable and if this is useful or not.

Otherwise, this is ready to merge. Great work! Really glad to have you on board at BNB ❤️

akx · 2024-03-13T11:26:27Z

@Titus-von-Koeller Ah, good catch, yeah. Fixing that test actually uncovered a bug regarding the override mechanism (it was using an unused binary_name) :)

Titus-von-Koeller · 2024-03-13T12:48:03Z

Well, good that we caught it! Nice additions to the tests as well.

Thanks again for the great work on this cleanup!

Glad to merge.

- This PR removes $CUDA_HOME/lib from $LD_LIBRARY_PATH because bitsandbytes can follow PyTorch's CUDA runtime since bitsandbytes-foundation/bitsandbytes#1041, and setting $LD_LIBRARY_PATH for `bitsandbytes` is not necessary any more - Hard-coded `/run/opengl-driver/lib` are removed because nix-gl-host can print the driver path for both Ubuntu and NixOS since numtide/nix-gl-host#16 - `link-nvidia-drivers.nix` is removed because drivers are instead copied to a cache directory by nix-gl-host

akx force-pushed the cuda-wagh branch from a30d041 to 03e9d5d Compare February 6, 2024 09:29

younesbelkada reviewed Feb 6, 2024

View reviewed changes

bitsandbytes/diagnostics/cuda.py Show resolved Hide resolved

akx force-pushed the cuda-wagh branch 2 times, most recently from 116c910 to 42edea0 Compare February 7, 2024 09:06

akx requested a review from younesbelkada February 7, 2024 11:14

younesbelkada mentioned this pull request Feb 8, 2024

Enable common device abstraction for 8bits/4bits #898

Merged

akx mentioned this pull request Feb 9, 2024

Check LD_LIBRARY_PATH on setup #580

Open

akx mentioned this pull request Feb 15, 2024

Dynamic cuda wrapper #1065

Closed

3 tasks

akx force-pushed the cuda-wagh branch from 42edea0 to 780980e Compare March 4, 2024 13:29

akx mentioned this pull request Mar 7, 2024

bitsandbytes can't handle multiple path locations in LD_LIBRARY_PATH #1112

Open

akx force-pushed the cuda-wagh branch 2 times, most recently from 174f575 to 20ed75b Compare March 8, 2024 09:02

akx mentioned this pull request Mar 8, 2024

Polish GitHub workflows #1116

Merged

Titus-von-Koeller self-assigned this Mar 12, 2024

matthewdouglas mentioned this pull request Mar 13, 2024

Linux binaries should ship with appropriate RPATH #1126

Open

akx force-pushed the cuda-wagh branch 3 times, most recently from b76c8cb to e213618 Compare March 13, 2024 10:00

Titus-von-Koeller reviewed Mar 13, 2024

View reviewed changes

akx and others added 3 commits March 13, 2024 13:26

Rework CUDA setup and diagnostics

e2db55e

Sanity check: Add check for lib being None

6a5a18a

Improve filtering for values that are surely not paths

79d1ccc

akx force-pushed the cuda-wagh branch from e213618 to 79d1ccc Compare March 13, 2024 11:26

akx requested a review from Titus-von-Koeller March 13, 2024 11:27

Titus-von-Koeller merged commit fd723b7 into bitsandbytes-foundation:main Mar 13, 2024
23 checks passed

Titus-von-Koeller mentioned this pull request Mar 15, 2024

[RFC] cross-platform: Refactoring bitsandbytes/cuda_setup #918

Closed

matthewdouglas mentioned this pull request Mar 18, 2024

Fix diagnostics error within vscode on windows #1134

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework CUDA/native-library setup and diagnostics #1041

Rework CUDA/native-library setup and diagnostics #1041

akx commented Feb 6, 2024 •

edited

Loading

github-actions bot commented Feb 6, 2024

rickardp commented Feb 6, 2024

younesbelkada left a comment

younesbelkada left a comment

younesbelkada commented Feb 6, 2024

akx commented Feb 7, 2024

Titus-von-Koeller commented Feb 14, 2024

rickardp commented Feb 14, 2024

akx commented Mar 4, 2024

akx commented Mar 5, 2024

akx commented Mar 8, 2024

Titus-von-Koeller Mar 13, 2024

Titus-von-Koeller Mar 13, 2024

Titus-von-Koeller Mar 13, 2024

Titus-von-Koeller Mar 13, 2024

Titus-von-Koeller commented Mar 13, 2024

akx commented Mar 13, 2024 •

edited

Loading

Titus-von-Koeller commented Mar 13, 2024

Rework CUDA/native-library setup and diagnostics #1041

Rework CUDA/native-library setup and diagnostics #1041

Conversation

akx commented Feb 6, 2024 • edited Loading

github-actions bot commented Feb 6, 2024

rickardp commented Feb 6, 2024

younesbelkada left a comment

Choose a reason for hiding this comment

younesbelkada left a comment

Choose a reason for hiding this comment

younesbelkada commented Feb 6, 2024

akx commented Feb 7, 2024

Titus-von-Koeller commented Feb 14, 2024

rickardp commented Feb 14, 2024

akx commented Mar 4, 2024

akx commented Mar 5, 2024

akx commented Mar 8, 2024

Titus-von-Koeller Mar 13, 2024

Choose a reason for hiding this comment

Titus-von-Koeller Mar 13, 2024

Choose a reason for hiding this comment

Titus-von-Koeller Mar 13, 2024

Choose a reason for hiding this comment

Titus-von-Koeller Mar 13, 2024

Choose a reason for hiding this comment

Titus-von-Koeller commented Mar 13, 2024

akx commented Mar 13, 2024 • edited Loading

Titus-von-Koeller commented Mar 13, 2024

akx commented Feb 6, 2024 •

edited

Loading

akx commented Mar 13, 2024 •

edited

Loading