
[RFC] cross-platform: Refactoring bitsandbytes/cuda_setup #918

Closed

Titus-von-Koeller opened this issue Dec 15, 2023 · 10 comments
Assignees
Labels
cross-platform CUDA Setup feature-request help wanted Extra attention is needed high priority (first issues that will be worked on) RFC request for comments on proposed library improvements waiting for info Windows

Comments

@Titus-von-Koeller
Collaborator

Titus-von-Koeller commented Dec 15, 2023

Summary

This RFC aims to discuss and gather community input on refactoring the bitsandbytes/cuda_setup module. The goal is to enhance its functionality, simplify the user experience across different hardware and operating systems, and prepare it for upcoming device support expansions.

Background

bitsandbytes has become instrumental in democratizing AI, thanks to its deep integration with hardware. Despite millions of monthly downloads, a fraction of users encounter issues, such as those detailed in #914. Our objective is to keep bitsandbytes as easy to install and use as possible (e.g. pip install bitsandbytes and load_in_4bit=True), hiding the complexities of the software-hardware boundary under the hood while improving error reporting and handling.

Current Challenges

Setup Module Issues

  • Bug Reports from python -m bitsandbytes: This feature, intended to simplify debugging, sometimes presents similar tracebacks for very different underlying issues, causing confusion in the issue threads.
  • CUDA Install and Environment Challenges: Many problems arise not from bitsandbytes itself, but from user-side issues with CUDA installations, environment settings (e.g. LD_LIBRARY_PATH), or hardware configurations (see the diagnostic sketch after this list).
  • Perceived Reliability: There are known issues in the setup code that need to be fixed, and its overall code quality could be substantially improved.
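
To make the environment-diagnosis point concrete, here is a minimal, hypothetical sketch (not current bitsandbytes code; helper names are made up) of how a refactored python -m bitsandbytes diagnostic could distinguish "runtime not found" from "runtime found but unloadable", two failure modes that today often end in near-identical tracebacks:

```python
# Hypothetical diagnostic sketch -- illustrates the idea, not the actual
# bitsandbytes implementation.
import ctypes
import os
from pathlib import Path

def find_cudart_candidates():
    """Collect libcudart.so* candidates from LD_LIBRARY_PATH (user-side config)."""
    candidates = []
    for entry in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            candidates.extend(Path(entry).glob("libcudart.so*"))
    return candidates

def diagnose():
    candidates = find_cudart_candidates()
    if not candidates:
        # Distinct failure mode 1: nothing to load -> environment problem.
        print("No libcudart on LD_LIBRARY_PATH; check your CUDA install/env vars.")
        return
    for lib in candidates:
        try:
            ctypes.CDLL(str(lib))
            print(f"OK: {lib} loads cleanly.")
        except OSError as exc:
            # Distinct failure mode 2: found but unloadable -> broken install.
            print(f"Found but unloadable: {lib} ({exc})")

if __name__ == "__main__":
    diagnose()
```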

Diverse Hardware Landscape

  • GPU-Nvidia Variability: Different generations and capabilities (e.g., Compute Capability, tensor cores, data types).
  • Emerging GPU-AMD Support: Efforts like PR #756 are in progress to integrate AMD hardware.
  • Apple Silicon Requests: Interest shown in PR #257 and other contributions.
  • Intel GPU+CPU Quantization: The initiative by Intel with PR #898 for device abstraction.

Operating System Variability

  • Linux as Primary OS: Continued focus on Linux support.
  • Windows and Apple Support: Evaluating and integrating community contributions for broader OS support:
    • Windows: Limited support at the moment, but high impact - we need to evaluate the status quo and the existing community contributions, then come up with a roadmap.
    • Apple: Currently no support - there are ongoing discussions, community contributions that haven't been accepted so far, and quite a lot of interest; here too we need to evaluate the status quo and the community contributions, then come up with a roadmap.

Proposed Improvements

  1. Refactoring cuda_setup: Enhancing code quality and clarity to better handle the diverse hardware and OS scenarios.
  2. Error Reporting Enhancement: Develop a more nuanced error reporting mechanism that reflects the source of an issue as accurately as possible and makes conflating distinct issues harder by producing distinct traces (see the sketch after this list).
  3. Community Engagement: Actively seeking community input, especially for cross-platform compatibility and new device support.
  4. CI/CD Strategies: Discussing and implementing robust CI/CD processes to facilitate testing across various platforms and hardware.
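
As a rough illustration of point 2, source-specific exception types could keep distinct root causes from ever producing identical traces; all names below are hypothetical and for discussion only:

```python
# Sketch only: a possible error hierarchy for the refactored setup module.
class BnbSetupError(Exception):
    """Base class; every setup error carries an actionable hint."""
    def __init__(self, message: str, hint: str):
        super().__init__(f"{message}\nHint: {hint}")

class CudaRuntimeNotFoundError(BnbSetupError):
    """The CUDA runtime library could not be located (environment problem)."""

class UnsupportedComputeCapabilityError(BnbSetupError):
    """A GPU was found, but its compute capability is unsupported (hardware limit)."""

def load_cuda_runtime(path=None):
    # Raising a source-specific error yields a distinct, searchable trace
    # instead of one generic traceback shared by unrelated problems.
    if path is None:
        raise CudaRuntimeNotFoundError(
            "libcudart.so not found",
            hint="verify your CUDA installation and LD_LIBRARY_PATH",
        )
```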

Call to Action

We invite the community to provide feedback and suggestions on the following:

  • Improvements to the cuda_setup module.
  • Strategies for handling diverse hardware and operating systems.
  • Ideas for an effective CI/CD setup - we'll provide a separate RFC for that, but feel free to mention initial thoughts here as well.
  • Any other relevant insights or experiences.

Timeline and Milestones

We'll take an incremental approach to improving the setup module. The more actionable and widely agreed upon a change is, the sooner we can implement it.

Contribution and Feedback Mechanism

Please share your thoughts, suggestions, and feedback in the thread below.

Summary and Next Steps

This RFC serves as a starting point to get feedback and coordinate the collaborative effort to refine bitsandbytes's setup process. We aim to address the current challenges, embrace the diversity of hardware and operating systems, and build a robust, user-friendly setup. Your participation in this process is crucial, and we look forward to your valuable input.

@Titus-von-Koeller Titus-von-Koeller self-assigned this Dec 15, 2023
@Titus-von-Koeller Titus-von-Koeller added high priority (first issues that will be worked on) help wanted Extra attention is needed CUDA Setup Windows waiting for info RFC request for comments on proposed library improvements labels Dec 15, 2023
@Titus-von-Koeller Titus-von-Koeller changed the title [RFC] Refactoring bitsandbytes/cuda_setup for Enhanced Cross-Platform and Device Support. [RFC] Refactoring bitsandbytes/cuda_setup for enhanced cross-platform and device support. Dec 15, 2023
@wkpark
Contributor

wkpark commented Dec 17, 2023

See also PR #873 (merged), PR #876 for Windows, and PR #908.

@jgong5

jgong5 commented Dec 23, 2023

CPU Quantization: The initiative by Intel with #898 for device abstraction.

Hey, one thing to clarify. The device abstraction proposed in #894 and the PR #898 are not only about CPU quantization but also about Intel GPU support. Therefore, we tried to abstract out the interfaces that a backend needs to implement, and we also provided a registration mechanism for supporting new devices.
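
For readers of this thread, a rough sketch of what such an interface-plus-registration design looks like in general (illustrative names only, not the actual code from #894/#898):

```python
# Generic backend-registry pattern, sketched for discussion.
from abc import ABC, abstractmethod

_BACKENDS = {}

class Backend(ABC):
    """Interface each device backend (CUDA, Intel CPU/GPU, ...) implements."""

    @abstractmethod
    def quantize_4bit(self, tensor):
        ...

def register_backend(device_type: str, backend: Backend):
    """Registration hook so new devices plug in without touching core code."""
    _BACKENDS[device_type] = backend

def get_backend(device_type: str) -> Backend:
    try:
        return _BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"no backend registered for {device_type!r}") from None
```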

@younesbelkada
Collaborator

younesbelkada commented Dec 23, 2023

Hi @jgong5 @wkpark
Are you connected through Slack, so we can discuss more details about the bnb refactor and iterate quickly on this? Can I use your emails to send Slack invites?

@TAJD

TAJD commented Dec 23, 2023

This is really exciting!

Ideas for an effective CI/CD setup - we'll provide a separate RFC for that, but feel free to mention initial thoughts here as well.

GPU runners now appear to be available on GitHub Actions and I can't think of a more useful project to begin trialling their usage.

There is a lot of complexity regarding the number of different CUDA versions, hardware, and other dependencies. For now, one goal could be to get the test suite automated and running cleanly for a limited set of CUDA versions and hardware, and then build out the build/deployment process from there.
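
One concrete first step along those lines: gate hardware-dependent tests so the suite runs cleanly on runners without GPUs. A minimal pytest sketch, assuming the tests go through PyTorch:

```python
# Sketch: skip CUDA-only tests on CPU-only runners so a limited CI matrix
# still passes cleanly.
import pytest
import torch

requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="needs a CUDA-capable GPU"
)

@requires_cuda
def test_cuda_roundtrip():
    x = torch.randn(16, device="cuda")
    assert x.device.type == "cuda"
```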

@rickardp
Contributor

rickardp commented Dec 30, 2023

Nice to see some initiative on this again. Here are a few points from me.

  1. Blessing from the maintainer(s) that this is a truly portable library, and not just a CUDA library that can be made to run some MPS kernels with some effort. The alternative is to maintain a fork. This comment in #485 (comment) seems to have indicated so, but the effort seems to have since died out. @Titus-von-Koeller do I understand correctly that you have an interest in driving these changes to be mainlined in this repo?
  2. See my comment in #485 (comment). Mainly: build with CMake, focus on getting everything to build on GitHub Actions, and improve test coverage. Get feature parity on CPU to be able to have test coverage on all hardware. Specifically, it would be possible to make this library very easy to use ("pip install bitsandbytes") with binary wheels for all supported platforms (with some tradeoff in packaging multiple CUDA kernels for different supported CUDA versions).
    2b) As OP indicated, the CUDA setup code is quirky. I think it tries to solve some edge cases that are better solved by following standard installation processes (look at PyTorch). But I am no expert. In any case, I think this library should focus on the kernels and leave CUDA setup to other frameworks/libraries.
  3. Then port kernel by kernel to hardware like MPS or Intel. Remember that contributors to these codebases will typically not be able to run CUDA kernels, so test coverage will be a challenge. And vice versa. Possibly the range of GitHub runners is now sufficient, as indicated by @TAJD's comment.

As for some of the details / tech choices on portability, please see the prior discussion in #252 and #485, as there was already a lot of good discussion there.

@Titus-von-Koeller I understand that this issue might be specific to the cuda_setup, but I think that if the decision is made to build true binary wheels (see bullet 2/2b above), a lot of the complexity of the CUDA setup goes away. So my proposal would be to start by agreeing that this library will be distributed this way, and to load the CUDA kernels for the CUDA runtime supplied by PyTorch (querying PyTorch for the details; IIRC this is possible, but my knowledge here is 6 months old so I might be wrong).
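
For illustration, querying PyTorch for those details could look roughly like this (the binary naming scheme below is made up, not how bitsandbytes actually packages its kernels):

```python
# Sketch: derive which prebuilt kernel binary to load from PyTorch itself,
# instead of probing the system for a CUDA installation.
import torch

def select_kernel_library() -> str:
    if not torch.cuda.is_available():
        return "libbitsandbytes_cpu.so"               # CPU fallback
    major, minor = torch.version.cuda.split(".")[:2]  # CUDA runtime PyTorch ships
    cc_major, cc_minor = torch.cuda.get_device_capability()
    # Hypothetical naming convention for bundled binaries:
    return f"libbitsandbytes_cuda{major}{minor}_sm{cc_major}{cc_minor}.so"
```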

I'm happy to spend some time reviving/rebasing/refactoring any of the work on PR #257 that is of interest to the community, but I would like to get some commitment from a maintainer that this actually has a chance of getting merged, so it's not just a dead end.

@Titus-von-Koeller Titus-von-Koeller changed the title [RFC] Refactoring bitsandbytes/cuda_setup for enhanced cross-platform and device support. [RFC] cross-platform: Refactoring bitsandbytes/cuda_setup Jan 26, 2024
@Titus-von-Koeller Titus-von-Koeller pinned this issue Jan 26, 2024
@Titus-von-Koeller
Collaborator Author

@TAJD

GPU runners now appear to be available on GitHub Actions and I can't think of a more useful project to begin trialling their usage.

There is a lot of complexity regarding the number of different CUDA versions, hardware and other dependencies. For now one goal could be to get the test suite automated and running cleanly for a limited combination of CUDA versions/hardwares and then building out the build/deployment process from there.

Yes, getting builds + testing automated is quite high up on our list.

I also saw that GitHub blog post about GPU runners (still in beta) and signed up for the beta in December, but we didn't get selected. At the moment, the only way to get GPU runners is to self-host them, which in our case would mean spinning them up in the cloud on demand. However, we decided that the engineering effort to get that working is currently better targeted at more pressing / higher-impact matters. Hugging Face is willing to support us with compute costs once we decide to move ahead with this. If anyone is willing to contribute / collaborate on this topic, please let me know and we can figure out how/when to move forward.

@TAJD

TAJD commented Jan 30, 2024

@Titus-von-Koeller, I would like to find time to contribute. It's awesome that HF has capacity to support this!

Do let me know how the project would like to proceed - once there's a plan we can start to chip away at elements of it 🙂

@akx
Contributor

akx commented Jan 30, 2024

Ah, I wasn't even aware of this conversation before opening #996 :)

@Titus-von-Koeller
Collaborator Author

Ok, after merging #1041 (thanks @akx, this is really bringing us a step forward!), we should re-assess where we would like to head with this.

It seems @matthewdouglas and @rickardp also have quite a few opinions on these topics. If everyone could spell out a bit what they think is important going forward, that would be quite helpful in distilling things down to something concrete. Please let me know what you think.

@Titus-von-Koeller
Collaborator Author

Archiving this, because it's out of date and we ended up favoring other modes of interaction to coordinate.
