
[RFC] cross-platform: Refactoring bitsandbytes/cuda_setup #918

Closed

Titus-von-Koeller opened this issue Dec 15, 2023 · 10 comments
Assignees
Labels
cross-platform CUDA Setup feature-request help wanted Extra attention is needed high priority (first issues that will be worked on) RFC request for comments on proposed library improvements waiting for info Windows

Comments

@Titus-von-Koeller
Collaborator

Titus-von-Koeller commented Dec 15, 2023

Summary

This RFC aims to discuss and gather community input on refactoring the bitsandbytes/cuda_setup module. The goal is to enhance its functionality, simplify the user experience across different hardware and operating systems, and prepare it for upcoming device support expansions.

Background

bitsandbytes has become instrumental in democratizing AI, thanks to its deep integration with hardware. Despite millions of monthly downloads, a fraction of users encounter issues, such as those detailed in #914. Our objective is to keep bitsandbytes as easy to install and use as possible (e.g. pip install bitsandbytes and load_in_4bit=True), hiding the complexities of the software-hardware boundary under the hood while improving error reporting and handling.

Current Challenges

Setup Module Issues

  • Bug Reports from python -m bitsandbytes: This feature, intended to simplify debugging, sometimes presents similar tracebacks for very different underlying issues, causing confusion in the issue threads.
  • CUDA Install and Environment Challenges: Many problems arise not from bitsandbytes itself, but from user-side issues with CUDA installations, environment settings (e.g. LD_LIBRARY_PATH), or hardware configurations (see the diagnostic sketch after this list).
  • Perceived Reliability: There are known issues in the setup code that need to be fixed, and its overall code quality could be substantially improved.
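
To make the environment-diagnosis point concrete, here is a minimal, hypothetical sketch (not current bitsandbytes code; helper names are made up) of how a refactored python -m bitsandbytes diagnostic could distinguish "runtime not found" from "runtime found but unloadable", two failure modes that today often end in near-identical tracebacks:

```python
# Hypothetical diagnostic sketch -- illustrates the idea, not the actual
# bitsandbytes implementation.
import ctypes
import os
from pathlib import Path

def find_cudart_candidates():
    """Collect libcudart.so* candidates from LD_LIBRARY_PATH (user-side config)."""
    candidates = []
    for entry in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            candidates.extend(Path(entry).glob("libcudart.so*"))
    return candidates

def diagnose():
    candidates = find_cudart_candidates()
    if not candidates:
        # Distinct failure mode 1: nothing to load -> environment problem.
        print("No libcudart on LD_LIBRARY_PATH; check your CUDA install/env vars.")
        return
    for lib in candidates:
        try:
            ctypes.CDLL(str(lib))
            print(f"OK: {lib} loads cleanly.")
        except OSError as exc:
            # Distinct failure mode 2: found but unloadable -> broken install.
            print(f"Found but unloadable: {lib} ({exc})")

if __name__ == "__main__":
    diagnose()
```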

Diverse Hardware Landscape

  • GPU-Nvidia Variability: Different generations and capabilities (e.g., Compute Capability, tensor cores, data types).
  • Emerging GPU-AMD Support: Efforts like PR #756 are in progress to integrate AMD hardware.
  • Apple Silicon Requests: Interest shown in PR #257 and other contributions.
  • Intel GPU+CPU Quantization: The initiative by Intel with PR #898 for device abstraction.

Operating System Variability

  • Linux as Primary OS: Continued focus on Linux support.
  • Windows and Apple Support: Evaluating and integrating community contributions for broader OS support:
    • Windows: Limited support at the moment, but high impact - we need to evaluate the status quo and the existing community contributions, then come up with a roadmap.
    • Apple: Currently no support - there are ongoing discussions, community contributions that haven't been accepted so far, and quite a lot of interest; here too we need to evaluate the status quo and the community contributions, then come up with a roadmap.

Proposed Improvements

  1. Refactoring cuda_setup: Enhancing code quality and clarity to better handle the diverse hardware and OS scenarios.
  2. Error Reporting Enhancement: Develop a more nuanced error reporting mechanism that reflects the source of an issue as accurately as possible and makes conflating distinct issues harder by producing distinct traces (see the sketch after this list).
  3. Community Engagement: Actively seeking community input, especially for cross-platform compatibility and new device support.
  4. CI/CD Strategies: Discussing and implementing robust CI/CD processes to facilitate testing across various platforms and hardware.
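
As a rough illustration of point 2, source-specific exception types could keep distinct root causes from ever producing identical traces; all names below are hypothetical and for discussion only:

```python
# Sketch only: a possible error hierarchy for the refactored setup module.
class BnbSetupError(Exception):
    """Base class; every setup error carries an actionable hint."""
    def __init__(self, message: str, hint: str):
        super().__init__(f"{message}\nHint: {hint}")

class CudaRuntimeNotFoundError(BnbSetupError):
    """The CUDA runtime library could not be located (environment problem)."""

class UnsupportedComputeCapabilityError(BnbSetupError):
    """A GPU was found, but its compute capability is unsupported (hardware limit)."""

def load_cuda_runtime(path=None):
    # Raising a source-specific error yields a distinct, searchable trace
    # instead of one generic traceback shared by unrelated problems.
    if path is None:
        raise CudaRuntimeNotFoundError(
            "libcudart.so not found",
            hint="verify your CUDA installation and LD_LIBRARY_PATH",
        )
```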

Call to Action

We invite the community to provide feedback and suggestions on the following:

  • Improvements to the cuda_setup module.
  • Strategies for handling diverse hardware and operating systems.
  • Ideas for an effective CI/CD setup - we'll provide a separate RFC for that, but feel free to mention initial thoughts here as well.
  • Any other relevant insights or experiences.

Timeline and Milestones

We'll take an incremental approach to improving the setup module. The more actionable and widely agreed upon a change is, the sooner we can implement it.

Contribution and Feedback Mechanism

Please share your thoughts, suggestions, and feedback in the thread below.

Summary and Next Steps

This RFC serves as a starting point to get feedback and coordinate the collaborative effort to refine bitsandbytes's setup process. We aim to address the current challenges, embrace the diversity of hardware and operating systems, and build a robust, user-friendly setup. Your participation in this process is crucial, and we look forward to your valuable input.

@Titus-von-Koeller Titus-von-Koeller self-assigned this Dec 15, 2023
@Titus-von-Koeller Titus-von-Koeller added high priority (first issues that will be worked on) help wanted Extra attention is needed CUDA Setup Windows waiting for info RFC request for comments on proposed library improvements labels Dec 15, 2023
@Titus-von-Koeller Titus-von-Koeller changed the title [RFC] Refactoring bitsandbytes/cuda_setup for Enhanced Cross-Platform and Device Support. [RFC] Refactoring bitsandbytes/cuda_setup for enhanced cross-platform and device support. Dec 15, 2023
@wkpark
Contributor

wkpark commented Dec 17, 2023

See also PR #873 (merged), PR #876 for Windows, and PR #908.

@jgong5

jgong5 commented Dec 23, 2023

CPU Quantization: The initiative by Intel with #898 for device abstraction.

Hey, one thing to clarify. The device abstraction proposed in #894 and the PR #898 are not only about CPU quantization but also about Intel GPU support. Therefore, we tried to abstract out the interfaces that a backend needs to implement, and we also provided a registration mechanism for supporting new devices.
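
For readers of this thread, a rough sketch of what such an interface-plus-registration design looks like in general (illustrative names only, not the actual code from #894/#898):

```python
# Generic backend-registry pattern, sketched for discussion.
from abc import ABC, abstractmethod

_BACKENDS = {}

class Backend(ABC):
    """Interface each device backend (CUDA, Intel CPU/GPU, ...) implements."""

    @abstractmethod
    def quantize_4bit(self, tensor):
        ...

def register_backend(device_type: str, backend: Backend):
    """Registration hook so new devices plug in without touching core code."""
    _BACKENDS[device_type] = backend

def get_backend(device_type: str) -> Backend:
    try:
        return _BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"no backend registered for {device_type!r}") from None
```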

@younesbelkada
Collaborator

younesbelkada commented Dec 23, 2023

Hi @jgong5 @wkpark
Are you connected through Slack, so we can discuss more details about the bnb refactor and iterate quickly on this? Can I use your emails to send Slack invites?

@TAJD

TAJD commented Dec 23, 2023

This is really exciting!

Ideas for an effective CI/CD setup - we'll provide a separate RFC for that, but feel free to mention initial thoughts here as well.

GPU runners now appear to be available on GitHub Actions and I can't think of a more useful project to begin trialling their usage.

There is a lot of complexity regarding the number of different CUDA versions, hardware, and other dependencies. For now, one goal could be to get the test suite automated and running cleanly for a limited set of CUDA versions and hardware, and then build out the build/deployment process from there.
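
One concrete first step along those lines: gate hardware-dependent tests so the suite runs cleanly on runners without GPUs. A minimal pytest sketch, assuming the tests go through PyTorch:

```python
# Sketch: skip CUDA-only tests on CPU-only runners so a limited CI matrix
# still passes cleanly.
import pytest
import torch

requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="needs a CUDA-capable GPU"
)

@requires_cuda
def test_cuda_roundtrip():
    x = torch.randn(16, device="cuda")
    assert x.device.type == "cuda"
```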

@rickardp
Contributor

rickardp commented Dec 30, 2023

Nice to see some initiative on this again. Here are a few points from me.

  1. Blessing from the maintainer(s) that this is a truly portable library, and not just a CUDA library that can be made to run some MPS kernels with some effort. The alternative is to maintain a fork. This comment in #485 (comment) seems to have indicated so, but the effort seems to have since died out. @Titus-von-Koeller do I understand correctly that you have an interest in driving these changes to be mainlined in this repo?
  2. See my comment in #485 (comment). Mainly: build with CMake, focus on getting everything to build on GitHub Actions, and improve test coverage. Get feature parity on CPU to be able to have test coverage on all hardware. Specifically, it would be possible to make this library very easy to use ("pip install bitsandbytes") with binary wheels for all supported platforms (with some tradeoff in packaging multiple CUDA kernels for different supported CUDA versions).
    2b) As OP indicated, the CUDA setup code is quirky. I think it tries to solve some edge cases that are better solved by following standard installation processes (look at PyTorch). But I am no expert. In any case, I think this library should focus on the kernels and leave CUDA setup to other frameworks/libraries.
  3. Then port kernel by kernel to hardware like MPS or Intel. Remember that contributors to these codebases will typically not be able to run CUDA kernels, so test coverage will be a challenge. And vice versa. Possibly the range of GitHub runners is now sufficient, as indicated by @TAJD's comment.

As for some of the details / tech choices on portability, please see the prior discussion in #252 and #485, as there was already a lot of good discussion there.

@Titus-von-Koeller I understand that this issue might be specific to the cuda_setup, but I think that if the decision is made to build true binary wheels (see bullet 2/2b above), a lot of the complexity of the CUDA setup goes away. So my proposal would be to start by agreeing that this library will be distributed this way, and to load the CUDA kernels for the CUDA runtime supplied by PyTorch (querying PyTorch for the details; IIRC this is possible, but my knowledge here is 6 months old so I might be wrong).
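
For illustration, querying PyTorch for those details could look roughly like this (the binary naming scheme below is made up, not how bitsandbytes actually packages its kernels):

```python
# Sketch: derive which prebuilt kernel binary to load from PyTorch itself,
# instead of probing the system for a CUDA installation.
import torch

def select_kernel_library() -> str:
    if not torch.cuda.is_available():
        return "libbitsandbytes_cpu.so"               # CPU fallback
    major, minor = torch.version.cuda.split(".")[:2]  # CUDA runtime PyTorch ships
    cc_major, cc_minor = torch.cuda.get_device_capability()
    # Hypothetical naming convention for bundled binaries:
    return f"libbitsandbytes_cuda{major}{minor}_sm{cc_major}{cc_minor}.so"
```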

I'm happy to spend some time reviving/rebasing/refactoring any of the work on PR #257 that is of interest to the community, but I would like to get some commitment from a maintainer that this actually has a chance of getting merged, so it's not just a dead end.

@Titus-von-Koeller Titus-von-Koeller changed the title [RFC] Refactoring bitsandbytes/cuda_setup for enhanced cross-platform and device support. [RFC] cross-platform: Refactoring bitsandbytes/cuda_setup Jan 26, 2024
@Titus-von-Koeller Titus-von-Koeller pinned this issue Jan 26, 2024
@Titus-von-Koeller
Collaborator Author

@TAJD

GPU runners now appear to be available on GitHub Actions and I can't think of a more useful project to begin trialling their usage.

There is a lot of complexity regarding the number of different CUDA versions, hardware and other dependencies. For now one goal could be to get the test suite automated and running cleanly for a limited combination of CUDA versions/hardwares and then building out the build/deployment process from there.

Yes, getting builds + testing automated is quite high up on our list.

I also saw that GitHub blog post about GPU runners (still in beta) and signed up for the beta in December, but we didn't get selected. At the moment, the only way to get GPU runners is to self-host them, which in our case would mean spinning them up in the cloud on demand. However, we decided that the engineering effort to get that working is currently better targeted at more pressing / higher-impact matters. Hugging Face is willing to support us with compute costs once we decide to move ahead with this. If anyone is willing to contribute / collaborate on this topic, please let me know and we can figure out how/when to move forward.

@TAJD

TAJD commented Jan 30, 2024

@Titus-von-Koeller, I would like to find time to contribute. It's awesome that HF has capacity to support this!

Do let me know how the project would like to proceed - once there's a plan we can start to chip away at elements of it 🙂

@akx
Contributor

akx commented Jan 30, 2024

Ah, I wasn't even aware of this conversation before opening #996 :)

@Titus-von-Koeller
Collaborator Author

Ok, after merging #1041 (thanks @akx, this is really bringing us a step forward!), we should re-assess where we would like to head with this.

It seems @matthewdouglas and @rickardp also have quite a few opinions on these topics. If everyone could spell out a bit what they think is important going forward, that would be quite helpful in distilling things down to something concrete. Please let me know what you think.

@Titus-von-Koeller
Collaborator Author

Archiving this, because it's out of date and we ended up favoring other modes of interaction to coordinate.
