Release 0.43.0 with the new build + pkg mechanism #1094

Titus-von-Koeller · 2024-02-27T19:45:30Z

Titus-von-Koeller
Feb 27, 2024
Maintainer

Hey @rickardp @wkpark @akx @matthewdouglas @younesbelkada,

First of all, thank you for all the amazing work you've been doing in the last weeks and months. I'm still positively shocked by the amount of community contributions we're getting in BNB, and all of you are at the absolute forefront of this work! I have huge gratitude and respect for the work and expertise that you contribute towards helping us further the democratization of ML! Together with you, the tasks and goals we're facing seem much more achievable, and it's super motivating to work together like this.

Given that the build infra has completely changed with the switch to CMake and the new workflows, which would even enable uploading to PyPI directly, I thought it would be good to create this place for discussion in order to prepare for the upcoming release, informally chat about what we think is relevant, and still be transparent in our process.

It would be wonderful to get some more input on what you think still needs doing for a full release with the new mechanisms and which PRs we should still aim to merge before then.

My idea would be to aim for a release to TestPyPI by the end of the week and, if everything goes according to plan, a normal release sometime next week.

Below I added some notes on what I'm currently aware of and would be happy to hear your comments on in case you have something to add or a differing opinion:

merged:

to look into + merge to get ready for release process testing:

Migrate build data to pyproject.toml #1078 (get rid of requirements.txts? lock-file? but this can be left for later PR; is everyone agreeing this can be merged?)
Dynamic cuda wrapper #1065
Define CUDA versions for workflows centrally #1052 (use Python script instead of bash + thorough review, testing)

Are there any objections to this PR, or suggestions for what would be better? I would like to avoid unnecessary compute usage wherever that is easily possible, since it is wasteful and bad for the environment:

ENH: Run the CMake CI only when relevant files are modified #1015

I'm completely unsure what to do about this one, as there's some overlap with other PRs:

CI: fix workflows #1035

Dependabot: Does it make sense to keep this activated as long as we don't have a test suite to automatically validate against? How does this fit together with #1078, where version constraints are completely out of the picture (I guess they could be handled in pyproject.toml / I wonder what would be best practice in such cases)?

It would be nice to get the setup improvements merged: Do you agree that this is something I should prioritize?

Rework CUDA/native-library setup and diagnostics #1041

I would propose holding off on the formatting PR #1081 until after the release (please feel free to comment if you think it's better to go ahead with it right away). I'm worried it would create a lot of extra work in all the open PRs, and we should focus on merging as much as we can before going for a final round of testing the release process and bitsandbytes itself.

Titus-von-Koeller · 2024-02-27T19:51:10Z

Titus-von-Koeller
Feb 27, 2024
Maintainer Author

Ok, seems that through my merges just now I broke the Python packaging workflow, great start 😅

Gotta call it a day now though, will look into it in the next few days. No one is consuming these yet, so I don't think it's a huge issue at this very moment. Let me know if you disagree.

3 replies

rickardp Feb 27, 2024

> Ok, seems that through my merges just now I broke the Python packaging workflow, great start 😅
>
> Gotta call it a day now though, will look into it in the next few days. No one is consuming these yet, so I don't think it's a huge issue at this very moment. Let me know if you disagree.

I think Ubuntu possibly broke it, not you. An apt-get update might fix that.

matthewdouglas Feb 27, 2024
Maintainer

@rickardp I've started to put together a PR for that.

https://github.com/matthewdouglas/bitsandbytes/actions/runs/8071496161

Titus-von-Koeller Feb 28, 2024
Maintainer Author

perfect, thanks a lot for the quick fix!

rickardp · 2024-02-27T21:27:47Z

rickardp
Feb 27, 2024

Regarding dependabot. I find it useful to keep dependency versions explicit in the code but automate their updating. This makes it easy to detect if an upstream dependency broke the build.

Note that even today we get some value out of the workflow, as most dependencies are used at compile time and none are packaged anyway (except for the CUDA compiler and similar tools that affect the build output).

I would actually advocate keeping version constraints although using a more modern approach than what is currently used. It also serves as documentation for what to use when building locally.

1 reply

matthewdouglas Feb 28, 2024
Maintainer

I've added more constraints on #1078.

matthewdouglas · 2024-02-28T20:04:32Z

matthewdouglas
Feb 28, 2024
Maintainer

I think without #1065 there would need to be builds set up for more versions of the CUDA toolkit to combine into the wheels. I've mentioned this a little bit here: #1032 (comment)

In general, the deploy.sh script used to build the wheel in the past needs to be replicated. It's currently building for a pretty large matrix: 11.0, 11.1, 11.4, 11.5, 11.7, 11.8, 12.0, 12.1, 12.2, and 12.3.

On Windows we don't have to worry about breaking backwards compatibility, so no concern there. For Linux x86-64 it's not so clear to me, except that I think it should be slimmed down.
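To picture what "slimmed down" could mean for the matrix above, here is a minimal Python sketch that groups the listed versions by major version and keeps one entry per major. Keeping the newest minor is only an assumption made for illustration; which minor version to build against is exactly what the thread leaves open. The helper names are hypothetical and not part of bitsandbytes.

```python
from collections import defaultdict

# CUDA toolkit versions quoted above for the old deploy.sh build matrix.
CUDA_MATRIX = ["11.0", "11.1", "11.4", "11.5", "11.7", "11.8",
               "12.0", "12.1", "12.2", "12.3"]


def parse(version: str) -> tuple[int, int]:
    """Split a 'major.minor' string into a tuple of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def one_per_major(versions: list[str]) -> list[str]:
    """Keep a single version per major release.

    Assumption for illustration: the newest minor of each major is kept.
    """
    by_major: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for version in versions:
        parsed = parse(version)
        by_major[parsed[0]].append(parsed)
    return [f"{major}.{max(group)[1]}" for major, group in sorted(by_major.items())]


print(one_per_major(CUDA_MATRIX))
```

With the ten versions listed above, this reduces the matrix to two builds, one for 11.x and one for 12.x.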

5 replies

akx Mar 4, 2024

According to the docs, CUDA versions should be compatible across minor versions within the same major version since 11.1.

With a quick cursory look (off the .so built on main):

$ nm -g libbitsandbytes_cuda121.so | grep .so | cut -d @ -f 2 | sort | uniq
CXXABI_1.3
libcublas.so.12
libcublasLt.so.12
libcudart.so.12
libcusparse.so.12

The CUDA libraries aren't minor-versioned either, so I think we could just build for one minor version per major version? Someone with more CUDA-capable hardware than I have would need to figure out if that's the case 😅
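The same check can be sketched in Python for anyone who wants to run it over several builds. This is a rough equivalent of the shell pipeline above, assuming `nm -g` prints versioned symbols as `name@version`. The sample input is made up for illustration and is not real output from a bitsandbytes build. Unlike `grep .so`, where the dot matches any character, the sketch looks for a literal `.so`.

```python
def versioned_deps(nm_output: str) -> list[str]:
    """Collect the distinct '<lib>.so.<N>' version tags from `nm -g` output."""
    deps = set()
    for line in nm_output.splitlines():
        _, sep, version = line.strip().partition("@")
        version = version.lstrip("@")  # default versions are written as '@@'
        if sep and ".so" in version:
            deps.add(version)
    return sorted(deps)


# Hypothetical sample input, for illustration only.
SAMPLE = """\
                 U cublasCreate_v2@libcublas.so.12
                 U cublasLtCreate@libcublasLt.so.12
                 U cudaMalloc@libcudart.so.12
                 U cudaFree@libcudart.so.12
                 U cusparseCreate@libcusparse.so.12
                 U memcpy@GLIBC_2.14
"""

for dep in versioned_deps(SAMPLE):
    print(dep)
```

If every tag ends in only a major number, as in the output above, that supports the idea that the library does not pin a specific CUDA minor version at link time. It does not prove runtime compatibility, which still needs testing on real hardware.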

matthewdouglas Mar 5, 2024
Maintainer

I do want to try that out. I don't suspect there are any new features in the later 11.x releases that are being used.

So far I haven't had much issue on Windows when linking against CUDA 12.3 and running with torch==2.2.1+cu121. But I have not gone into enough depth to actually look at what's being loaded at runtime.

akx Mar 6, 2024

By the way, on Windows it looks like Llama.cpp is shipping binaries compiled only against CUDA 11.7.1 and 12.2.0 – maybe that's enough for us too?

matthewdouglas Mar 6, 2024
Maintainer

@akx One difference there is that there isn't a whole host of other CUDA libraries already loaded, like there would be for us from PyTorch. The cudart download is a large one that ships cublas64_11.dll, cublasLt64_11.dll, and cudart64_110.dll, which are the required dependencies.

For us we've got the same dependencies. Ideally we take them from the PyTorch install (it would be good to set a reasonable RPATH on these on Linux, something I'll probably put up a PR for). But depending on the priority of paths we search, we can end up loading from a CUDA Toolkit install or not finding the libraries used by PyTorch at all. And that's why we've got the LD_LIBRARY_PATH part of the instructions too.

PyTorch's __init__.py does a few interesting things related to this:

On Windows, os.add_dll_directory() is used. By default the place to look is the lib path where PyTorch is installed. The Windows wheels seem to place the DLLs here. A few additional search paths are added: {sys.exec_prefix}/Library/bin along with {sys.base_exec_prefix}/Library/bin. If those don't yield results for cudart64*.dll then an additional search path is considered by looking for the CUDA Toolkit's env var, i.e. CUDA_PATH_V12_1. That one is specific to the CUDA version that PyTorch is built against.

Linux is a little different with the deps on packages like nvidia-*-cu12 but all of the libtorch*.so libraries seem to have a good RPATH set to find them.
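The Windows lookup order described above can be sketched roughly as follows. This is not PyTorch's actual code, only an illustration of the described behaviour under a few assumptions: the function name and its parameters are hypothetical, the fallback appends a `bin` subdirectory of the toolkit path, and "don't yield results" is approximated with a simple glob for `cudart64*.dll`.

```python
import glob
import os
import sys


def candidate_dll_dirs(torch_lib_dir, cuda_version, env=None, prefixes=None):
    """Return DLL search directories in the order described in the post.

    Hypothetical helper for illustration; not PyTorch's implementation.
    """
    env = os.environ if env is None else env
    if prefixes is None:
        prefixes = [sys.exec_prefix, sys.base_exec_prefix]

    # Default locations: PyTorch's lib directory, then {prefix}/Library/bin.
    dirs = [torch_lib_dir]
    dirs += [os.path.join(prefix, "Library", "bin") for prefix in prefixes]

    # Fallback: only when no cudart64*.dll is found in the default locations,
    # consult the version-specific toolkit variable, e.g. CUDA_PATH_V12_1.
    found = any(glob.glob(os.path.join(d, "cudart64*.dll")) for d in dirs)
    if not found:
        major, minor = cuda_version.split(".")
        cuda_home = env.get(f"CUDA_PATH_V{major}_{minor}")
        if cuda_home:
            # Assumption: the DLLs live in the toolkit's 'bin' subdirectory.
            dirs.append(os.path.join(cuda_home, "bin"))
    return dirs


print(candidate_dll_dirs("torch/lib", "12.1"))
```

On Windows, each existing directory in the returned list could then be registered with `os.add_dll_directory()`, which is available from Python 3.8 onward and only on Windows.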

With that aside I am planning to test this out. I think things should still work and we may be able to get away with a single 11.x and 12.x build.

Titus-von-Koeller Mar 6, 2024
Maintainer Author

btw, we're discussing in Slack how to potentially release BNB v0.43 asap with temporary CUDA minor version support, i.e. cu117, cu118, cu121, cu122, cu123, because of the imminent FSDP+QLoRA blog post. The new approach with support only for major CUDA versions wouldn't be quick enough for that, as I would like this new approach thoroughly vetted before we release in that way.

Titus-von-Koeller · 2024-04-02T13:24:09Z

Titus-von-Koeller
Apr 2, 2024
Maintainer Author

After upgrading from v0.42 to v0.43, when using 4bit quantization, models may generate slightly different outputs (approximately up to the 2nd decimal place) due to a fix in the code. I just updated the CHANGELOG to reflect this. For anyone interested in the details:

We removed a .half() call that was in place before the quantization step. The half() call was left over unintentionally from a prior stage of Tim's implementation and the #970 PR authors noticed this and fixed it as they went along. This line was changed in comparison to the previous code to no longer call half(). In the past, bfloat16 and float32 were not yet supported in the CUDA kernel doing the quantization.

For compatibility reasons, when feeding data into the kernel, everything defaulted to half. However, float32 and bfloat16 are now supported, and that's why the call to .half() is not needed anymore. One can see that other dtypes are now handled correctly here. However, due to these changes in precision, upgrading BNB to this version or beyond can lead to slight changes in model output. We found the following differences relative to the prior approach of calling .half() before the quantization:

Greatest absolute difference: 0.013289213180541992
Greatest relative difference: 81.5882339477539
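To give a feel for why dropping a half-precision cast changes results at all, here is a small standard-library sketch. It does not reproduce the 4-bit quantization or the numbers above. It only rounds a few made-up float values to IEEE half precision and back, then reports the greatest absolute and relative differences. The relative difference is assumed here to be `|actual - expected| / |expected|`, and the helper names are hypothetical.

```python
import struct


def to_half_and_back(x: float) -> float:
    """Round x to IEEE 754 half precision (struct format 'e') and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def greatest_differences(actual, expected):
    """Return (greatest absolute difference, greatest relative difference)."""
    abs_diffs = [abs(a - e) for a, e in zip(actual, expected)]
    rel_diffs = [d / abs(e) for d, e in zip(abs_diffs, expected) if e != 0]
    return max(abs_diffs), max(rel_diffs)


# Made-up values for illustration; the last one is too small for half precision.
weights = [0.1, -0.3337, 2.5, 1.0e-8]
halved = [to_half_and_back(w) for w in weights]

max_abs, max_rel = greatest_differences(halved, weights)
print("greatest absolute difference:", max_abs)
print("greatest relative difference:", max_rel)
```

The sketch also shows a general property of such reports: a relative difference can be large where the reference value is close to zero, even when the absolute difference is tiny.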

0 replies
