@althonos commented May 27, 2024

Hi again!

The best way to install cython-blis is to compile it from source, to take advantage of the machine architecture. On our HPC cluster, I end up re-installing cython-blis on each node executor at the start of each job to make sure I'm using optimized code, but this takes a bit of time.

Given that BLIS has a lot of source files, the build process is easy to parallelize. I changed the logic of the ExtensionBuilder.compile_objects code to invoke the compiler on the objects in parallel with a ThreadPool. The job count is taken from the parallel flag of the command line (which is a default build_ext option) or from the MAX_JOBS environment variable (similar to what torch and flash-attn are doing).

By default, I left the job count at 1, so that parallel compilation happens only if enabled. With 4 threads, compilation is about twice as fast:

MAX_JOBS="4" pip install blis --no-binary=blis 
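For readers who want the gist without opening the diff, the approach can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: `resolve_jobs`, the standalone `compile_objects` function, and the `compile_one` callback are hypothetical names, and the precedence shown (the parallel flag wins over MAX_JOBS) is an assumption.

```python
import os
from multiprocessing.pool import ThreadPool


def resolve_jobs(parallel=None, environ=None):
    """Pick a job count: explicit flag first, then MAX_JOBS, else 1.

    The precedence order here is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    if parallel:
        return max(1, int(parallel))
    return max(1, int(environ.get("MAX_JOBS", "1")))


def compile_objects(sources, compile_one, jobs=1):
    """Call compile_one(source) for every source; results keep input order.

    compile_one is a hypothetical callback that would run the compiler
    for a single source file and return the path of the object file.
    """
    if jobs <= 1:
        # Serial path, matching the default of one job.
        return [compile_one(src) for src in sources]
    with ThreadPool(jobs) as pool:
        # Threads are enough when each call spends its time waiting on
        # an external compiler process.
        return pool.map(compile_one, sources)


if __name__ == "__main__":
    # Stand-in for a real compiler invocation.
    fake_compile = lambda src: src[:-2] + ".o"
    jobs = resolve_jobs(environ={"MAX_JOBS": "4"})
    print(compile_objects(["a.c", "b.c", "c.c"], fake_compile, jobs=jobs))
```

A thread pool rather than a process pool should be sufficient here, assuming each compile step is launched as a subprocess: the Python threads mostly wait on the compiler, so the GIL is not a bottleneck.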
