The purpose of this document is to make it easy for open-source community members to contribute to this project. We'd love to discuss your contributions with you via a GitHub Issue or Discussion, or on Discord!
This project uses a GitHub Actions workflow to enforce code standards.
The rusty-hook project is used to run a similar set of checks automatically before committing. If you would like to run these checks locally, use `cargo run -p precommit-check`.
Follow these steps to update the GGML submodule and regenerate the Rust bindings (this is only necessary if your changes depend on new GGML features):
```shell
git submodule update --remote
cargo run --release --package generate-ggml-bindings
```
The `ggml-sys` crate includes various acceleration backends, selectable via `--features` flags. The availability of supported backends varies by platform, and `ggml-sys` can only be built with a single active acceleration backend at a time. If `cublas` and `clblast` are both specified, `cublas` is prioritized and `clblast` is ignored.
| Platform/OS | `cublas` | `clblast` | `metal` |
| --- | --- | --- | --- |
| Windows | ✔️ | ✔️ | ❌ |
| Linux | ✔️ | ✔️ | ❌ |
| MacOS | ❌ | ❌ | ✔️ |
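To build with one of these backends enabled, pass the corresponding feature flag. A minimal sketch, assuming you are building from the workspace root (swap `cublas` for `clblast` or `metal` as appropriate for your platform):

```shell
cargo build --release --features cublas
```

The prerequisites for each backend are described below.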
CUDA must be installed. You can download CUDA from the official Nvidia site.
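To confirm the CUDA toolkit is visible to your shell (this assumes the toolkit's `bin` directory is on your `PATH`):

```shell
nvcc --version
```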
CLBlast can be installed via vcpkg using the command `vcpkg install clblast`. After installation, the `OPENCL_PATH` and `CLBLAST_PATH` environment variables should be set to the `opencl_x64-windows` and `clblast_x64-windows` directories respectively.
Here's an example of the required commands:
```shell
git clone https://github.com/Microsoft/vcpkg.git
.\vcpkg\bootstrap-vcpkg.bat
.\vcpkg\vcpkg install clblast
set OPENCL_PATH=....\vcpkg\packages\opencl_x64-windows
set CLBLAST_PATH=....\vcpkg\packages\clblast_x64-windows
```
Some Windows builds also require the `-Ctarget-feature=+crt-static` Rust flag. This flag enables static linking of the C runtime, which can be critical for certain deployment scenarios and runtime environments.
To set this flag, add the following snippet to the `.cargo\config` file in your project directory:

```toml
[target.x86_64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]
```

This ensures the flag is applied whenever you build for this target.
For a comprehensive guide to Rust flags, including other ways to set them, refer to this detailed StackOverflow discussion. Choose the option that best fits your project requirements and development environment.
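One common alternative, if you would rather not edit `.cargo\config`, is Cargo's standard `RUSTFLAGS` environment variable, which applies the flag to every build in the current shell session:

```shell
set RUSTFLAGS=-Ctarget-feature=+crt-static
cargo build --release
```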
For `llm` to function properly, it requires the `clblast.dll` and `OpenCL.dll` files. These files can be found within the `bin` subdirectory of their respective vcpkg packages. There are two options to ensure `llm` can access these files:
- Amend your `PATH` environment variable to include the `bin` directories of each respective package.
- Manually copy the `clblast.dll` and `OpenCL.dll` files into the `./target/release` or `./target/debug` directories. The destination directory will depend on the profile that was active during the compilation process.
Please choose the option that best suits your needs and environment configuration.
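If you choose the second option, a minimal sketch of the copy step for a release build (this assumes `OPENCL_PATH` and `CLBLAST_PATH` are set as described above):

```shell
copy "%CLBLAST_PATH%\bin\clblast.dll" .\target\release\
copy "%OPENCL_PATH%\bin\OpenCL.dll" .\target\release\
```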
You need to have CUDA installed on your system. CUDA can be downloaded and installed from the official Nvidia site. On Linux distributions that do not set `CUDA_PATH`, the `CUDA_INCLUDE_PATH` and `CUDA_LIB_PATH` environment variables can be set to their corresponding paths.
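A minimal sketch of setting these variables, assuming a typical CUDA toolkit layout (the paths are illustrative; adjust them to your distribution's install location):

```shell
export CUDA_INCLUDE_PATH=/usr/local/cuda/include
export CUDA_LIB_PATH=/usr/local/cuda/lib64
```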
CLBlast can be installed on Linux through various package managers; for example, using `apt` you can install it via `sudo apt install clblast`. After installation, make sure that the `OPENCL_PATH` and `CLBLAST_PATH` environment variables are correctly set. Additionally, the environment variables `OPENCL_INCLUDE_PATH`/`OPENCL_LIB_PATH` and `CLBLAST_INCLUDE_PATH`/`CLBLAST_LIB_PATH` can be used to specify the location of the files. All of these environment variables are supported on all of the listed operating systems.
Xcode and the associated command-line tools should be installed on your system, and you should be running a version of MacOS that supports Metal. For more detailed information, please consult the official Metal documentation.
To enable Metal using the CLI, ensure it was built successfully using `--features=metal` and then pass the `--use-gpu` flag.
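A sketch of that flow; the `infer` subcommand, model path, and prompt below are assumptions, so adapt them to your setup:

```shell
cargo build --release --features=metal
cargo run --release -- infer -m ./models/your-model.bin -p "Hello" --use-gpu
```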
The current underlying implementation of Metal in GGML is still in flux and has some limitations:
- Evaluating a model with more than one token at a time is not currently supported in GGML's Metal implementation. An `llm` inference session will fall back to the CPU implementation (typically during the 'feed prompt' phase) but will automatically use the GPU once a single token is passed per evaluation (typically after prompt feeding).
- Not all model architectures will be equally stable when used with Metal due to ongoing work in the underlying implementation. Expect `llama` models to work fine, though.
- With Metal, it is possible but not required to use `mmap`. As buffers do not need to be copied to VRAM on M1, `mmap` is the most efficient, however.
- Debug messages may be logged by the underlying GGML Metal implementation. This will likely go away in the future for release builds of `llm`.
This repository includes a `launch.json` file that can be used for debugging with Visual Studio Code; this file will need to be updated to reflect where models are stored on your system. Debugging with Visual Studio Code requires a language extension that depends on your operating system. Keep in mind that debugging text generation is extremely slow, but debugging model loading is not.
Here are some tried-and-true references for learning more about large language models:
- The Illustrated GPT-2 - an excellent technical description of how this seminal language model generates text
- Andrej Karpathy's "Neural Networks: Zero to Hero" - a series of in-depth YouTube videos that guide the viewer through creating a neural network, a large language model, and a fully functioning chatbot, from scratch (in Python)
- rustygrad - a native Rust implementation of Andrej Karpathy's micrograd
- Understanding Deep Learning (Chapter 12 specifically)