
llm - Large Language Models for Everyone, in Rust

llm is an ecosystem of Rust libraries for working with large language models. It's built on top of the fast, efficient GGML library for machine learning.

[Image: A llama riding a crab, AI-generated]

Image by @darthdeus, using Stable Diffusion


The primary entrypoint for developers is the llm crate, which wraps llm-base and the supported model crates.

For end-users, there is a CLI application, llm-cli, which provides a convenient interface for interacting with supported models. Text generation can be done as a one-off based on a prompt, or interactively, through REPL or chat modes. The CLI can also be used to serialize (print) decoded models, quantize GGML files, or compute the perplexity of a model. It can be downloaded from the latest GitHub release or by installing it from crates.io.

llm is powered by the ggml tensor library, and aims to bring the robustness and ease of use of Rust to the world of large language models. At present, inference is only on the CPU, but we hope to support GPU inference in the future through alternate backends.

Currently, the supported models include:

  • BLOOM
  • GPT-2
  • LLaMA

Getting Started

This project depends on Rust v1.65.0 or above and a modern C toolchain.

The llm crate exports llm-base and the model crates (e.g. bloom, gpt2, llama).

To use llm, add it to your Cargo.toml:

[dependencies]
llm = "0.2"

NOTE: To improve the performance of debug builds, compile llm with optimizations even in debug mode:

[profile.dev.package.llm]
opt-level = 3
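
Once the dependency is in place, text generation looks roughly like the following. This is a minimal sketch modeled on the crate's 0.1-era example; exact signatures vary between llm versions, and it assumes the rand crate as an additional dependency:

use std::io::Write;

fn main() {
    // Load a GGML model from disk, reporting load progress to stdout.
    let llama = llm::load::<llm::models::Llama>(
        std::path::Path::new("/path/to/model"),
        Default::default(),
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

    // Start an inference session and generate text from a prompt,
    // streaming each token to stdout as it is produced.
    let mut session = llama.start_session(Default::default());
    let res = session.infer::<std::convert::Infallible>(
        &llama,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt: "Rust is a cool programming language because",
            ..Default::default()
        },
        // output request
        &mut Default::default(),
        // token callback: print each token as it arrives
        |t| {
            print!("{t}");
            std::io::stdout().flush().unwrap();
            Ok(())
        },
    );

    match res {
        Ok(stats) => println!("\n\nInference stats:\n{stats}"),
        Err(err) => println!("\n{err}"),
    }
}

The simple inference example described under Running below exercises the same code path from within the repository.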

Building llm-cli

Follow these steps to build the command line application, which is named llm:

Using cargo

To install llm to your Cargo bin directory, which rustup is likely to have added to your PATH, run:

cargo install llm-cli

The CLI application can then be run through llm.

From Source

Clone the repository and then build it with:

git clone --recurse-submodules git@github.com:rustformers/llm.git
cargo build --release

The resulting binary will be at target/release/llm[.exe].

It can also be run directly through Cargo, with:

cargo run --release -- $ARGS

Getting Models

GGML files are easy to acquire. For a list of models that have been tested, see the known-good models.

Certain older GGML formats are not supported by this project, but the goal is to maintain feature parity with the upstream GGML project. For problems relating to loading models, or requesting support for additional GGML model types, please open an Issue.

From Hugging Face

Hugging Face 🤗 is a leader in open-source machine learning and hosts hundreds of GGML models. Search for GGML models on Hugging Face 🤗.

r/LocalLLaMA

This Reddit community maintains a wiki related to GGML models, including well-organized lists of links for acquiring GGML models (mostly from Hugging Face 🤗).

Running

Once the llm executable has been built or is in a $PATH directory, try running it. Here's an example that uses the open-source GPT4All-J language model, which is based on the GPT-J architecture:

llm gptj infer -m ggml-gpt4all-j-v1.3-groovy.bin -p "Rust is a cool programming language because"

For more information about the llm CLI, use the --help parameter.

There is also a simple inference example that is helpful for debugging:

cargo run --release --example inference gptj ggml-gpt4all-j-v1.3-groovy.bin $OPTIONAL_PROMPT

Working with Raw Models

Python v3.9 or v3.10 is needed to convert a raw model to a GGML-compatible format (note that Python v3.11 is not supported):

python3 util/convert-pth-to-ggml.py $MODEL_HOME/$MODEL/7B/ 1

The output of the above command can be used by llm to create a quantized model:

cargo run --release llama quantize $MODEL_HOME/$MODEL/7B/ggml-model-f16.bin $MODEL_HOME/$MODEL/7B/ggml-model-q4_0.bin q4_0

In the future, we hope to provide a more streamlined way of converting models.

Note

The llama.cpp repository has additional information on how to obtain and run specific models.

Q&A

Does the llm CLI support chat mode?

Yes, but certain fine-tuned models (e.g. Alpaca, Vicuna, Pygmalion) are better suited to chat use-cases than so-called "base models". Here's an example of using the llm CLI in REPL (Read-Eval-Print Loop) mode with an Alpaca model; note that the provided prompt format is tailored to the model that is being used:

llm llama repl -m ggml-alpaca-7b-q4.bin -f examples/alpaca_prompt.txt

There is also a Vicuna chat example that demonstrates how to create a custom chatbot:

cargo run --release --example vicuna-chat llama ggml-vicuna-7b-q4.bin

Can llm sessions be persisted for later use?

Sessions can be loaded (--load-session) or saved (--save-session) to a file. To automatically load and save the same session, use --persist-session. This can also be used to cache prompts, reducing load time.
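
For example, a hypothetical invocation (the model and session filenames are illustrative):

llm llama infer -m ggml-alpaca-7b-q4.bin -p "Rust is a cool programming language because" --persist-session alpaca.session

The first run creates alpaca.session; subsequent runs restore it before inference, so already-processed prompt tokens should not need to be re-evaluated.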

Do you provide support for Docker and NixOS?

The llm Dockerfile is in the util directory, as is a Flake manifest and lockfile.
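
For example, an image might be built with something like the following (the image tag is illustrative; consult util/Dockerfile for the expected build context and arguments):

docker build -t llm -f util/Dockerfile .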

Do you accept contributions?

Absolutely! Please see the contributing guide.

What applications and libraries use llm?

Applications

  • llmcord: Discord bot for generating messages using llm.
  • local.ai: Desktop app for hosting an inference API on your local machine using llm.

Libraries

  • llm-chain: Build chains in large language models for text summarization and completion of more complex tasks
