llama-rs -> llm, update README
philpax committed May 1, 2023
1 parent 3be6df4 commit 30f2094
Showing 9 changed files with 42 additions and 52 deletions.
8 changes: 4 additions & 4 deletions Dockerfile
@@ -8,14 +8,14 @@ RUN apk add --no-cache musl-dev
WORKDIR /app
COPY ./ /app
# do a release build
RUN cargo build --release --bin llama-cli
RUN strip target/release/llama-cli
RUN cargo build --release --bin llm
RUN strip target/release/llm

# use a plain alpine image; the alpine version needs to match the builder
FROM alpine:3.17
# if needed, install additional dependencies here
RUN apk add --no-cache libgcc
# copy the binary into the final image
COPY --from=builder /app/target/release/llama-cli .
COPY --from=builder /app/target/release/llm .
# set the binary as entrypoint
ENTRYPOINT ["/llama-cli"]
ENTRYPOINT ["/llm"]
2 changes: 1 addition & 1 deletion LICENSE-APACHE
@@ -186,7 +186,7 @@ file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright 2023 The llama-rs Authors
Copyright 2023 The llm Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion LICENSE-MIT
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 The llama-rs Authors
Copyright (c) 2023 The llm Authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
72 changes: 31 additions & 41 deletions README.md
@@ -1,15 +1,13 @@
# LLaMA-rs
# llm

This project is a Rust port of
[llama.cpp](https://github.com/ggerganov/llama.cpp) 🦙🦀🚀
`llm` is a Rust ecosystem of libraries and CLI application for running inference on large language models, inspired by [llama.cpp](https://github.com/ggerganov/llama.cpp).

Just like its C++ counterpart, it is powered by the
[`ggml`](https://github.com/ggerganov/ggml) tensor library, which allows running
inference for Facebook's [LLaMA](https://github.com/facebookresearch/llama)
model on a CPU with good performance using full precision, f16 or 4-bit
quantized versions of the model.
The primary crate is the `llm` crate, which wraps `llm-base` and supported model crates.

[![Latest version](https://img.shields.io/crates/v/llama-rs.svg)](https://crates.io/crates/llama_rs)
It is powered by the [`ggml`](https://github.com/ggerganov/ggml) tensor library, and aims to bring
the robustness and ease of use of Rust to the world of large language models.

[![Latest version](https://img.shields.io/crates/v/llm.svg)](https://crates.io/crates/llm)
![MIT/Apache2](https://shields.io/badge/license-MIT%2FApache--2.0-blue)
[![Discord](https://img.shields.io/discord/1085885067601137734)](https://discord.gg/YB9WaXYAWU)

@@ -21,11 +19,9 @@ quantized versions of the model.

Make sure you have Rust 1.65.0 or above and a C toolchain[^1] set up.

`llm-base` and the model crates (e.g. `bloom`, `gpt2`, `llama`) are Rust
libraries, while `llm-cli` is a CLI application that wraps the models and offers
basic inference capabilities.
`llm` is a Rust library that re-exports `llm-base` and the model crates (e.g. `bloom`, `gpt2`, `llama`).

The following instructions explain how to build CLI applications.
`llm-cli` (binary name `llm`) is a basic application that provides a CLI interface to the library.

**NOTE**: For best results, make sure to build and run in release mode.
Debug builds are going to be very slow.
@@ -35,22 +31,22 @@
Run

```shell
cargo install --git https://github.com/rustformers/llama-rs llm
cargo install --git https://github.com/rustformers/llm llm
```

to install `llm` to your Cargo `bin` directory, which `rustup` is likely to
have added to your `PATH`.

The CLI application can then be run through `llm`.

![Gif showcasing language generation using llama-rs](./doc/resources/llama_gif.gif)
![Gif showcasing language generation using llm](./doc/resources/llama_gif.gif)

### Building from repository

Clone the repository and then build it with

```shell
git clone --recurse-submodules git@github.com:rustformers/llama-rs.git
git clone --recurse-submodules git@github.com:rustformers/llm.git
cargo build --release
```

@@ -64,23 +60,22 @@ cargo run --release --bin llm -- <ARGS>

This is useful for development.

### Getting LLaMA weights
### Getting model weights

In order to run inference, a model's weights are required. Currently, the
following models are supported:

In order to run the inference code in `llama-rs`, a copy of the model's weights
is required.
- LLaMA
- GPT-2
- BLOOM (partial support, results inconsistent)

#### From Hugging Face

Compatible weights - not necessarily the original LLaMA weights - can be found
on [Hugging Face by searching for GGML](https://huggingface.co/models?search=ggml).
At present, LLaMA-architecture models are supported.
Compatible weights can be found on [Hugging Face by searching for GGML models](https://huggingface.co/models?search=ggml).

#### LLaMA original weights

Currently, the only legal source to get the original weights is [this
repository](https://github.com/facebookresearch/llama/blob/main/README.md#llama).
Note that the choice of words also may or may not hint at the existence of other
kinds of sources.
Currently, the only legal source to get the original weights is [this repository](https://github.com/facebookresearch/llama/blob/main/README.md#llama).

After acquiring the weights, it is necessary to convert them into a format that
is compatible with ggml. To achieve this, follow the steps outlined below:
@@ -95,41 +90,36 @@ is compatible with ggml. To achieve this, follow the steps outlined below:
python3 scripts/convert-pth-to-ggml.py /path/to/your/models/7B/ 1

# Quantize the model to 4-bit ggml format
cargo run -p llama-cli quantize /path/to/your/models/7B/ggml-model-f16.bin /path/to/your/models/7B/ggml-model-q4_0.bin q4_0
cargo run --bin llm llama quantize /path/to/your/models/7B/ggml-model-f16.bin /path/to/your/models/7B/ggml-model-q4_0.bin q4_0
```
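The `quantize` step above shrinks the weights from 16-bit floats to 4 bits per value. The idea can be sketched in a few lines of Rust; note this is a deliberately simplified illustration, not ggml's actual `q4_0` layout (the real format works on fixed 32-element blocks and packs two 4-bit values per byte):

```rust
// Toy blockwise 4-bit quantization in the spirit of ggml's q4_0 format
// (simplified: no bit-packing, arbitrary block length).
fn quantize_q4(block: &[f32]) -> (f32, Vec<u8>) {
    // Scale so the largest magnitude maps into the 4-bit range [-8, 7].
    let max = block.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = if max == 0.0 { 1.0 } else { max / 7.0 };
    let quants = block
        .iter()
        .map(|&v| ((v / scale).round().clamp(-8.0, 7.0) as i8 as u8) & 0x0f)
        .collect();
    (scale, quants)
}

fn dequantize_q4(scale: f32, quants: &[u8]) -> Vec<f32> {
    quants
        .iter()
        .map(|&q| {
            // Sign-extend the low 4 bits back to a signed value.
            let signed = ((q << 4) as i8) >> 4;
            signed as f32 * scale
        })
        .collect()
}

fn main() {
    let weights = [0.9f32, -0.35, 0.1, -0.7];
    let (scale, quants) = quantize_q4(&weights);
    let restored = dequantize_q4(scale, &quants);
    for (w, r) in weights.iter().zip(&restored) {
        // Round-trip error is bounded by half a quantization step.
        assert!((w - r).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, quants = {quants:?}");
}
```

Each 4-bit value costs a quarter of the memory of an `f16` weight, at the price of the bounded rounding error asserted above; this trade-off is what makes CPU inference on large models practical.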

> **Note**
>
> The [llama.cpp repository](https://github.com/ggerganov/llama.cpp) has
> additional information on how to obtain and run specific models.
### BLOOM
#### BLOOM

The open-source [BLOOM](https://bigscience.huggingface.co/blog/bloom) model is
also supported.

[More information](https://huggingface.co/docs/transformers/model_doc/bloom)
about BLOOM is available on Hugging Face, as are some
[quantized models](https://huggingface.co/models?search=bloom%20ggml).

### GPT2
#### GPT-2

OpenAI's [GPT-2](https://jalammar.github.io/illustrated-gpt2/) architecture is
also supported. The open-source family of
[Cerebras](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/)
models is built on this architecture.

_Support for other open-source models is currently planned. For models whose
weights can be legally distributed, this section will be updated with scripts to
make the install process as user-friendly as possible. Due to the model's legal
requirements, this is currently not possible with LLaMA itself, and a lengthier
setup is required._

### Running

For example, try the following prompt:

```shell
llama-cli infer -m <path>/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"
llm llama infer -m <path>/ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"
```
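Under the hood, `infer` repeatedly runs the model over the current context and samples the next token from the resulting logits. A minimal greedy-decoding sketch, with a toy closure standing in for the real forward pass (these names are illustrative, not the actual `llm` API):

```rust
// Greedy decoding sketch: pick the highest-logit token at each step.
// `model` is a stand-in for a real forward pass over the context.
fn greedy_decode<F>(mut context: Vec<u32>, model: F, steps: usize, eos: u32) -> Vec<u32>
where
    F: Fn(&[u32]) -> Vec<f32>,
{
    for _ in 0..steps {
        let logits = model(&context);
        // Argmax over the vocabulary.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i as u32)
            .unwrap();
        if next == eos {
            break; // end-of-sequence token: stop generating
        }
        context.push(next);
    }
    context
}

fn main() {
    // Toy "model": always prefers the token after the last one, vocab of 5.
    let model = |ctx: &[u32]| {
        let mut logits = vec![0.0f32; 5];
        logits[((*ctx.last().unwrap() + 1) % 5) as usize] = 1.0;
        logits
    };
    let out = greedy_decode(vec![0], model, 3, 99);
    println!("{out:?}"); // [0, 1, 2, 3]
}
```

Real samplers add temperature, top-k, and repetition penalties on top of this loop, but the structure — forward pass, sample, append, repeat — is the same.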

Some additional things to try:
@@ -139,7 +129,7 @@ Some additional things to try:
try `repl` mode!

```shell
llama-cli repl -m <path>/ggml-alpaca-7b-q4.bin -f examples/alpaca_prompt.txt
llm llama repl -m <path>/ggml-alpaca-7b-q4.bin -f examples/alpaca_prompt.txt
```

![Gif showcasing alpaca repl mode](./doc/resources/alpaca_repl_screencap.gif)
@@ -160,13 +150,13 @@ Some additional things to try:

```shell
# To build (This will take some time, go grab some coffee):
docker build -t llama-rs .
docker build -t llm .

# To run with prompt:
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs infer -m data/gpt4all-lora-quantized-ggml.bin -p "Tell me how cool the Rust programming language is:"
docker run --rm --name llm -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llm llama infer -m data/gpt4all-lora-quantized-ggml.bin -p "Tell me how cool the Rust programming language is:"

# To run with prompt file and repl (will wait for user input):
docker run --rm --name llama-rs -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llama-rs repl -m data/gpt4all-lora-quantized-ggml.bin -f examples/alpaca_prompt.txt
docker run --rm --name llm -it -v ${PWD}/data:/data -v ${PWD}/examples:/examples llm llama repl -m data/gpt4all-lora-quantized-ggml.bin -f examples/alpaca_prompt.txt
```

## Q&A
@@ -212,7 +202,7 @@ outside of `ggml`. This was done for a variety of reasons:
authors. Additionally, we can benefit from the larger Rust ecosystem with
ease.
- We would like to make `ggml` an optional backend
(see [this issue](https://github.com/rustformers/llama-rs/issues/31)).
(see [this issue](https://github.com/rustformers/llm/issues/31)).

In general, we hope to build a solution for model inference that is as easy
to use and deploy as any other Rust crate.
2 changes: 1 addition & 1 deletion binaries/llm-cli/src/cli_args.rs
@@ -267,7 +267,7 @@ pub struct ModelLoad {
/// or use a model that was trained with a larger context size.
///
/// Alternate methods to extend the context, including
/// [context clearing](https://github.com/rustformers/llama-rs/issues/77) are
/// [context clearing](https://github.com/rustformers/llm/issues/77) are
/// being investigated, but are not yet implemented. Additionally, these
/// will likely not perform as well as a model with a larger context size.
#[arg(long, default_value_t = 2048)]
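The "context clearing" idea mentioned in the doc comment above can be sketched as a sliding window that evicts the oldest tokens once the context is full. This is an illustrative simplification, not the implementation under investigation (real schemes would likely preserve a prompt prefix):

```rust
use std::collections::VecDeque;

/// A fixed-capacity token window: when the context is full, the oldest
/// tokens are dropped so new ones can be appended.
struct ContextWindow {
    max_tokens: usize,
    tokens: VecDeque<u32>,
}

impl ContextWindow {
    fn new(max_tokens: usize) -> Self {
        Self { max_tokens, tokens: VecDeque::new() }
    }

    fn push(&mut self, token: u32) {
        if self.tokens.len() == self.max_tokens {
            self.tokens.pop_front(); // evict the oldest token
        }
        self.tokens.push_back(token);
    }

    fn as_vec(&self) -> Vec<u32> {
        self.tokens.iter().copied().collect()
    }
}

fn main() {
    let mut ctx = ContextWindow::new(3);
    for t in 0..5 {
        ctx.push(t);
    }
    println!("{:?}", ctx.as_vec()); // [2, 3, 4]
}
```

As the doc comment notes, evicting tokens this way will likely not match the quality of a model trained with a genuinely larger context size, since the model simply forgets everything that falls out of the window.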
2 changes: 1 addition & 1 deletion crates/ggml/src/lib.rs
@@ -1,6 +1,6 @@
//! `ggml` is a semi-idiomatic wrapper for the `ggml` C library.
//!
//! It exposes a subset of operations (currently used to implement the [llama-rs](https://crates.io/crates/llama-rs) library).
//! It exposes a subset of operations (currently used to implement the [llm](https://crates.io/crates/llm) library).
//! Note that it does not expose a fully-idiomatic safe Rust interface; operations that could be potentially unsafe are marked as such.
//!
//! `ggml` operates on a computational graph; no values will be computed until [Context::graph_compute] is executed.
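The deferred-execution model described in the `ggml` crate docs above can be illustrated with a toy expression graph: building nodes performs no arithmetic, and values only materialize when the whole graph is evaluated at once (analogous to `Context::graph_compute`; this sketch is not the ggml API):

```rust
// Deferred evaluation sketch: first build a graph of operations;
// nothing is computed until the graph is executed.
enum Node {
    Value(f32),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

fn compute(node: &Node) -> f32 {
    match node {
        Node::Value(v) => *v,
        Node::Add(a, b) => compute(a) + compute(b),
        Node::Mul(a, b) => compute(a) * compute(b),
    }
}

fn main() {
    // Building the graph performs no arithmetic...
    let graph = Node::Add(
        Box::new(Node::Mul(
            Box::new(Node::Value(2.0)),
            Box::new(Node::Value(3.0)),
        )),
        Box::new(Node::Value(1.0)),
    );
    // ...until the whole graph is computed in one pass.
    println!("{}", compute(&graph)); // 7
}
```

Separating graph construction from execution lets the library plan memory and parallelism for the whole computation before any tensor math runs.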
2 changes: 1 addition & 1 deletion crates/llm-base/src/loader.rs
@@ -167,7 +167,7 @@ pub enum LoadError {
magic: u32,
},
#[error("invalid file format version {version}")]
/// The version of the format is not supported by this version of `llama-rs`.
/// The version of the format is not supported by this version of `llm`.
InvalidFormatVersion {
/// The format that was encountered.
container_type: ContainerType,
2 changes: 1 addition & 1 deletion crates/models/llama/src/convert.rs
@@ -5,7 +5,7 @@
//! full conversion.
use llm_base::FileType;
///
/// For reference, see [the PR](https://github.com/rustformers/llama-rs/pull/83).
/// For reference, see [the PR](https://github.com/rustformers/llm/pull/83).
use rust_tokenizers::preprocessing::vocab::sentencepiece_proto::sentencepiece_model::ModelProto;
use serde::Deserialize;
use std::{
2 changes: 1 addition & 1 deletion crates/models/llama/src/old_loader.rs
@@ -3,7 +3,7 @@
//! Plan is to use this to create a tool that can convert multipart models
//! to single-part models for use with the new loader.
//!
//! <https://github.com/rustformers/llama-rs/issues/150>
//! <https://github.com/rustformers/llm/issues/150>
use std::{
collections::HashMap,
