![MIT/Apache2](https://shields.io/badge/license-MIT%2FApache--2.0-blue)
[![Discord](https://img.shields.io/discord/1085885067601137734)](https://discord.gg/YB9WaXYAWU)

`llm` is a Rust ecosystem of libraries for running inference on large language
models, inspired by [llama.cpp](https://github.com/ggerganov/llama.cpp).

The primary crate is the `llm` crate, which wraps `llm-base` and supported model
crates. This is used by `llm-cli` to provide inference for all supported models.

It is powered by the [`ggml`](https://github.com/ggerganov/ggml) tensor library,
and aims to bring the robustness and ease of use of Rust to the world of large
language models.

## Getting started

Make sure you have Rust 1.65.0 or above and a C toolchain[^1] set up.

`llm` is a Rust library that re-exports `llm-base` and the model crates (e.g.
`bloom`, `gpt2`, `llama`).
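
To use the library in your own project, one option is to add the crate straight
from the repository with `cargo add`; whether a published crates.io release is
available is not covered here, so pulling from Git is the conservative
assumption:

```shell
# Add the `llm` library crate to a Rust project directly from the Git repository.
cargo add llm --git https://github.com/rustformers/llm
```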

`llm-cli` (binary name `llm`) is a basic application that provides a CLI
interface to the library.

**NOTE**: For best results, make sure to build and run in release mode. Debug
builds are going to be very slow.

### Building using `cargo`

Run

```shell
cargo install --git https://github.com/rustformers/llm llm-cli
```

to install `llm` to your Cargo `bin` directory, which `rustup` is likely to have
added to your `PATH`.

The CLI application can then be run through `llm`.
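
For example, to verify the installation and list the available subcommands and
options (the exact output depends on the version you built):

```shell
# Print the CLI's subcommands and global options.
llm --help
```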

GGML files are easy to acquire. Currently, the following models are supported:
- [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)
- [LLaMA](https://huggingface.co/docs/transformers/model_doc/llama)
- [GPT-NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)
- [BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom) (partial
support, results inconsistent)

Certain older GGML formats are not supported by this project, but the goal is to
maintain feature parity with the upstream GGML project. For problems relating to
loading models, or requesting support for

#### From Hugging Face

Hugging Face 🤗 is a leader in open-source machine learning and hosts hundreds
of GGML models.
[Search for GGML models on Hugging Face 🤗](https://huggingface.co/models?search=ggml).

#### r/LocalLLaMA

This Reddit community maintains
[a wiki](https://www.reddit.com/r/LocalLLaMA/wiki/index/) related to GGML
models, including well-organized lists of links for acquiring
[GGML models](https://www.reddit.com/r/LocalLLaMA/wiki/models/) (mostly from
Hugging Face 🤗).

#### LLaMA original weights

Currently, the only legal source to get the original weights is
[this repository](https://github.com/facebookresearch/llama/blob/main/README.md#llama).

After acquiring the weights, it is necessary to convert them into a format that
is compatible with ggml. To achieve this, follow the steps outlined below:
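
As a hedged illustration only (not necessarily the steps this project
recommends), conversion at the time typically went through llama.cpp's Python
converter followed by an optional quantization pass; the script name,
arguments, and paths below are assumptions:

```shell
# Assumed workflow based on llama.cpp tooling of the era; script names,
# arguments, and paths may differ from this project's recommended steps.

# 1. Convert the original PyTorch checkpoint to an f16 GGML file.
python3 convert-pth-to-ggml.py /path/to/LLaMA/7B/ 1

# 2. Optionally quantize the f16 file to 4 bits to reduce memory usage.
./quantize /path/to/LLaMA/7B/ggml-model-f16.bin \
  /path/to/LLaMA/7B/ggml-model-q4_0.bin 2
```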

Some additional things to try:
![Gif showcasing alpaca repl mode](./doc/resources/alpaca_repl_screencap.gif)

- Sessions can be loaded (`--load-session`) or saved (`--save-session`) to file.
To automatically load and save the same session, use `--persist-session`. This
can be used to cache prompts to reduce load time, too (a hypothetical
invocation is sketched below the GIF):

![Gif showcasing prompt caching](./doc/resources/prompt_caching_screencap.gif)

(This GIF shows an older version of the flags, but the mechanics are still the
same.)
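
As a rough sketch of prompt caching with sessions: the session flags are the
ones documented above, but the `llama infer` subcommand and the model/prompt
flags are assumptions and may differ between versions:

```shell
# Hypothetical invocation: `llama infer`, `-m`, and `-p` are assumptions; only
# the session flags are documented above. Running the same command twice reuses
# the saved session, so the shared prompt prefix is not re-ingested.
llm llama infer -m ggml-model-q4_0.bin \
  -p "Write a haiku about the Rust programming language" \
  --persist-session haiku.session
```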

[^1]:
A modern-ish C toolchain is required to compile `ggml`. A C++ toolchain

### Why did you do this?

It was not my choice. Ferris appeared to me in my dreams and asked me to rewrite
this in the name of the Holy crab.

### Seriously now.

Come on! I don't want to get into a flame war. You know how it goes, _something
something_ memory _something something_ cargo is nice, don't make me say it,
everybody knows this already.

### I insist.

_Sheesh! Okaaay_. After seeing the huge potential for **llama.cpp**, the first
thing I did was to see how hard it would be to turn it into a library to embed
in my projects. I started digging into the code, and realized the heavy lifting
is done by `ggml` (a C library, easy to bind to Rust) and the whole project was
just around ~2k lines of C++ code (not so easy to bind). After a couple of
(failed) attempts to build an HTTP server into the tool, I realized I'd be much
more productive if I just ported the code to Rust, where I'm more comfortable.

### Is this the real reason?

Haha. Of course _not_. I just like collecting imaginary internet points, in the
form of little stars, that people seem to give to me whenever I embark on
pointless quests for _rewriting X thing, but in Rust_.

### How is this different from `llama.cpp`?

This is a reimplementation of `llama.cpp` that does not share any code with it
outside of `ggml`. This was done for a variety of reasons:

- `llama.cpp` requires a C++ compiler, which can cause problems for
  cross-compilation to more esoteric platforms. An example of such a platform is
  WebAssembly, which can require a non-standard compiler SDK.
- Rust is easier to work with from a development and open-source perspective; it
  offers better tooling for writing "code in the large" with many other authors.
  Additionally, we can benefit from the larger Rust ecosystem with ease.
- We would like to make `ggml` an optional backend (see
  [this issue](https://github.com/rustformers/llm/issues/31)).

In general, we hope to build a solution for model inference that is as easy to
use and deploy as any other Rust crate.
