Update README.md
lintian06 authored Aug 14, 2023
1 parent bcf3238 commit d02ef30
Showing 1 changed file with 2 additions and 3 deletions.
5 changes: 2 additions & 3 deletions README.md
````diff
@@ -11,7 +11,7 @@ My goal of `llama2.rs` is to create a rust port for llama2.c,
 primarily targeting at a cross-platform implementation for on-device inference.
 
 ### Highlights:
-- Similar to `llama2.c` with openmp, `llama2.rs` also utilizes model parallelization. (*Benchmark: 27 -> 40, performance gain +46%*)
+- Similar to `llama2.c` with openmp, `llama2.rs` also utilizes model parallelization. (*[Benchmark](https://github.com/lintian06/llama2.rs#performance-comparison): 27.6 -> 40.2 tokens/s on `stories110M.bin`, +46% speedup over llama2.c*)
 - Utilize memory mapping for runtime memory reduction (with a flag `--is_mmap`). (*480MB -> 59MB, save up to 88% memory*)
 
 ### How to build and run inference.
@@ -33,7 +33,7 @@ cargo run --release -- --model_path=./stories15M.bin
 
 See `cargo run --release -- --help` for the full help doc.
 
-You can run unit test with the below command with `stories15M.bin` downloaded in advance.
+You can run unit test with the below command and `stories15M.bin` downloaded in advance.
 
 ```bash
 cargo test
@@ -59,7 +59,6 @@ and calculate the mean of standard deviation. Here is my spec:
 - CC: Apple clang version 14.0.0.
 - Rust: rustc 1.71.1.
 
-
 | Experiments | #Token/s: mean (± std) |
 |-------------------|----------------------------|
 | llama2.rs | 40.228 (±1.691) |
````
