# llama2.ts

Inference for Llama2-like Transformer models in one TypeScript file

Heavily based on Andrej Karpathy's llama2.c.

Mostly of educational value (understand something by implementing it yourself! well, porting it in this case, but still :P)

## Features

- Binary compatible (i.e. it should produce exactly the same outputs as the C version given the same parameters and random seed)
- Achieves around 70/25/10 tokens per second for the 15M/45M/110M models, respectively
- Can run the full 7B model at 0.16 tokens per second on my laptop o_O
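Bit-for-bit compatibility with the C version hinges on reproducing llama2.c's seeded xorshift-style PRNG exactly. As a hedged illustration (not the actual llama2.ts internals), here is a sketch of an xorshift64* generator in TypeScript; BigInt is used because the 64-bit state would lose precision as a plain `number`:

```typescript
// Sketch of a seeded xorshift64* PRNG, the style of generator llama2.c
// uses for sampling; reproducing it bit-for-bit is what makes outputs
// match for a given seed. Constants/shifts here are illustrative.
function makeRandomU32(seed: bigint): () => number {
  let state = BigInt.asUintN(64, seed);
  return () => {
    state = BigInt.asUintN(64, state ^ (state >> 12n));
    state = BigInt.asUintN(64, state ^ (state << 25n));
    state = BigInt.asUintN(64, state ^ (state >> 27n));
    const mixed = BigInt.asUintN(64, state * 0x2545f4914f6cdd1dn);
    return Number(mixed >> 32n); // top 32 bits as a plain number
  };
}

// Same seed, same stream — which is why `-s 1` gives reproducible output.
const rand = makeRandomU32(1n);
```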

Includes the TinyStories 15M model.

## Usage

node (via the bundled t348):

```shell
node --experimental-loader=./t348.mjs llama2.ts stories15M.bin -s 1 -t 0 -i "Once upon a time"
```

bun:

```shell
bun llama2.ts stories15M.bin -i "Once upon a time"
```

Larger TinyStories models:

```shell
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```

## Arguments

- `-i <string>` - initial prompt
- `-t <float>` - temperature (0..1, 0 = deterministic argmax)
- `-s <int>` - random seed
- `-n <int>` - number of tokens to generate (0..256, default 256)
- `-p <float>` - p value for nucleus (top-p) sampling, default 0.9
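To illustrate how the `-t` and `-p` flags typically interact in samplers like this one, here is a hedged TypeScript sketch (function and variable names are illustrative, not the actual llama2.ts internals): temperature 0 short-circuits to argmax, otherwise logits are softmaxed at the given temperature and a token is drawn from the top-p nucleus.

```typescript
// Illustrative sketch of temperature + nucleus (top-p) sampling.
// rng is expected to return a float in [0, 1).
function sample(
  logits: number[],
  temperature: number,
  topP: number,
  rng: () => number
): number {
  if (temperature === 0) {
    // -t 0: deterministic argmax over the raw logits
    return logits.indexOf(Math.max(...logits));
  }
  // Softmax over temperature-scaled logits (max-subtracted for stability)
  const scaled = logits.map((l) => l / temperature);
  const maxL = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - maxL));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);
  // Nucleus: keep the smallest set of tokens whose mass reaches topP
  const order = probs
    .map((p, i) => [p, i] as const)
    .sort((a, b) => b[0] - a[0]);
  const kept: (readonly [number, number])[] = [];
  let mass = 0;
  for (const entry of order) {
    kept.push(entry);
    mass += entry[0];
    if (mass >= topP) break;
  }
  // Draw from the renormalized nucleus
  let r = rng() * mass;
  for (const [p, i] of kept) {
    r -= p;
    if (r <= 0) return i;
  }
  return kept[kept.length - 1][1];
}
```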

UPD: see also llama2.js by @epicure for a browser version. I'm glad I missed it before starting this project, otherwise I'd probably never have started it :D