Install the latest WasmEdge with plugins:
For macOS (apple silicon)
# install WasmEdge-0.13.4 with wasi-nn-ggml plugin
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# Assuming you use zsh (the default shell on macOS), run the following command to activate the environment
source $HOME/.zshenv
For Ubuntu (>= 20.04)
# install libopenblas-dev
apt update && apt install -y libopenblas-dev
# install WasmEdge-0.13.4 with wasi-nn-ggml plugin
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# Assuming you use bash (the default shell on Ubuntu), run the following command to activate the environment
source $HOME/.bashrc
For General Linux
# install WasmEdge-0.13.4 with wasi-nn-ggml plugin
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
# Assuming you use bash (the default shell on Ubuntu), run the following command to activate the environment
source $HOME/.bashrc
Download the llama-chat.wasm
:
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
Click here to see the download link and commands to run the model.
Execute the WASM with the wasmedge
using the named model feature to preload large model. Here we use the Llama-2-7B-Chat
model as an example:
# download model
curl -LO https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
# run the `llama-chat` wasm app with the model
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
llama-chat.wasm --prompt-template llama-2-chat
After executing the command, you may need to wait a moment for the input prompt to appear.
You can enter your question once you see the [USER]:
prompt:
[USER]:
What's the capital of France?
[ASSISTANT]:
The capital of France is Paris.
[USER]:
what about Norway?
[ASSISTANT]:
The capital of Norway is Oslo.
[USER]:
I have two apples, each costing 5 dollars. What is the total cost of these apples?
[ASSISTANT]:
The total cost of the two apples is 10 dollars.
[USER]:
What if I have 3 apples?
[ASSISTANT]:
If you have 3 apples, each costing 5 dollars, the total cost of the apples is 15 dollars.
The options for llama-chat
wasm app are:
~/workspace/llama-utils/chat$ wasmedge llama-chat.wasm -h
Usage: llama-chat.wasm [OPTIONS]
Options:
-a, --model-alias <ALIAS>
Model alias [default: default]
-c, --ctx-size <CTX_SIZE>
Size of the prompt context [default: 4096]
-n, --n-predict <N_PRDICT>
Number of tokens to predict [default: 1024]
-g, --n-gpu-layers <N_GPU_LAYERS>
Number of layers to run on the GPU [default: 100]
-b, --batch-size <BATCH_SIZE>
Batch size for prompt processing [default: 4096]
-r, --reverse-prompt <REVERSE_PROMPT>
Halt generation at PROMPT, return control.
-s, --system-prompt <SYSTEM_PROMPT>
System prompt message string [default: "[Default system message for the prompt template]"]
-p, --prompt-template <TEMPLATE>
Prompt template. [default: llama-2-chat] [possible values: llama-2-chat, codellama-instruct, mistral-instruct-v0.1, mistrallite, openchat, belle-llama-2-chat, vicuna-chat, chatml, baichuan-2, wizard-coder, zephyr, intel-neural, deepseek-chat, deepseek-coder]
--log-prompts
Print prompt strings to stdout
--log-stat
Print statistics to stdout
--log-all
Print all log information to stdout
--stream-stdout
Print the output to stdout in the streaming way
-h, --help
Print help
Run the following command:
cargo build --target wasm32-wasi --release
The llama-chat.wasm
will be generated in the target/wasm32-wasi/release
folder.