MLX-LLM.cpp is a C/C++ library for LLM inference, based on mlx-llm. It leverages MLX to run on Apple Silicon.
Supported models:

| Family | Models |
|---|---|
| LLaMA 2 | llama_2_7b_chat_hf |
| LLaMA 3 | llama_3_8b |
| TinyLLaMA | tiny_llama_1.1B_chat_v1.0 |
First, install MLX on your system:

```bash
git clone https://github.com/ml-explore/mlx.git mlx && cd mlx
mkdir -p build && cd build
cmake .. && make -j
make install
```
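To check that MLX installed correctly before building anything on top of it, you can compile a minimal program against it. This is a sketch using MLX's core C++ API (`array`, `add`, `eval`) as shown in the MLX quickstart; the `find_package(MLX ...)` package name comes from the CMake config MLX installs and may vary with your MLX version.

```cpp
// check_mlx.cpp -- minimal MLX sanity check
#include <iostream>
#include "mlx/mlx.h"

using namespace mlx::core;

int main() {
  array a = array({1.0f, 2.0f, 3.0f});
  array b = array({4.0f, 5.0f, 6.0f});
  array c = add(a, b);          // MLX is lazy: nothing is computed yet
  eval(c);                      // force evaluation on the default device
  std::cout << c << std::endl;  // expect array([5, 7, 9], dtype=float32)
  return 0;
}
```

A matching `CMakeLists.txt` only needs to find and link the installed package:

```cmake
cmake_minimum_required(VERSION 3.24)
project(check_mlx CXX)
set(CMAKE_CXX_STANDARD 17)  # MLX requires C++17 or newer
find_package(MLX CONFIG REQUIRED)
add_executable(check_mlx check_mlx.cpp)
target_link_libraries(check_mlx PRIVATE mlx)
```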
Clone the repository and its submodules:

```bash
git clone https://github.com/grorge123/mlx-llm.cpp.git
cd mlx-llm.cpp
git submodule update --init --recursive
```
Build the example:

```bash
mkdir build && cd build
cmake ..
cmake --build .
```
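The default configuration is typically unoptimized; for realistic performance, you may want CMake's standard Release mode (these are generic CMake options, not project-specific flags):

```bash
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j
```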
Refer to `example/main.cpp` for a simple demonstration using TinyLLaMA 1.1B.
Download the model weights and tokenizer:

```bash
mkdir tiny && cd tiny
wget https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/resolve/main/model.safetensors
cd ..
wget https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/resolve/main/tokenizer.json
```
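Assuming these commands are run from the repository root (adjust the paths if the example expects the files elsewhere), they produce this layout:

```text
mlx-llm.cpp/
├── tiny/
│   └── model.safetensors
└── tokenizer.json
```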
From the build directory, run:

```bash
./main
```
This generates text with the TinyLLaMA 1.1B model.
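If generation fails, you can check the downloaded checkpoint independently of the example by loading it with MLX directly. This is a sketch, assuming MLX's `load_safetensors` (declared in `mlx/io.h`, pulled in by `mlx/mlx.h`) returns a tensor map paired with string metadata, as in recent MLX versions; the path matches the download step above.

```cpp
// inspect_weights.cpp -- list the tensors in the downloaded checkpoint
// Assumes mlx::core::load_safetensors returns {name -> array} plus
// string metadata (check the declaration in your mlx/io.h).
#include <iostream>
#include "mlx/mlx.h"

using namespace mlx::core;

int main() {
  auto [weights, metadata] = load_safetensors("tiny/model.safetensors");
  std::cout << weights.size() << " tensors\n";
  for (const auto& [name, w] : weights) {
    std::cout << name << "  (" << w.size() << " elements)\n";
  }
  return 0;
}
```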