Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.
Startups such as Styrk AI, research teams like Hazy Research at Stanford, and large companies like AMD use Lemonade to run LLMs.
| Step 1: Download & Install | Step 2: Launch and Pull Models | Step 3: Start Chatting! |
|---|---|---|
| Install using a GUI (Windows only), pip, or from source. | Use the Model Manager to install models. | A built-in chat interface is available! |
Tip: Want your app featured here? Let's do it! Shoot us a message on Discord, create an issue, or email us.
To run and chat with Gemma 3:

```bash
lemonade-server run Gemma-3-4b-it-GGUF
```
To install models ahead of time, use the `pull` command:

```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```
To see all available models, use the `list` command:

```bash
lemonade-server list
```
Note: If you installed from source, use the `lemonade-server-dev` command instead.
Tip: You can pass `--llamacpp vulkan` or `--llamacpp rocm` to select a backend when running GGUF models, e.g. `lemonade-server run Gemma-3-4b-it-GGUF --llamacpp vulkan`.
Lemonade supports both GGUF and ONNX models, as detailed in the Supported Configurations section. A list of all built-in models is available here.
You can also import custom GGUF and ONNX models from Hugging Face by using our Model Manager (requires the server to be running).
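Once the server is running, you can also check which models are installed programmatically. This is a minimal sketch that assumes the server is on its default port and exposes the standard OpenAI-compatible models endpoint:

```python
from openai import OpenAI

# Point the client at the local Lemonade Server
# (default port assumed; adjust if you changed it)
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required by the client, but unused by the server
)

# List the models the server currently has available
for model in client.models.list():
    print(model.id)
```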
Lemonade supports the following configurations and makes it easy to switch between them at runtime. More information is available here.
| Hardware | Engine: OGA | Engine: llamacpp | Engine: HF | Windows | Linux |
|---|---|---|---|---|---|
| 🧠 CPU | All platforms | All platforms | All platforms | ✅ | ✅ |
| 🎮 GPU | — | Vulkan: all platforms; ROCm: selected AMD platforms* | — | ✅ | ✅ |
| 🤖 NPU | AMD Ryzen™ AI 300 series | — | — | ✅ | — |
* See supported AMD ROCm platforms
| Architecture | Platform Support | GPU Models |
|---|---|---|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows only | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients in different languages; feel free to pick your preferred one.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|---|---|---|---|---|---|---|---|---|
| openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
```
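Streaming works through the same client. The sketch below assumes Lemonade Server supports the standard OpenAI streaming protocol (`stream=True`), as most OpenAI-compatible servers do, and reuses the `client` from the example above:

```python
# Stream the response as it is generated
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "Write a haiku about lemons."}],
    stream=True,  # receive incremental deltas instead of one final message
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()
```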
For more detailed integration instructions, see the Integration Guide.
The Lemonade SDK also includes the following components:
- 🐍 Lemonade API: High-level Python API to directly integrate Lemonade LLMs into Python applications (see the sketch after this list).
- 🖥️ Lemonade CLI: The `lemonade` CLI lets you mix and match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.
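As a sketch of the Lemonade API, the snippet below loads a model and generates text. The checkpoint and `recipe` values here are illustrative assumptions; see the Lemonade API docs for the options supported on your hardware:

```python
from lemonade.api import from_pretrained

# Load a model and tokenizer; the recipe selects the inference engine/device
# ("hf-cpu" is an illustrative choice, not the only one available)
model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")

# Tokenize a prompt and generate a response
input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```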
To read our frequently asked questions, see our FAQ Guide.
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.
New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.
This project is sponsored by AMD. It is maintained by @danielholanda @jeremyfowers @ramkrishna @vgodsoe in equal measure. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.
This project is:
- Built with Python with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
- ggml/llama.cpp
- OnnxRuntime GenAI
- Hugging Face Hub
- OpenAI API
- and more...
- Accelerated by mentorship from the OCV Catalyst program.
- Licensed under the Apache 2.0 License.
- Portions of the project are licensed as described in NOTICE.md.