
🍋 Lemonade: Local LLM Serving with GPU and NPU acceleration



Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs.

Startups such as Styrk AI, research teams like Hazy Research at Stanford, and large companies like AMD use Lemonade to run LLMs.

Getting Started

  • Step 1: Download & Install. Install using a GUI (Windows only), pip, or from source.
  • Step 2: Launch and Pull Models. Use the Model Manager to install models.
  • Step 3: Start chatting! A built-in chat interface is available.

Use it with your favorite OpenAI-compatible app!

Open WebUI, Continue, Gaia, AnythingLLM, AI Dev Gallery, LM-Eval, CodeGPT, AI Toolkit

Tip

Want your app featured here? Let's do it! Shoot us a message on Discord, create an issue, or send us an email.

Using the CLI

To run and chat with Gemma 3:

lemonade-server run Gemma-3-4b-it-GGUF

To install models ahead of time, use the pull command:

lemonade-server pull Gemma-3-4b-it-GGUF

To see all available models, use the list command:

lemonade-server list

Note: If you installed from source, use the lemonade-server-dev command instead.

Tip: You can use --llamacpp vulkan/rocm to select a backend when running GGUF models.
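
For example, to run the same GGUF model while pinning the Vulkan backend (per the tip above):

lemonade-server run Gemma-3-4b-it-GGUF --llamacpp vulkan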

Model Library

Lemonade supports both GGUF and ONNX models as detailed in the Supported Configurations section. A list of all built-in models is available here.

You can also import custom GGUF and ONNX models from Hugging Face by using our Model Manager (requires server to be running).
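
The same import can be done from the CLI. The sketch below is based on the server models documentation; the user. prefix and the --checkpoint and --recipe arguments are assumptions, so verify them against the docs for your installed version:

lemonade-server pull user.Qwen3-0.6B-GGUF --checkpoint unsloth/Qwen3-0.6B-GGUF:Q4_0 --recipe llamacpp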


Supported Configurations

Lemonade supports the following configurations and makes it easy to switch between them at runtime. Find more information here.

Hardware    Engine: OGA                 Engine: llamacpp                                       Engine: HF
🧠 CPU      All platforms               All platforms                                          All platforms
🎮 GPU      -                           Vulkan: All platforms; ROCm: Selected AMD platforms*   -
🤖 NPU      AMD Ryzen™ AI 300 series    -                                                      -
* Supported AMD ROCm platforms:

Architecture         Platform Support   GPU Models
gfx1151 (STX Halo)   Windows, Ubuntu    Ryzen AI MAX+ Pro 395
gfx120X (RDNA4)      Windows only       Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT
gfx110X (RDNA3)      Windows, Ubuntu    Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT

Integrate Lemonade Server with Your Application

You can use any OpenAI-compatible client library by configuring it to use http://localhost:8000/api/v1 as the base URL. A table of official and popular OpenAI clients in different languages is shown below.

Feel free to pick and choose your preferred language.

Language   Client
Python     openai-python
C++        openai-cpp
Java       openai-java
C#         openai-dotnet
Node.js    openai-node
Go         go-openai
Ruby       ruby-openai
Rust       async-openai
PHP        openai-php

Python Client Example

from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
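
Because the endpoint is OpenAI-compatible, streaming works with the same client. A minimal sketch (model name as above; any pulled model works):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade"  # required but unused
)

# Stream tokens as they are generated instead of waiting for the full reply
stream = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",
    messages=[{"role": "user", "content": "Write a haiku about lemons."}],
    stream=True,
)
for chunk in stream:
    # The final chunk may carry no content, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()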

For more detailed integration instructions, see the Integration Guide.

Beyond an LLM Server

The Lemonade SDK also includes the following components:

  • 🐍 Lemonade API: High-level Python API to directly integrate Lemonade LLMs into Python applications (see the sketch after this list).
  • 🖥️ Lemonade CLI: The lemonade CLI lets you mix-and-match LLMs (ONNX, GGUF, SafeTensors) with prompting templates, accuracy testing, performance benchmarking, and memory profiling to characterize your models on your hardware.
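
A minimal sketch of the Lemonade API follows. The from_pretrained entry point and the recipe name are taken from the Lemonade documentation; treat them as assumptions and check the API docs for your version:

from lemonade.api import from_pretrained  # assumed entry point, per Lemonade docs

# Load a small model on CPU via the Hugging Face recipe
# ("hf-cpu" is an assumed recipe name; other recipes target GPU/NPU engines)
model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="hf-cpu")

# Tokenize a prompt, generate a short completion, and decode it
input_ids = tokenizer("What is the capital of France?", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))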

FAQ

To read our frequently asked questions, see our FAQ Guide.

Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our contribution guide.

New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.


Maintainers

This project is sponsored by AMD. It is maintained by @danielholanda @jeremyfowers @ramkrishna @vgodsoe in equal measure. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.

License and Attribution

This project is: