candle-vllm


Efficient, easy-to-use platform for inference and serving of local LLMs, including an OpenAI-compatible API server.

Features

  • OpenAI-compatible API server for serving LLMs.
  • Highly extensible trait-based system allowing rapid implementation of new module pipelines.
  • Streaming support during generation.

Pipelines

  • Llama
    • 7b
    • 13b
    • 70b
  • Mistral
    • 7b

Examples

See the examples folder in this repository for more examples.

Example with Llama 7b

In your terminal, install the openai Python package by running pip install openai.

Then, create a new Python file and write the following code:

import openai

# candle-vllm does not validate the API key, but the client requires a value.
openai.api_key = "EMPTY"
# Point the client at the local candle-vllm server.
openai.base_url = "http://localhost:2000/v1/"

completion = openai.chat.completions.create(
    model="llama7b",
    messages=[
        {
            "role": "user",
            "content": "Explain how to best learn Rust.",
        },
    ],
    max_tokens=64,
)
print(completion.choices[0].message.content)

Next, launch a candle-vllm instance by running:

HF_TOKEN=... cargo run --release -- --hf-token HF_TOKEN --port 2000 llama7b --repeat-last-n 64
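
Once the server is running, you can optionally sanity-check the connection before running the script. The snippet below is a minimal sketch: it assumes this candle-vllm build implements the /v1/models listing endpoint of the OpenAI API surface, which not every compatible server does.

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# List the models the server reports. This assumes the /v1/models
# endpoint is implemented; if it is not, this call will fail even
# though chat completions still work.
for model in openai.models.list():
    print(model.id)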

After the candle-vllm instance is running, run the Python script and enjoy efficient inference through an OpenAI-compatible API server!
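
The streaming support listed under Features can be exercised with the same client. Below is a minimal sketch of consuming a streamed response, assuming the server emits OpenAI-style chat completion chunks; stream=True and the delta fields are standard openai client usage, and the model name and prompt mirror the example above.

import openai

openai.api_key = "EMPTY"
openai.base_url = "http://localhost:2000/v1/"

# Request a streamed response; tokens arrive as they are generated.
stream = openai.chat.completions.create(
    model="llama7b",
    messages=[{"role": "user", "content": "Explain how to best learn Rust."}],
    max_tokens=64,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; content may be None
    # for role-only or final chunks, so guard before printing.
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()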

Contributing

The following features are planned, and contributions toward them are especially welcome:

  • Sampling methods
  • Pipeline batching (#3)
  • PagedAttention (#3)
  • More pipelines (from candle-transformers)
