llm-api

A fast CPU-based API for Qwen 2.5, hosted on Hugging Face Spaces. For faster inference, we use CTranslate2 as our inference engine.

Usage

Simply cURL the endpoint as shown below.

curl -N 'https://winstxnhdw-llm-api.hf.space/api/v1/chat/stream' \
     -H 'Content-Type: application/json' \
     -d \
     '{
         "messages": [
             {
                 "role": "user",
                 "content": "What is the capital of France?"
             }
         ]
      }'
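
If you would rather consume the stream programmatically, the sketch below shows a minimal Python client built on requests. It is only a sketch: the framing of the streamed chunks (treated as raw text here) is an assumption, so adjust the parsing to match what the endpoint actually returns.

import requests

# Minimal streaming client for the chat endpoint.
# Assumption: chunks arrive as raw text; adjust the parsing if the
# server uses a different framing (e.g. server-sent events).
response = requests.post(
    'https://winstxnhdw-llm-api.hf.space/api/v1/chat/stream',
    json={'messages': [{'role': 'user', 'content': 'What is the capital of France?'}]},
    stream=True,
    timeout=60,
)
response.raise_for_status()

for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end='', flush=True)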

Development

There are a few ways to run llm-api locally for development.

Local

If you spin up the server using uv, you may access the Swagger UI at localhost:49494/schema/swagger.

uv run llm-api
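
Once the server is up, you can sanity-check it against the same chat endpoint. A minimal sketch, assuming the dev server listens on localhost:49494 as noted above:

import requests

# Local smoke test: expect HTTP 200 from the streaming chat endpoint.
response = requests.post(
    'http://localhost:49494/api/v1/chat/stream',
    json={'messages': [{'role': 'user', 'content': 'ping'}]},
    stream=True,
    timeout=60,
)
print(response.status_code)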

Docker

You can access the Swagger UI at localhost:7860/schema/swagger after spinning the server up with Docker.

docker build -f Dockerfile.build -t llm-api .
docker run --rm --init -e SERVER_PORT=7860 -p 7860:7860 llm-api

You can enable CUDA support by building the image with the following --build-arg flag.

docker build -f Dockerfile.build -t llm-api --build-arg USE_CUDA=1 .
docker run --rm --init --gpus all -e SERVER_PORT=7860 -p 7860:7860 llm-api
