vLLM Manager

A REST API backend for remotely managing a vLLM inference server. Built for my personal home server setup using Tailscale for secure remote access.

Use at your own risk.

Overview

This service provides remote control over a vLLM systemd service, allowing you to:

Start/stop/restart the vLLM service
Switch between different models
Monitor service status and GPU utilization
Remotely shut down the server

Requirements

Python 3.11+
systemd
nvidia-smi (for GPU stats)
Tailscale (or other secure network access)

Security Model

This backend has no authentication. It relies entirely on Tailscale's network-level security. If you can reach the API, you're authorized to use it.

Do not expose this service to the public internet.

Installation

# Enter the cloned repository directory
cd /path/to/vllm-server

# Create virtual environment
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt

# Copy and configure models
cp config.yaml.example config.yaml
# Edit config.yaml with your models

# Follow post-installation steps
cat POST_INSTALL.md

See POST_INSTALL.md for systemd and sudoers setup.

Configuration

Create config.yaml with your models:

models:
  qwen-32b:
    script: /path/to/start_qwen32b.sh

  llama-70b:
    script: /path/to/start_llama70b.sh

Each model references a shell script that starts vLLM with the appropriate parameters. See start_model.sh.example for a template.
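
As a rough illustration of how this mapping might be checked before use, the sketch below validates a config that has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The function name `validate_models` and the error messages are hypothetical and are not taken from main.py.

```python
from pathlib import Path


def validate_models(config: dict) -> dict[str, Path]:
    """Return {model_name: script_path}, raising ValueError on malformed entries."""
    models = config.get("models")
    if not isinstance(models, dict) or not models:
        raise ValueError("config must contain a non-empty 'models' mapping")

    scripts: dict[str, Path] = {}
    for name, entry in models.items():
        # Each model entry is expected to carry a 'script' key holding a path string.
        if not isinstance(entry, dict) or not isinstance(entry.get("script"), str):
            raise ValueError(f"model {name!r} needs a 'script' path")
        scripts[name] = Path(entry["script"])
    return scripts


# Same shape as the YAML above, as a YAML loader would return it.
example = {
    "models": {
        "qwen-32b": {"script": "/path/to/start_qwen32b.sh"},
        "llama-70b": {"script": "/path/to/start_llama70b.sh"},
    }
}
scripts = validate_models(example)
```

Failing early on a missing `script` key gives a clearer error than a failed service start later on.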

API

The service runs on port 9090.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/status` | GET | Current state, loaded model, GPU stats |
| `/models` | GET | List all configured models |
| `/start` | POST | Start vLLM with last-used model |
| `/stop` | POST | Stop vLLM service |
| `/restart` | POST | Restart vLLM service |
| `/switch` | POST | Switch to a different model |
| `/shutdown` | POST | Shut down the server (10s delay) |

Examples

# Check status
curl http://server:9090/status

# List models
curl http://server:9090/models

# Switch model
curl -X POST http://server:9090/switch \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-70b"}'

# Shut down server
curl -X POST http://server:9090/shutdown

Android App

A native Android companion app for monitoring and controlling the server from your phone. See android/readme.md for details.

License

MIT
