Run and benchmark Large Language Models (LLMs) locally with llama.cpp on GPU (Docker + WSL2). Includes helper scripts, quantisation benchmarks, and an OpenAI-compatible API server.
🦙 Local LLM Setup with llama.cpp (CUDA, Docker, WSL2)

Run, benchmark, and serve Large Language Models locally with llama.cpp on GPU.
This repo gives you one-command scripts, persistent model management, OpenAI-compatible serving, and a repeatable benchmarking pipeline with plots.


✨ What you get

  • Local inference with CUDA via Docker (ghcr.io/ggerganov/llama.cpp:full-cuda)
  • OpenAI-compatible server (/v1/chat/completions) for easy app integration
  • Self-contained model workflow — first run downloads the GGUF into models/, later runs reuse it
  • Benchmarks that matter — automated sweeps + CSV + Markdown summary + charts
  • Polished automation — Makefile + venv so anyone can reproduce results
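Once the server is running (see Quickstart below), any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library — the payload fields follow the standard OpenAI chat-completions schema, but the `localhost:8080` URL and default parameters are assumptions based on the commands shown in this README:

```python
import json
import urllib.request

# Default llama.cpp server endpoint (port assumed from the curl example below)
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def chat(prompt: str, max_tokens: int = 64) -> str:
    """POST the prompt to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, max_tokens)).encode()
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard chat-completions response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI protocol, the official `openai` Python SDK also works by pointing `base_url` at the local server.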

🚀 Quickstart

# 1) Clone
git clone https://github.com/shuvanon/local-llm-setup.git
cd local-llm-setup

# 2) Set up Python env for analysis & plots (matplotlib, pandas)
make venv
source .venv/bin/activate

# 3) Try a single prompt (downloads model on first run)
./scripts/run_llm.sh "Write an intro about federated learning." 64

# 4) Start API server (OpenAI-compatible)
./scripts/serve_llm.sh
# in another terminal:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @examples/chat_request.json

# 5) Run a batch sweep + summarize (CSV + Markdown + charts)
make benchmark
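The benchmark sweep writes its raw results to CSV before summarising. If you want to post-process a sweep yourself, here is a minimal sketch using only the standard library — the column names (`quant`, `tokens_per_sec`) are illustrative assumptions, not the repo's actual schema, so adjust them to match the CSV that `make benchmark` produces:

```python
import csv
from collections import defaultdict
from io import StringIO

def mean_tps_by_quant(csv_text: str) -> dict:
    """Average tokens/sec per quantisation level from a benchmark CSV."""
    sums = defaultdict(lambda: [0.0, 0])  # quant -> [running total, count]
    for row in csv.DictReader(StringIO(csv_text)):
        acc = sums[row["quant"]]
        acc[0] += float(row["tokens_per_sec"])
        acc[1] += 1
    return {quant: total / n for quant, (total, n) in sums.items()}

# Tiny fabricated sample for illustration only
sample = """quant,tokens_per_sec
Q4_K_M,42.0
Q4_K_M,38.0
Q8_0,25.0
"""
print(mean_tps_by_quant(sample))  # {'Q4_K_M': 40.0, 'Q8_0': 25.0}
```

The repo's own pipeline does this with pandas/matplotlib inside the venv created by `make venv`; the sketch above just shows the shape of the aggregation.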
