A lightweight wrapper around the Moondream2 transformers implementation, providing clean REST API endpoints for vision-language model capabilities.
This replaces the complex Moondream Station binary with a simple, maintainable Python service that exposes the core Moondream capabilities through a REST API.
```
moondream-server/
├── app/                  # FastAPI application
│   ├── __init__.py       # Package initialization
│   ├── app.py            # Main FastAPI application
│   ├── requirements.txt  # Python dependencies
│   └── test_api.py       # API test suite
├── charts/               # Helm chart for Kubernetes deployment
│   ├── templates/        # Kubernetes manifests
│   ├── Chart.yaml        # Helm chart metadata
│   └── values.yaml       # Default configuration values
├── Dockerfile            # Container image definition
├── Makefile              # Build and deployment automation
└── README.md             # This file
```
Build and run with Docker:

```bash
docker build -t moondream-api .
docker run -p 8080:8080 -v moondream-api-models:/root/.cache/huggingface moondream-api
```

Or install the dependencies and run locally:

```bash
pip install -r app/requirements.txt
python -m app.app
```

The service will be available at http://localhost:8080.
When running in Docker on macOS, the service will automatically use CPU mode since Docker doesn't support MPS (Metal Performance Shaders) passthrough yet. This is normal and expected behavior.
- Local development: Uses MPS for GPU acceleration on Apple Silicon
- Docker: Falls back to CPU mode automatically
- Linux with NVIDIA GPU: Uses CUDA when available
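The fallback order above can be sketched as follows (a hypothetical illustration; the actual selection logic in `app.py` may differ):

```python
def pick_device() -> str:
    """Pick the best available torch device, falling back to CPU.

    Mirrors the behavior described above: CUDA on Linux hosts with an
    NVIDIA GPU, MPS on Apple Silicon, and CPU everywhere else
    (including Docker on macOS, where MPS passthrough is unavailable).
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch installed: CPU-only

    if torch.cuda.is_available():
        return "cuda"
    # MPS is only reported available on native macOS, never inside Docker
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```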
To eliminate lengthy startup times when pods are rescheduled, the Helm chart includes a model pre-loading feature:
- Kubernetes Job downloads the model to a shared PersistentVolumeClaim during installation
- Application pods mount the same PVC, accessing pre-downloaded models instantly
- No startup delay when pods move between nodes or restart
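The chart's actual manifest lives in `charts/templates/`; as a simplified, hypothetical sketch, such a pre-load Job could look roughly like this (image, claim name, and download command are assumptions for illustration):

```yaml
# Hypothetical sketch; the real manifest is in charts/templates/
apiVersion: batch/v1
kind: Job
metadata:
  name: moondream-model-cache
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: download
          image: moondream-api # assumed: reuses the service image
          command:
            - python
            - -c
            - "from huggingface_hub import snapshot_download; snapshot_download('vikhyatk/moondream2', revision='2025-06-21')"
          volumeMounts:
            - name: model-cache
              mountPath: /root/.cache/huggingface
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: moondream-models # assumed claim name
```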
The model pre-loading is configured under the unified `persistence` section in `values.yaml`:
```yaml
persistence:
  enabled: true
  storageClass: "" # Storage class for PVC
  accessMode: ReadWriteMany # RWX for rolling deployments and model sharing
  size: 20Gi
  existingClaim: "" # Optional: use an existing PVC instead of creating a new one
  modelCache:
    enabled: true # Download models to the PVC
    modelRevision: "2025-06-21" # Model version to download
    waitForCache: true # Wait for the model download job to complete before starting pods
```

The service provides the following endpoints:
- `GET /health` - Health check
- `GET /v1` - API info
- `POST /v1/caption` - Generate image captions
- `POST /v1/query` - Answer questions about images
- `POST /v1/detect` - Detect objects in images
- `POST /v1/point` - Locate objects in images
```bash
# Health check
curl http://localhost:8080/health

# Generate caption
curl -X POST "http://localhost:8080/v1/caption" \
  -F "image=@your_image.jpg" \
  -F "length=short"

# Ask a question
curl -X POST "http://localhost:8080/v1/query" \
  -F "image=@your_image.jpg" \
  -F "question=What do you see in this image?"

# Detect objects
curl -X POST "http://localhost:8080/v1/detect" \
  -F "image=@your_image.jpg" \
  -F "object_name=person"

# Point to objects
curl -X POST "http://localhost:8080/v1/point" \
  -F "image=@your_image.jpg" \
  -F "object_name=car"
```

Run the comprehensive test suite:

```bash
# Test all endpoints with a real image
python app/test_api.py
```

The test suite downloads a real image from Unsplash and tests all API endpoints to ensure proper functionality.
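The same endpoints can also be called from Python. A minimal, hypothetical client sketch (assumes a server on localhost:8080 and the third-party `requests` library; the `endpoint` and `caption` helpers are illustrative, not part of the project):

```python
BASE_URL = "http://localhost:8080"  # assumed local deployment

def endpoint(name: str) -> str:
    """Build the full URL for a v1 endpoint."""
    return f"{BASE_URL}/v1/{name}"

def caption(image_path: str, length: str = "short") -> dict:
    """POST an image to /v1/caption as a multipart form upload."""
    import requests  # third-party; pip install requests

    with open(image_path, "rb") as f:
        resp = requests.post(
            endpoint("caption"),
            files={"image": f},
            data={"length": length},
        )
    resp.raise_for_status()
    return resp.json()

# Usage (requires a running server):
#   print(caption("your_image.jpg"))
```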
Install from the OCI registry:

```bash
helm install moondream-server oci://ghcr.io/mad-deecent/charts/moondream-server \
  --namespace moondream --create-namespace
```

Or install from source:

```bash
git clone https://github.com/Mad-Deecent/moondream-server.git
cd moondream-server
helm install moondream-server ./charts --namespace moondream --create-namespace
```

To deploy on GPU nodes, configure node selection in your values:
```yaml
nodeSelector:
  nvidia.com/gpu.product: "NVIDIA-GeForce-RTX-3060"
```

The default configuration requests 1 GPU and 2Gi of memory. Adjust based on your needs:
```yaml
resources:
  requests:
    cpu: 1000m
    memory: 2Gi
    nvidia.com/gpu: 1
  limits:
    cpu: 4000m
    memory: 8Gi
    nvidia.com/gpu: 1
```

Then install with your customized values:

```bash
helm install moondream-server ./charts \
  --namespace <my-namespace> \
  --create-namespace
```

For issues related to the Helm chart or containerization, open an issue in this repository.
For Moondream Station itself, visit moondream.ai or its official documentation.