Solo Server is a lightweight platform that enables users to manage and monitor AI models on their hardware.
- Seamless Setup: Manage your on-device AI with a simple CLI and HTTP servers
- Open Model Registry: Pull models from registries like Ollama & Hugging Face
- Lean Load Testing: Built-in commands to benchmark endpoints
- Cross-Platform Compatibility: Deploy AI models effortlessly on your hardware
- Configurable Framework: Auto-detects hardware (CPU, GPU, RAM) and sets configs accordingly
- 🐋 Docker: Required for containerization
# Make sure you have Python <= 3.12
python --version # Should be below 3.13
# Create a new virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate # On Unix/macOS
# OR
.venv\Scripts\activate # On Windows
pip install solo-server
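You can confirm the package installed correctly with:
# Verify the installation
pip show solo-server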
# Install uv
# On Windows (PowerShell)
iwr https://astral.sh/uv/install.ps1 -useb | iex
# On Unix/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv
# Activate the virtual environment
source .venv/bin/activate # On Unix/macOS
# OR
.venv\Scripts\activate # On Windows
uv pip install solo-server
Creates an isolated environment using uv for performance and stability.
# Clone the repository
git clone https://github.com/GetSoloTech/solo-server.git
# Navigate to the directory
cd solo-server
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Unix/macOS
# OR
.venv\Scripts\activate # Windows
# Install in editable mode
pip install -e .
Run the interactive setup to configure Solo Server:
solo start
✔️ Detects CPU, GPU, RAM for hardware-optimized execution
✔️ Auto-configures solo.conf with optimal settings
✔️ Requests API keys for Ngrok and Replicate
✔️ Recommends an OCI image for the compute backend (CUDA, HIP, SYCL, Vulkan, CPU, or Metal)
Example Output:
🖥️ System Information
Operating System: Windows
CPU: AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
CPU Cores: 8
Memory: 15.42GB
GPU: NVIDIA
GPU Model: NVIDIA GeForce GTX 1660 Ti
GPU Memory: 6144.0MB
Compute Backend: CUDA
🚀 Setting up Solo Server...
✅ Solo server is ready!
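To cross-check what solo start detected, you can inspect the hardware yourself. These are standard system tools (Linux shown here), not Solo Server commands:
# Inspect CPU, memory, and GPU directly (Linux)
lscpu | grep "Model name"
free -h
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader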
solo run llama3.2
solo serve llama3
Access the UI at:
http://127.0.0.1:5070 #SOLO_SERVER_PORT
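Once a model is served, you can exercise the HTTP endpoint directly. The route below assumes an OpenAI-compatible chat completions API, which many local inference servers expose; this is an assumption, so adjust the path to match your deployment:
# Hypothetical request; assumes an OpenAI-compatible route on the Solo port
curl http://127.0.0.1:5070/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'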
+-------------------+
|                   |
| solo run llama3.2 |
|                   |
+---------+---------+
          |
          |
          |    +------------------+         +------------------+
          |    | Pull inferencing |         | Pull model layer |
          +----| runtime (cuda)   |-------->| llama3.2         |
               +------------------+         +------------------+
                                            | Repo options     |
                                            +--+------+-----+--+
                                               |      |     |
                                               v      v     v
                                     +----------+ +----------+ +-------------+
                                     | Ollama   | | vLLM     | | HuggingFace |
                                     | Registry | | Registry | | Registry    |
                                     +----+-----+ +----+-----+ +------+------+
                                          |           |              |
                                          v           v              v
                                        +---------------------+
                                        | Start with          |
                                        | cuda runtime        |
                                        | and llama3.2        |
                                        +---------------------+
solo benchmark llama3
Example Output:
Running benchmark for llama3...
🔹 Model Size: 7B
🔹 Compute Backend: CUDA
🔹 Prompt Processing Speed: 1450 tokens/s
🔹 Text Generation Speed: 135 tokens/s
Running classification accuracy test...
🔹 Batch 0 Accuracy: 0.7300
🔹 Batch 1 Accuracy: 0.7520
🔹 Batch 2 Accuracy: 0.7800
🔹 Overall Accuracy: 0.7620
Running additional benchmarks...
🔹 F1 Score: 0.8150
🔹 Confusion Matrix:
tensor([[10, 2, 1, 0, 0],
[ 1, 12, 0, 0, 0],
[ 0, 0, 11, 0, 1],
[ 0, 0, 0, 13, 0],
[ 0, 0, 0, 0, 15]])
Benchmarking complete!
solo status
Example Output:
🔹 Running Models:
-------------------------------------------
| Name | Model | Backend | Port |
|----------|--------|---------|------|
| llama3 | Llama3 | CUDA | 8080 |
| gptj | GPT-J | CPU | 8081 |
-------------------------------------------
solo stop
Example Output:
🛑 Stopping Solo Server...
✅ Solo server stopped successfully.
Solo Server supports multiple model sources, including Ollama & Hugging Face.
| Model Name        | Source                                           |
|-------------------|--------------------------------------------------|
| DeepSeek R1       | ollama://deepseek-r1                             |
| IBM Granite 3.1   | ollama://granite3.1-dense                        |
| Granite Code 8B   | hf://ibm-granite/granite-8b-code-base-4k-GGUF    |
| Granite Code 20B  | hf://ibm-granite/granite-20b-code-base-8k-GGUF   |
| Granite Code 34B  | hf://ibm-granite/granite-34b-code-base-8k-GGUF   |
| Mistral 7B        | hf://TheBloke/Mistral-7B-Instruct-v0.2-GGUF      |
| Mistral 7B v3     | hf://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF |
| Hermes 2 Pro      | hf://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF   |
| Cerebrum 1.0 7B   | hf://froggeric/Cerebrum-1.0-7b-GGUF              |
| Dragon Mistral 7B | hf://llmware/dragon-mistral-7b-v0                |
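The source URIs above can presumably be passed straight to solo run or solo serve; the exact syntax below is assumed from the registry prefixes, so verify it against your installed version:
# Assumed syntax: pass a registry URI directly to solo run
solo run ollama://deepseek-r1
solo run hf://TheBloke/Mistral-7B-Instruct-v0.2-GGUF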
After setup, all settings are stored in:
~/.solo/solo.conf
Example:
# Solo Server Configuration
MODEL_REGISTRY=ramalama
MODEL_PATH=/home/user/solo/models
COMPUTE_BACKEND=CUDA
SERVER_PORT=5070
LOG_LEVEL=INFO
# Hardware Detection
CPU_MODEL="Intel i9-13900K"
CPU_CORES=24
MEMORY_GB=64
GPU_VENDOR="NVIDIA"
GPU_MODEL="RTX 3090"
# API Keys
NGROK_API_KEY="your-ngrok-key"
REPLICATE_API_KEY="your-replicate-key"
✅ Modify this file anytime and run:
solo setup
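For example, to move the server to another port and re-apply the settings (GNU sed shown; macOS sed needs -i ''):
# Update the port in solo.conf, then re-apply the configuration
sed -i 's/^SERVER_PORT=.*/SERVER_PORT=8000/' ~/.solo/solo.conf
solo setup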
This project wouldn't be possible without the help of other projects like:
- uv
- llama.cpp
- ramalama
- ollama
- whisper.cpp
- vllm
- podman
- huggingface
- llamafile
- cog
If you like using Solo, consider leaving us a ⭐ on GitHub!