πŸ¦™ Ollama Cloud Engine

Deploy a secure, scalable Ollama server on AWS or GCP in minutes. Features Tailscale-only access, automatic cost tracking, and enterprise-grade security.

OpenTofu · AWS · GCP · Tailscale · Infracost · License

✨ Features

  • πŸ”’ Zero-Trust Security: Tailscale mesh VPN with no SSH or public IPs
  • ⚑ One-Command Deployment: Create, start, stop, and destroy with single task commands
  • πŸ’° Cost Transparency: Automatic infrastructure cost estimation with Infracost
  • 🎯 AI-Optimized: Pre-configured GPU instances for optimal LLM performance
  • πŸ“Š Enterprise Monitoring: AWS CloudWatch integration; GCP logging optional
  • πŸ”§ Developer-Friendly: Choice of Docker or native CLI workflows

πŸ—οΈ Architecture

graph TB
    subgraph "Developer Environment"
        DEV["πŸ‘¨β€πŸ’» Developer Machine"]
        CLINE["πŸ”§ Cline/Cursor IDE"]
        TOOLS["⚑ Local Tools"]
    end

    subgraph "Tailscale Mesh Network"
        TS["πŸ”’ Tailscale VPN<br/>Zero-Trust Auth<br/>WireGuard Encrypted"]
    end

    subgraph "AWS Cloud (us-east-1)"
        subgraph "VPC (10.42.0.0/16)"
            subgraph "Public Subnet (10.42.0.0/24)"
                NAT["🌐 NAT Gateway"]
                IGW["πŸ“‘ Internet Gateway"]
            end
            
            subgraph "Private Subnet (10.42.1.0/24)"
                EC2["πŸ¦™ Ollama Server<br/>GPU Optimized<br/>No Public IP"]
                SG["πŸ›‘οΈ Security Group<br/>Zero Inbound Rules"]
            end
        end
        
        subgraph "Monitoring"
            CW["πŸ“Š CloudWatch<br/>Logs & Metrics"]
        end
        
        subgraph "Storage"
            EBS["πŸ’Ύ Encrypted EBS<br/>Model Storage"]
        end
    end

    DEV --> TS
    CLINE --> TS
    TOOLS --> TS
    TS -.->|"Encrypted Tunnel"| EC2
    EC2 --> NAT
    NAT --> IGW
    EC2 --> CW
    EC2 --> EBS
    SG --> EC2

What gets deployed:

  • VPC: Dedicated network (10.42.0.0/16) with public/private subnets
  • Security: Zero inbound rules, Tailscale-only access
  • Compute: GPU-optimized EC2 with automatic model selection
  • Storage: Encrypted EBS volumes sized per model requirements
  • Monitoring: CloudWatch logs and metrics collection
  • Networking: NAT Gateway for outbound connectivity (model downloads)
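
After a deploy, you can sanity-check what was created from the CLI. A minimal sketch for AWS, assuming the instance carries a Name tag matching your TF_VAR_instance_name value:

# Find the Ollama instance; PublicIp should be empty (no public IP by design)
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=Ollama-LLM-Server" "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].{Id:InstanceId,Type:InstanceType,PublicIp:PublicIpAddress}" \
  --output table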

πŸš€ Quick Start

Prerequisites

Required for all setups:

  • An AWS or GCP account with permissions to create network and compute resources
  • A Tailscale account (for the auth key below)
  • Task (the task runner; all commands below are task targets)

Then choose one of the following:

Option A: Docker

  • Docker or Docker-compatible container runtime (Podman, Colima, etc.)
  • Ensure the docker command is available in your PATH

Option B: Local CLI Tools

  • OpenTofu, the AWS CLI and/or gcloud CLI for your target cloud, and Infracost for cost estimates, installed locally (on macOS, task cli:setup:mac installs these)

GPU quota requirements

  • AWS: For G-family GPUs you need vCPU quota in the EC2 quota "Running On-Demand G and VT instances" for your target region. Minimum: 4–8 vCPUs depending on instance.
  • GCP: Enable the Compute Engine API and request GPU quota in your chosen region/zone (e.g., T4/A100 availability varies by zone). The snippet below shows one way to check current quotas.
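
A sketch of both checks; L-DB2E81BA is the AWS quota code for "Running On-Demand G and VT instances" at the time of writing:

# AWS: current G/VT vCPU quota in your target region
aws service-quotas get-service-quota \
  --service-code ec2 --quota-code L-DB2E81BA \
  --region us-east-1 \
  --query "Quota.Value"

# GCP: per-region quotas; look for *_GPUS metrics (limit vs. usage)
gcloud compute regions describe us-central1 | grep -B1 -A1 GPUS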

Authentication Setup

AWS Setup:

# Configure AWS credentials (if not already done)
aws configure --profile default
# OR use existing named profile
export AWS_PROFILE=your-profile-name
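
# Confirm the credentials resolve to the expected account
aws sts get-caller-identity --profile default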

GCP Setup:

# Set up Application Default Credentials
gcloud auth application-default login

# Set your default project
gcloud config set project your-project-id
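
To confirm ADC and the project are set up correctly:

# A token means ADC is working; the second line prints the active project
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"
gcloud config get-value project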

Getting a Tailscale Auth Key

  1. Create an auth key in your Tailscale Admin Console
  2. Click "Generate auth key" and configure:
    • Description: "Ollama Cloud Engine" (or your preference)
    • βœ… Reusable: Enable for multiple deployments
    • βœ… Ephemeral: Node auto-removes when disconnected
    • Expiry: Set to match your project timeline
    • Tags: Optional, for access control policies
  3. Copy the key - it starts with tskey-

πŸ“– Documentation: Tailscale Auth Keys Guide
⏰ Key Expiry: See Key expiry details
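
After a deploy, you can confirm the server joined your tailnet from any machine in the mesh (assuming the tailscale CLI is installed and the default instance name; Tailscale lowercases hostnames):

# The instance appears under its TF_VAR_instance_name hostname
tailscale status | grep -i ollama-llm-server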

Option A: Docker Workflow

  1. Set up authentication

    # For AWS
    aws configure --profile default
    
    # For GCP  
    gcloud auth application-default login
    gcloud config set project your-project-id
  2. Create configuration file

    # Copy the template and customize
    cp vars.env.template vars.env
    # Edit vars.env with your values (see Configuration section below)
  3. Deploy infrastructure

    task docker:create   # Reads CLOUD from vars.env, auto-mounts credentials
  4. Manage your deployment

    # Check status
    task docker:status
    
    # Start/stop (cost management)
    task docker:start
    task docker:stop
    
    # Destroy when done
    task docker:destroy
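
Once task docker:create finishes, a quick smoke test over Tailscale; /api/tags lists the models installed on the server:

# Should return the model from TF_VAR_model_choice
curl http://Ollama-LLM-Server:11434/api/tags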

Option B: Local CLI Workflow

  1. Install dependencies (macOS)

    task cli:setup:mac
  2. Set up authentication

    # For AWS
    aws configure --profile default
    
    # For GCP
    gcloud auth application-default login
    gcloud config set project your-project-id
  3. Create configuration file

    # Copy the template and customize
    cp vars.env.template vars.env
    # Edit vars.env with your values (see Configuration section below)
  4. Deploy and manage

    # Deploy infrastructure
    task cli:create   # Uses CLOUD from vars.env
    
    # Manage deployment
    task cli:status
    task cli:start
    task cli:stop
    task cli:destroy
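
If you have the ollama CLI installed locally, it can talk to the remote server directly via the OLLAMA_HOST environment variable:

# Point the local ollama CLI at the Tailscale hostname
export OLLAMA_HOST=http://Ollama-LLM-Server:11434
ollama list                                         # Installed models
ollama run codellama:7b-code "Write hello world in Go"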

βš™οΈ Configuration

Environment Variables

Create a vars.env file in the project root (Task loads this file automatically).

Template Example:

# Copy and customize the template
cp vars.env.template vars.env
# Edit vars.env with your specific values

Authentication Setup:

  • AWS: Configure ~/.aws/credentials with named profiles (default: default)
  • GCP: Run gcloud auth application-default login once to set up ADC (Application Default Credentials)

For Docker tasks, credential directories are automatically mounted into containers.
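
Under the hood this is roughly equivalent to the following sketch (illustrative only; the actual image name and entrypoint are defined by the repo's Taskfile):

# Hypothetical equivalent of the docker:* tasks
docker run --rm -it \
  --env-file vars.env \
  -v ~/.aws:/root/.aws:ro \
  -v ~/.config/gcloud:/root/.config/gcloud:ro \
  <tool-image> task create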

Complete Configuration Example:

# vars.env - Customize for your deployment
CLOUD=aws                                           # aws | gcp
TF_VAR_tailscale_auth_key=tskey-auth-xxx...        # Get from Tailscale admin console
TF_VAR_model_choice=codellama:7b-code              # See supported models below
TF_VAR_instance_name=Ollama-LLM-Server             # Instance name and Tailscale hostname
TF_VAR_enable_debug=false                          # Enable debug logging

# AWS Configuration (when CLOUD=aws)
TF_VAR_aws_profile=default                         # AWS profile name  
TF_VAR_aws_region=us-east-1                        # AWS region
# TF_VAR_custom_ami_id=ami-xxx                     # Optional: override AMI

# GCP Configuration (when CLOUD=gcp)
TF_VAR_gcp_project=your-project-id                 # GCP project ID
TF_VAR_gcp_region=us-central1                      # GCP region
TF_VAR_gcp_zone=us-central1-a                      # GCP zone

Required Variables:

CLOUD=aws                                    # aws or gcp
TF_VAR_tailscale_auth_key=tskey-auth-xxx...  # Your Tailscale key
TF_VAR_model_choice=codellama:7b-code        # Model to deploy

Optional Variables (with defaults):

TF_VAR_instance_name=Ollama-LLM-Server       # Instance name
TF_VAR_enable_debug=false                    # Debug logging
TF_VAR_aws_profile=default                   # AWS profile (AWS only)
TF_VAR_aws_region=us-east-1                  # AWS region (AWS only)
TF_VAR_gcp_region=us-central1                # GCP region (GCP only)
TF_VAR_gcp_zone=us-central1-a                # GCP zone (GCP only)
# TF_VAR_gcp_project=                        # Uses gcloud default (GCP only)
# TF_VAR_custom_ami_id=                      # Override AMI (AWS only)

Supported Models

The following models are supported with automatic GPU instance selection:

| Model | AWS Instance | GCP Machine/GPU | Storage | Use Case |
|---|---|---|---|---|
| codellama:7b-code | g5.xlarge | n1-standard-8 + T4 | 100GB | Code completion, small projects |
| codellama:13b-code | g5.2xlarge | n1-standard-16 + T4 | 150GB | Advanced code generation |
| codellama:34b-code | g6e.xlarge | A2 (A100 1g) | 200GB | Complex code analysis |
| qwen2.5-coder:32b | g6e.xlarge | A2 (A100 1g) | 200GB | Multilingual code generation |
| mistralai/Mistral-7B-Instruct-v0.1 | g5.xlarge | n1-standard-8 + T4 | 100GB | General instruction following |
| deepseek-coder:6.7b-base | g5.xlarge | n1-standard-8 + T4 | 100GB | Code understanding |
| llama3:8b-instruct-q5_1 | g5.xlarge | n1-standard-8 (CPU OK) | 100GB | General purpose, quantized |
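
To switch models, update TF_VAR_model_choice in vars.env and redeploy; a sketch, assuming a re-run of create applies the new instance size and storage:

# Move from the 7B to the 13B coder model (assumes create re-applies sizing)
sed -i.bak 's/^TF_VAR_model_choice=.*/TF_VAR_model_choice=codellama:13b-code/' vars.env
task docker:create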

πŸ”§ Usage with AI Coding Tools

Cline

After deployment, configure Cline to use your Ollama server:

  1. Get your Tailscale URL (from deployment output):

    http://Ollama-LLM-Server:11434
    
  2. Configure Cline:

    • Provider: ollama
    • Model: Your TF_VAR_model_choice value
    • Base URL: http://Ollama-LLM-Server:11434
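
To verify the endpoint responds before relying on it in your IDE, a quick test against the native API:

# Non-streaming test generation against the deployed model
curl http://Ollama-LLM-Server:11434/api/generate \
  -d '{"model": "codellama:7b-code", "prompt": "// fizzbuzz in JavaScript", "stream": false}'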

Other Tools

The Ollama API is compatible with:

  • Continue.dev: VS Code/JetBrains plugin
  • Open WebUI: Web-based interface
  • LangChain: Python/JS framework
  • Custom applications: Standard OpenAI-compatible API
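
For OpenAI-compatible clients, recent Ollama versions also serve /v1 endpoints; a minimal sketch:

# Chat completion via Ollama's OpenAI-compatible API
curl http://Ollama-LLM-Server:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama:7b-code",
    "messages": [{"role": "user", "content": "Explain what a VPC is in one sentence."}]
  }'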
