
Conversation


domdomegg commented Aug 6, 2025

Adds a CI action to build Docker images and publish them to GHCR on every commit. We can then use these when deploying the registry.

This turned out to be a blocker for building the deployment infra, and something like this is needed for either deployment approach we're exploring.


Summary

  • Add comprehensive documentation for pre-built Docker images in README
  • Include usage examples and configuration guidance
  • Add GitHub Actions workflow for automated Docker image publishing

🤖 Generated with Claude Code

Add documentation for pre-built Docker images including usage examples and configuration guidance. Include GitHub Actions workflow for automated Docker image publishing.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
domdomegg merged commit f04c995 into main Aug 7, 2025
5 checks passed
domdomegg deleted the adamj/add-docker-images-documentation branch August 7, 2025 15:25
domdomegg added a commit that referenced this pull request Aug 8, 2025
domdomegg added a commit that referenced this pull request Aug 11, 2025

Original PR: #227

- Add Pulumi-based infrastructure as code for deploying MCP Registry to
Kubernetes
- Support for both local development (minikube) and Azure Kubernetes
Service (AKS)
- Complete deployment orchestration, including:
  - cluster setup: e.g. you point this at an Azure account and it can set
    up and manage the cluster for you (K8s version, number of nodes, type
    of nodes, ...)
  - cloud-agnostic K8s services: cert-manager, nginx-ingress
  - app services: MongoDB, and the registry application (currently using
    nginx as a placeholder, blocked on #225 (as is #190), but should be a
    one-line change - see the sketch after this list)
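
To make the app-services layer concrete, here's a minimal sketch of what the registry Deployment could look like in Pulumi's Go SDK. It is illustrative only: the resource names, labels and the nginx placeholder mirror the description above, and it is not the actual `deploy/pkg/k8s` code.

```go
package main

import (
	appsv1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/apps/v1"
	corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
	metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		labels := pulumi.StringMap{"app": pulumi.String("registry")}
		// nginx:alpine is the placeholder described above; once #225 publishes
		// the real image to GHCR, swapping the Image line is the "one-line change".
		_, err := appsv1.NewDeployment(ctx, "registry", &appsv1.DeploymentArgs{
			Metadata: &metav1.ObjectMetaArgs{Labels: labels},
			Spec: appsv1.DeploymentSpecArgs{
				Replicas: pulumi.Int(2),
				Selector: &metav1.LabelSelectorArgs{MatchLabels: labels},
				Template: &corev1.PodTemplateSpecArgs{
					Metadata: &metav1.ObjectMetaArgs{Labels: labels},
					Spec: &corev1.PodSpecArgs{
						Containers: corev1.ContainerArray{
							corev1.ContainerArgs{
								Name:  pulumi.String("registry"),
								Image: pulumi.String("nginx:alpine"),
								Ports: corev1.ContainerPortArray{
									corev1.ContainerPortArgs{ContainerPort: pulumi.Int(80)},
								},
							},
						},
					},
				},
			},
		})
		return err
	})
}
```

Because this is plain Go, swapping the placeholder for the image published by #225 really is a one-line change to the `Image` field.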

## How is this different to #190

- Supports cluster setup and management. This enables:
  - Non-hosting maintainers managing many devops workflows themselves
    (e.g. scaling up the cluster, or bumping K8s versions). Without this,
    we'd need to bug/page the organisation hosting the registry whenever
    we need these things changed.
  - Easily spinning up staging/temporary clusters, and contributors
    replicating the stack exactly on their own Azure accounts.
- Sets up cloud-agnostic services. For example, rather than using the
  Azure-managed ingresses and CA, we install nginx-ingress and
  cert-manager. This enables:
  - Running the entire infra stack locally (e.g. in minikube, k3s,
    orbstack, colima), making it much easier for contributors to test
    infra changes.
  - Moving between cloud providers much more easily, e.g. we could shift
    from Azure to GCP/AWS/other with minimal hassle.
- Everything stays written in Go, rather than Helm templates. This means
  we get things like type-checking etc. for free (which from my experience
  makes AI tools wayyy better at editing K8s stuff), and contributors
  don't need to learn a new language if they're already using Go. See the
  sketch after this list.
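
To give a flavour of the "Go rather than Helm templates" point, here's a rough sketch of installing cert-manager through Pulumi's typed Helm support in Go. The chart version, namespace and values are illustrative assumptions rather than what the deploy code actually pins.

```go
package main

import (
	helmv3 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/helm/v3"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Install cert-manager from the upstream Jetstack chart. Because this
		// is ordinary Go, the values and wiring are type-checked at compile time.
		_, err := helmv3.NewChart(ctx, "cert-manager", helmv3.ChartArgs{
			Chart:     pulumi.String("cert-manager"),
			Version:   pulumi.String("v1.14.4"), // illustrative pin
			Namespace: pulumi.String("cert-manager"),
			FetchArgs: helmv3.FetchArgs{
				Repo: pulumi.String("https://charts.jetstack.io"),
			},
			Values: pulumi.Map{
				"installCRDs": pulumi.Bool(true),
			},
		})
		return err
	})
}
```

The same pattern applies to nginx-ingress, and because nothing here is Azure-specific, the identical code runs against minikube and AKS.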

## Testing

I've got this running well:
- locally in minikube
- on cloud in Azure (my personal Azure account)

<details><summary>Claude written architecture and security
review</summary>
<p>

## Deployment Review & Assessment

### Current Architecture Strengths
**Pulumi IaC Approach**
- Well-structured infrastructure as code using Pulumi
- Multi-provider support (AKS, local) with clean abstraction
- Good separation of concerns in `pkg/` directory

**Security Fundamentals**
- Non-root container execution (`appuser` with UID 10001)
- Secrets properly managed via Kubernetes secrets
- TLS/SSL certificate management with cert-manager and Let's Encrypt

### Critical Issues & High-Priority Improvements

**1. Production Deployment Not Ready** 🚨
The registry deployment uses `nginx:alpine` placeholder image instead of
the actual MCP registry:
- `deploy/pkg/k8s/registry.go:67` - TODO comments indicate incomplete
setup
- Health probes are commented out
- Port mapping doesn't match actual application (80 vs 8080)

**Fix:** Build and publish actual registry container image to GHCR,
update deployment

**2. Database Security Considerations** 🔒
- MongoDB deployed without authentication
- No backup/disaster recovery strategy
- Database credentials hardcoded

*Note: MongoDB is not exposed externally (ClusterIP service), so this is
not a critical security risk but should be addressed for production.*
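
A rough sketch of one way to remove the hardcoded credentials: read them from Pulumi's encrypted stack config and surface them via a Kubernetes Secret. The config key names and wiring are assumptions; `MONGO_INITDB_ROOT_USERNAME`/`MONGO_INITDB_ROOT_PASSWORD` are the standard variables the official mongo image uses to enable authentication on first start.

```go
package k8s

import (
	corev1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/core/v1"
	metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

// mongoSecret reads the MongoDB password from encrypted stack config
// (`pulumi config set --secret mongoPassword ...`) instead of hardcoding it.
func mongoSecret(ctx *pulumi.Context) (*corev1.Secret, error) {
	cfg := config.New(ctx, "")
	return corev1.NewSecret(ctx, "mongo-credentials", &corev1.SecretArgs{
		Metadata: &metav1.ObjectMetaArgs{Name: pulumi.String("mongo-credentials")},
		StringData: pulumi.StringMap{
			"username": pulumi.String("registry"),
			"password": cfg.RequireSecret("mongoPassword"),
		},
	})
}

// mongoEnv wires the secret into the mongo container's environment.
func mongoEnv() corev1.EnvVarArray {
	ref := func(key string) *corev1.EnvVarSourceArgs {
		return &corev1.EnvVarSourceArgs{
			SecretKeyRef: &corev1.SecretKeySelectorArgs{
				Name: pulumi.String("mongo-credentials"),
				Key:  pulumi.String(key),
			},
		}
	}
	return corev1.EnvVarArray{
		corev1.EnvVarArgs{Name: pulumi.String("MONGO_INITDB_ROOT_USERNAME"), ValueFrom: ref("username")},
		corev1.EnvVarArgs{Name: pulumi.String("MONGO_INITDB_ROOT_PASSWORD"), ValueFrom: ref("password")},
	}
}
```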

**3. Monitoring & Observability Gaps** 📊
- No Prometheus/Grafana monitoring stack
- No log aggregation (ELK/Loki)
- No application metrics/health dashboards
- No alerting configured

**4. High Availability & Reliability** ⚠️
- Single MongoDB instance (no replication)
- No persistent volume backup strategy
- Fixed 10Gi storage without growth planning
- Only 2 replicas for registry service
- No pod disruption budgets
- No horizontal pod autoscaling
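
One of these gaps is cheap to close in the existing Go stack: a PodDisruptionBudget for the registry pods. A minimal sketch, with labels and thresholds as assumptions:

```go
package k8s

import (
	metav1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/meta/v1"
	policyv1 "github.com/pulumi/pulumi-kubernetes/sdk/v4/go/kubernetes/policy/v1"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

// registryPDB keeps at least one registry pod available during voluntary
// disruptions (node drains, upgrades), which matters with only 2 replicas.
func registryPDB(ctx *pulumi.Context) (*policyv1.PodDisruptionBudget, error) {
	labels := pulumi.StringMap{"app": pulumi.String("registry")}
	return policyv1.NewPodDisruptionBudget(ctx, "registry-pdb", &policyv1.PodDisruptionBudgetArgs{
		Spec: policyv1.PodDisruptionBudgetSpecArgs{
			MinAvailable: pulumi.Int(1),
			Selector:     &metav1.LabelSelectorArgs{MatchLabels: labels},
		},
	})
}
```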

### Recommended Improvements

**Immediate (High Priority)**
1. Complete Registry Deployment - Build proper container image pipeline,
enable health checks
2. Secure MongoDB - Add authentication credentials, implement backup
strategy

**Medium Priority**
3. Add Monitoring Stack - Prometheus, Grafana deployment
4. Security Hardening (Nice to Have) - RBAC policies, Network Policies,
Pod Security Standards
5. CI/CD Pipeline Enhancement - Container image building/publishing,
automated deployment

**Lower Priority**
6. High Availability - MongoDB replica set, HPA for registry pods
7. Operational Excellence - Kubernetes dashboard, cost optimization

### Configuration Issues
- Production config has test credentials: `deploy/Pulumi.prod.yaml:4-5`
- Missing environment-specific resource sizing
- Hardcoded domain names (`example.com`)

The deployment setup shows good architectural foundations but needs
significant work before production readiness. The most critical issue is
the placeholder nginx container - priority should be completing the
actual registry application deployment before addressing the other
improvements. Security measures like RBAC and Network Policies are nice
to have but not strictly necessary given that MongoDB is not exposed
externally.

🤖 Generated with [Claude Code](https://claude.ai/code)

</p>
</details> 


## Metadata

Working towards #91

---------

Co-authored-by: Claude <noreply@anthropic.com>
domdomegg mentioned this pull request Aug 12, 2025
domdomegg added a commit that referenced this pull request Aug 12, 2025
Adds the Pulumi code to:
- Deploy the registry (and associated services, e.g. MongoDB) to Google
  Cloud Platform (GCP), on top of Google Kubernetes Engine (GKE) - see
  the sketch after this list
- Set up proper environments and secrets management
- Use the real container image, now that it's published in #225. At the
  moment it tracks `latest`; we might want to pin the version later (or
  perhaps always use `latest` in staging, and pin prod)
- Use real domains (`staging.registry.modelcontextprotocol.io`) rather
  than placeholders
## Motivation and Context

Setting up infrastructure to deploy the registry. I set something up in
Azure in #227, although it's not super robust (e.g. no service accounts
etc.). I think we will use GCP, as:
- the maintainers have experience with GCP, but none with Azure
- costs are quite low, and Anthropic is happy to cover them in the short
term
- means we only have to maintain one login system (just Google Cloud
Identity), not two (Google Workspace + Azure)

## How Has This Been Tested?

Deployed this to a staging and production cluster. Try it yourself at:

```bash
curl -H "Host: staging.registry.modelcontextprotocol.io" -k https://35.222.36.75/v0/ping
```

(will be sorting out domains very soon)

## Breaking Changes

N/A - this just adds support for GCP deployment

## Types of changes
<!-- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update

## Checklist
<!-- Go over all the following points, and put an `x` in all the boxes
that apply. -->
- [x] I have read the [MCP
Documentation](https://modelcontextprotocol.io)
- [x] My code follows the repository's style guidelines
- [ ] New and existing tests pass locally
- [x] I have added appropriate error handling
- [x] I have added or updated documentation as needed

## Additional context
<!-- Add any other context, implementation notes, or design decisions
-->

Expected follow-ups:
- GitHub Actions setup to deploy to the cluster from GitHub, so deploys
  aren't gated on just the people who hold the secrets.