@domdomegg (Member) commented Aug 8, 2025

Original PR: #227

  • Add Pulumi-based infrastructure as code for deploying MCP Registry to Kubernetes
  • Support for both local development (minikube) and Azure Kubernetes Service (AKS)
  • Complete deployment orchestration including:

How is this different to #190?

  • Supports cluster setup and management. This enables:
    • Non-hosting maintainers managing many devops workflows (e.g. scaling up the cluster, or bumping K8s versions). Without this, we'd need to bug/page the organisation hosting the registry when we need these things changed.
    • Easily spinning up things like staging/temporary clusters, and contributors replicating the stack exactly on their own Azure accounts.
  • Sets up cloud-agnostic services. For example, rather than using the Azure-managed ingresses and CA, we install nginx-ingress and cert-manager. This enables:
    • Running the entire infra stack locally (e.g. in minikube, k3s, orbstack, colima), which makes it much easier for contributors to test infra changes.
    • Moving between cloud providers much more easily, e.g. we could shift from Azure to GCP/AWS/other with minimal hassle.
  • Everything stays written in Go, rather than Helm templates. This means we get things like type-checking etc. for free (which from my experience makes AI tools wayyy better at editing K8s stuff), and contributors don't need to learn a new language if they're already using Go.

Testing

I've got this running well:

  • locally in minikube
  • on cloud in Azure (my personal Azure account)

Claude-written architecture and security review

Deployment Review & Assessment

Current Architecture Strengths

Pulumi IaC Approach

  • Well-structured infrastructure as code using Pulumi
  • Multi-provider support (AKS, local) with clean abstraction
  • Good separation of concerns in pkg/ directory
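
To make the "clean abstraction" point concrete, here is a minimal sketch of how a provider could be selected from a config value. It is plain Go and does not use the Pulumi SDK. The `ClusterProvider` interface, both implementations, and the assumption that the local provider reuses an existing cluster while AKS provisions one are all hypothetical, not taken from the code under `pkg/`.

```go
package main

import "fmt"

// ClusterProvider is a hypothetical abstraction over where the cluster lives.
type ClusterProvider interface {
	Name() string
	// NeedsClusterProvisioning reports whether the stack must create the
	// cluster itself (assumed true for AKS, false for a local cluster).
	NeedsClusterProvisioning() bool
}

// localProvider stands in for an already-running local cluster (e.g. minikube).
type localProvider struct{}

func (localProvider) Name() string                   { return "local" }
func (localProvider) NeedsClusterProvisioning() bool { return false }

// aksProvider stands in for a cluster that the stack creates on Azure.
type aksProvider struct{}

func (aksProvider) Name() string                   { return "aks" }
func (aksProvider) NeedsClusterProvisioning() bool { return true }

// NewClusterProvider picks an implementation from a config string and
// rejects anything it does not recognise.
func NewClusterProvider(kind string) (ClusterProvider, error) {
	switch kind {
	case "local":
		return localProvider{}, nil
	case "aks":
		return aksProvider{}, nil
	default:
		return nil, fmt.Errorf("unknown cluster provider %q", kind)
	}
}
```

With this shape, the rest of the deployment code only depends on the interface, so adding another cloud would mean adding one more case rather than touching every resource definition.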

Security Fundamentals

  • Non-root container execution (appuser with UID 10001)
  • Secrets properly managed via Kubernetes secrets
  • TLS/SSL certificate management with cert-manager and Let's Encrypt

Critical Issues & High-Priority Improvements

1. Production Deployment Not Ready 🚨

The registry deployment uses a placeholder nginx:alpine image instead of the actual MCP registry:

  • deploy/pkg/k8s/registry.go:67 - TODO comments indicate incomplete setup
  • Health probes are commented out
  • Port mapping doesn't match actual application (80 vs 8080)

Fix: Build and publish actual registry container image to GHCR, update deployment
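
The three findings above could be caught automatically before a deploy. The following is a sketch of such a preflight check, assuming the application listens on 8080 as the review suggests. `DeploymentSpec` is a hypothetical, simplified struct for illustration and is not the Kubernetes or Pulumi type.

```go
package main

import (
	"fmt"
	"strings"
)

// AppPort is the port the application is assumed to listen on (per the review).
const AppPort = 8080

// DeploymentSpec is a hypothetical, simplified view of the registry deployment.
type DeploymentSpec struct {
	Image         string
	ContainerPort int
	HasLiveness   bool
	HasReadiness  bool
}

// PreflightIssues returns one message per problem found, or an empty
// slice when the spec passes all three checks.
func PreflightIssues(d DeploymentSpec) []string {
	var issues []string
	// An nginx image means the placeholder has not been swapped out yet.
	if strings.HasPrefix(d.Image, "nginx") {
		issues = append(issues, "placeholder image in use: "+d.Image)
	}
	if d.ContainerPort != AppPort {
		issues = append(issues, fmt.Sprintf(
			"container port %d does not match application port %d",
			d.ContainerPort, AppPort))
	}
	if !d.HasLiveness || !d.HasReadiness {
		issues = append(issues, "liveness and/or readiness probe missing")
	}
	return issues
}
```

Calling `PreflightIssues(DeploymentSpec{Image: "nginx:alpine", ContainerPort: 80})` reports all three problems, which matches the state described in this review.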

2. Database Security Considerations 🔒

  • MongoDB deployed without authentication
  • No backup/disaster recovery strategy
  • Database credentials hardcoded

Note: MongoDB is not exposed externally (ClusterIP service), so this is not a critical security risk but should be addressed for production.
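
One way to avoid hardcoded database credentials is to read them from the environment and fall back to a generated value. The sketch below shows that idea using only the standard library. The function names and the environment-variable approach are assumptions for illustration, not what the PR does.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"os"
)

// generatePassword returns a URL-safe password built from n random bytes.
// It refuses very short lengths so a typo cannot produce a weak secret.
func generatePassword(n int) (string, error) {
	if n < 16 {
		return "", errors.New("password must use at least 16 random bytes")
	}
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// mongoPassword prefers an operator-supplied value from the named
// environment variable and otherwise generates a fresh one.
func mongoPassword(envVar string) (string, error) {
	if v := os.Getenv(envVar); v != "" {
		return v, nil
	}
	return generatePassword(32)
}
```

In a Pulumi program the generated value would also need to be stored in stack state or config so that it stays stable across runs. That part is left out here.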

3. Monitoring & Observability Gaps 📊

  • No Prometheus/Grafana monitoring stack
  • No log aggregation (ELK/Loki)
  • No application metrics/health dashboards
  • No alerting configured

4. High Availability & Reliability ⚠️

  • Single MongoDB instance (no replication)
  • No persistent volume backup strategy
  • Fixed 10Gi storage without growth planning
  • Only 2 replicas for registry service
  • No pod disruption budgets
  • No horizontal pod autoscaling
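
To show what the last two items would buy, here is a small sketch of the arithmetic behind them. `desiredReplicas` follows the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (ceil of current replicas times the ratio of current to target metric) but ignores its tolerance band and stabilisation window. `allowedDisruptions` is the budget a PodDisruptionBudget with `minAvailable` leaves. Function names and the example numbers other than the 2 replicas are hypothetical.

```go
package main

import "math"

// desiredReplicas computes ceil(current * currentMetric / targetMetric)
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	if targetMetric <= 0 {
		return current // no meaningful target, leave the count unchanged
	}
	d := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if d < minReplicas {
		d = minReplicas
	}
	if d > maxReplicas {
		d = maxReplicas
	}
	return d
}

// allowedDisruptions returns how many pods may be evicted voluntarily
// while keeping at least minAvailable healthy pods running.
func allowedDisruptions(healthy, minAvailable int) int {
	if healthy <= minAvailable {
		return 0
	}
	return healthy - minAvailable
}
```

For the current 2 replicas, a measured 90% utilisation against a 60% target gives `desiredReplicas(2, 90, 60, 2, 10) == 3`, and a budget with `minAvailable` of 1 gives `allowedDisruptions(2, 1) == 1`, so a node drain could evict one registry pod at a time.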

Recommended Improvements

Immediate (High Priority)

  1. Complete Registry Deployment - Build proper container image pipeline, enable health checks
  2. Secure MongoDB - Add authentication credentials, implement backup strategy

Medium Priority

  3. Add Monitoring Stack - Prometheus, Grafana deployment
  4. Security Hardening (Nice to Have) - RBAC policies, Network Policies, Pod Security Standards
  5. CI/CD Pipeline Enhancement - Container image building/publishing, automated deployment

Lower Priority

  6. High Availability - MongoDB replica set, HPA for registry pods
  7. Operational Excellence - Kubernetes dashboard, cost optimization

Configuration Issues

  • Production config has test credentials: deploy/Pulumi.prod.yaml:4-5
  • Missing environment-specific resource sizing
  • Hardcoded domain names (example.com)

The deployment setup shows good architectural foundations but needs significant work before production readiness. The most critical issue is the placeholder nginx container - priority should be completing the actual registry application deployment before addressing the other improvements. Security measures like RBAC and Network Policies are nice to have but not strictly necessary given that MongoDB is not exposed externally.

🤖 Generated with Claude Code

Metadata

Working towards #91

domdomegg and others added 13 commits August 7, 2025 11:34
Comprehensive analysis of MCP Registry deployment setup identifying:
- Critical issue with placeholder nginx container vs actual registry
- Security considerations and monitoring gaps
- High availability and reliability improvements needed
- Prioritized recommendations for production readiness

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@domdomegg domdomegg added the tech requirements work Product requirements are clear, but needs technical review before implementation label Aug 8, 2025
@domdomegg (Member, Author) commented:

Having a meeting today at 5pm UK to decide how we proceed with this or #242. See Discord for details.

@domdomegg domdomegg removed the tech requirements work Product requirements are clear, but needs technical review before implementation label Aug 9, 2025
@domdomegg domdomegg requested a review from maxisbey August 11, 2025 15:11
@domdomegg domdomegg merged commit 667e284 into main Aug 11, 2025
6 checks passed
@domdomegg domdomegg deleted the adam/pulumi-infra branch August 11, 2025 18:43
@domdomegg domdomegg mentioned this pull request Aug 19, 2025
3 tasks