Master's Thesis Project: Exploring cloud-native MLOps infrastructure patterns
From local development to production-ready platform architecture
Research Focus:
- 🔬 How do modern cloud-native technologies integrate into cohesive platforms?
- ☸️ What does comprehensive Kubernetes platform engineering look like in practice?
- 🤖 How do MLOps workflows drive platform requirements and design decisions?
- 🔐 How can we implement security, observability, and multi-tenancy from day one?
- 📚 How do we bridge the gap between local academic research and industry practices?
ML Workloads We're Building For:
- 🤖 Classical ML: Predictive models using scikit-learn
- 🧠 Generative AI: Fine-tuning BERT, Qwen, and other transformer-based models and LLMs
- 🔄 End-to-End Pipelines: From data ingestion to model serving
┌─────────────────────────────────────────────────────────────────────────┐
│                        GitOps & Infrastructure                        │
│  GitHub Actions • Argo CD • Terraform • Terragrunt • Helm/Kustomize   │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┤
│ ML Platform     │ Observability   │ Security        │ Storage         │
│                 │                 │                 │                 │
│ • Argo Workflows│ • Prometheus    │ • Keycloak      │ • MinIO         │
│ • KServe        │ • Grafana       │ • Vault         │ • PostgreSQL    │
│ • MLflow        │ • Loki & Tempo  │ • cert-manager  │ • Persistent    │
│                 │ • OpenCost      │ • Istio mTLS    │   Volumes       │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┘
                     Kubernetes (kind → cloud-ready)
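The base layer of the diagram starts out as a local kind cluster. As a rough sketch of what that could look like, the snippet below prints a kind cluster config. The node layout (one control plane, two workers) and the cluster name `mlops-local` used in the note afterwards are illustrative assumptions, not values taken from the project's repositories.

```shell
#!/bin/sh
set -eu

# Print a kind cluster config: one control-plane node plus N workers.
# The node layout is an illustrative assumption, not the project's real topology.
render_kind_config() (
  workers="${1:-2}"
  echo "kind: Cluster"
  echo "apiVersion: kind.x-k8s.io/v1alpha4"
  echo "nodes:"
  echo "  - role: control-plane"
  i=0
  while [ "$i" -lt "$workers" ]; do
    echo "  - role: worker"
    i=$((i + 1))
  done
)

# Show the config for a two-worker cluster.
render_kind_config 2
```

To try it, redirect the output to a file and pass it to kind, for example `render_kind_config 2 > kind-config.yaml` followed by `kind create cluster --name mlops-local --config kind-config.yaml`.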
Design Principles We're Exploring:
- Research-Driven: Documenting architectural decisions and trade-offs
- Cloud-Native Patterns: Modern Kubernetes ecosystem integration
- GitOps-First: Declarative infrastructure and application management
- Learning-Oriented: Comprehensive documentation of the journey
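To make the GitOps-first principle concrete, here is a minimal sketch of how one platform component might be declared as an Argo CD `Application`. The repository URL follows the organization's naming but is an assumption, and the `apps/mlflow` path, the `main` branch, and the use of the app name as namespace are hypothetical placeholders rather than the actual layout of the gitops repository.

```shell
#!/bin/sh
set -eu

# Assumed for illustration: the URL mirrors the org's naming, and the
# path layout inside the repository is hypothetical.
GITOPS_REPO="${GITOPS_REPO:-https://github.com/opencloudhub/gitops.git}"

# Print an Argo CD Application that keeps one app in sync with Git.
# $1 = app name (also used as target namespace), $2 = path inside the repo.
render_argocd_app() (
  name="$1"
  path="$2"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${GITOPS_REPO}
    targetRevision: main
    path: ${path}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${name}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
)

# Render a manifest for a hypothetical MLflow app.
render_argocd_app mlflow apps/mlflow
```

With automated sync enabled, Argo CD reconciles the cluster toward whatever is committed under the given path, so a change is rolled out by merging to Git rather than by running `kubectl` by hand. The rendered manifest could be applied once with `render_argocd_app mlflow apps/mlflow | kubectl apply -f -`.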
🛠️ Platform Team
- .github – Shared org templates and workflows
- github-management – Org automation with Terraform
- docs – Comprehensive documentation website
- infra-modules – Reusable Terraform modules
- infra-live – Live infrastructure with Terragrunt
- gitops – Argo CD apps and configs
- gh-actions-local-runner – Local runner utility for GH Actions
🧠 AI Team
- ai-ml-demo – Classical ML pipeline examples
- ai-bert-demo – BERT fine-tuning workflows
- ai-qwen-demo – Qwen model experimentation
🌐 Application Team
- demo-app-frontend – React dashboard for model interactions
- demo-app-backend – FastAPI backend with model integration
- landing-page – Org landing page
- 📖 Full Documentation – Detailed setup guides and architectural decisions
- 🎯 Local Development – Deploy the platform on your machine
- 📋 Project Roadmap – See what's coming next
- Explore Implementation Details: Browse the repositories above to see real-world GitOps and IaC patterns
- Learn Integration Patterns: See how modern cloud-native tools work together
- Contribute Ideas: Join discussions about platform engineering approaches
# Clone and deploy locally (requires Docker, kind, and kubectl)
git clone https://github.com/opencloudhub/infra-live.git
cd infra-live/local && ./deploy.sh
# Access MLflow UI
kubectl port-forward -n mlflow svc/mlflow-server 5000:5000
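Before running the commands above, it may help to confirm that the required tools are installed. The following is a small sketch, not part of the project's `deploy.sh`. The tool list is inferred from the quick start itself (Docker and kind are named there, and `kubectl` is needed for the port-forward), and the helper name `missing_tools` is made up for this example.

```shell
#!/bin/sh
set -eu

# Print each tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool list inferred from the quick-start commands (an assumption).
missing="$(missing_tools docker kind kubectl)"
if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  echo "$missing"
else
  echo "All prerequisites found."
fi
```

Once the port-forward is running, the MLflow UI should be reachable at http://localhost:5000, since the command maps local port 5000 to port 5000 of the `mlflow-server` service.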
Current Thesis Phase: Building and documenting a comprehensive MLOps platform proof-of-concept
Research Contributions:
- Integration patterns for cloud-native MLOps toolchains
- Platform engineering approaches for AI/ML teams
- Documentation of architectural decisions and trade-offs
Future Applications:
- Foundation for university research projects requiring custom ML models and application integration
- Template for organizations transitioning from local development to production systems
- Personal showcase and learning resource for the cloud-native community
- Potential baseline for SaaS offerings or consulting engagements