KubeSentiment: Production-Ready MLOps Sentiment Analysis Microservice


KubeSentiment is a production-grade, scalable, and observable sentiment analysis microservice. Built with FastAPI and designed for Kubernetes, it embodies modern MLOps best practices from the ground up, providing a robust foundation for deploying machine learning models in a cloud-native environment.

✨ Why KubeSentiment?

This project was built to serve as a comprehensive, real-world example of MLOps principles in action. It addresses common challenges in deploying ML models, such as:

  • Scalability: Handles high-throughput inference with Kubernetes Horizontal Pod Autoscaling.
  • Observability: Offers deep insights into model and system performance with a pre-configured monitoring stack.
  • Reproducibility: Ensures consistent environments from development to production with Docker, Terraform, and Helm.
  • Security: Integrates best practices for secret management, container security, and network policies.
  • Automation: Features a complete CI/CD pipeline for automated testing and deployment.

πŸš€ Key Features

  • High-Performance AI Inference: Leverages state-of-the-art transformer models for real-time sentiment analysis.
  • GPU Acceleration & Multi-GPU Support: Enterprise-grade GPU scheduling with dynamic batch optimization and load balancing for maximum throughput (800-3000 req/s per GPU).
  • ONNX Optimization: Supports ONNX Runtime for accelerated inference and reduced resource consumption.
  • Cloud-Native & Kubernetes-Ready: Designed for Kubernetes with auto-scaling, health checks, and zero-downtime deployments via Helm.
  • Full Observability Stack: Integrated with Prometheus, Grafana, and structured logging for comprehensive monitoring.
  • Infrastructure as Code (IaC): Reproducible infrastructure defined with Terraform.
  • Automated CI/CD Pipeline: GitHub Actions for automated testing, security scanning (Trivy), and deployment.
  • Secure by Design: Integrates with HashiCorp Vault for secrets management and includes hardened network policies.
  • Comprehensive Benchmarking: Includes a full suite for performance and cost analysis across different hardware configurations (CPU vs GPU).

πŸ›οΈ Architecture Overview

The system is designed as a modular, cloud-native application. At its core is the FastAPI service, which serves the sentiment analysis model. This service is containerized and deployed to a Kubernetes cluster, with a surrounding ecosystem for monitoring, security, and traffic management.

graph TD
    subgraph "Clients"
        A[Web/Mobile Apps]
        B[Batch Jobs]
        C[API Consumers]
    end

    subgraph "Infrastructure Layer (Kubernetes Cluster)"
        D[Ingress Controller] --> E{Sentiment Service};
        E --> F[Pod 1];
        E --> G[Pod 2];
        E --> H[...];

        subgraph "Sentiment Analysis Pod"
            direction LR
            I[FastAPI App] --> J["Sentiment Model - ONNX/PyTorch"];
        end

        F ----> I;

        subgraph "Observability Stack"
            K[Prometheus] -- Scrapes --> E;
            L[Grafana] -- Queries --> K;
            M[Alertmanager] -- Alerts from --> K;
            M --> N[Notifications];
        end

        subgraph "Security"
            O[HashiCorp Vault] <--> F;
        end
    end

    A --> D;
    B --> D;
    C --> D;

    style F fill:lightgreen
    style G fill:lightgreen
    style H fill:lightgreen

For a deeper dive into the technical design, components, and patterns used, please see the Architecture Document.

🏁 Getting Started

Prerequisites

Ensure you have the following tools installed on your local machine:

  • Python 3.11+
  • Docker & Docker Compose
  • kubectl (for Kubernetes interaction)
  • Helm 3+ (for Kubernetes package management)
  • make (optional, for using Makefile shortcuts)

1. Local Development (Docker Compose)

This is the quickest way to get the service running on your local machine.

  1. Clone the repository:

    git clone https://github.com/arec1b0/KubeSentiment.git
    cd KubeSentiment
  2. Install dependencies:

    # Core dependencies
    pip install -r requirements.txt
    
    # Development tools (linting, formatting, type checking)
    pip install -r requirements-dev.txt
    
    # Testing dependencies
    pip install -r requirements-test.txt
    
    # Cloud-specific dependencies (install as needed):
    # AWS: pip install -r requirements-aws.txt
    # GCP: pip install -r requirements-gcp.txt
    # Azure: pip install -r requirements-azure.txt
    
    # Or use the Makefile shortcut:
    make install-dev

    Tip: You can also use our setup script to automate virtual environment creation and dependency installation:

    python scripts/setup/setup_dev_environment.py
  3. Build the service: The project includes multiple Dockerfiles for different use cases (see Dockerfile Guide).

    # Standard development build
    make build-dev
  4. Start the service:

    docker-compose up --build

    This will build the Docker image and start the FastAPI service.

  5. Test the service: Open a new terminal and send a prediction request:

    Note on Ports: The application runs on port 8000 inside the container. When using Docker Compose, this is mapped to localhost:8000. In Kubernetes, you can use kubectl port-forward to map to any local port (e.g., 8080:80 or 8000:8000).

    curl -X POST "http://localhost:8000/predict" \
         -H "Content-Type: application/json" \
         -d '{"text": "This is an amazing project and the setup was so easy!"}'

    You should receive a response like:

    {
      "label": "POSITIVE",
      "score": 0.9998,
      "inference_time_ms": 45.2,
      "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
      "text_length": 54,
      "backend": "pytorch",
      "cached": false
    }
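
    The same request can also be issued from Python. Below is a minimal client sketch using the requests library, assuming the docker-compose setup above where the service listens on localhost:8000 and the /api/v1 prefix is omitted (debug mode):

    import requests

    # Minimal client sketch: assumes the local docker-compose setup on localhost:8000
    # (debug mode, so endpoints are served without the /api/v1 prefix).
    BASE_URL = "http://localhost:8000"

    def predict(text: str) -> dict:
        response = requests.post(f"{BASE_URL}/predict", json={"text": text}, timeout=10)
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        result = predict("This is an amazing project and the setup was so easy!")
        print(result["label"], result["score"])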

2. Full Kubernetes Deployment with Monitoring

To experience the full MLOps stack, including the monitoring dashboard, you can deploy the entire system to a Kubernetes cluster (e.g., Minikube, kind, or a cloud provider's cluster).

Our Quick Start Guide provides a one-line script to get the application and its full monitoring stack running in minutes.

# Follow the instructions in the Quick Start guide
./scripts/infra/setup-monitoring.sh

This will deploy the application along with Prometheus for metrics, Grafana for dashboards, and Alertmanager for alerts.

πŸ’» Usage

API Endpoints

The service exposes several key endpoints:

Method   Endpoint                           Description
POST     /api/v1/predict                    Analyzes the sentiment of the input text.
POST     /api/v1/batch/predict              Submits a batch of texts for asynchronous prediction.
GET      /api/v1/batch/status/{job_id}      Checks the status of a batch prediction job.
GET      /api/v1/batch/results/{job_id}     Retrieves the results of a completed batch job.
GET      /api/v1/health                     Health check endpoint for readiness/liveness.
GET      /api/v1/metrics                    Exposes Prometheus metrics.
GET      /api/v1/model-info                 Returns metadata about the loaded ML model.

Note on API Versioning: All endpoints use the /api/v1 prefix in production mode. When running in debug mode (MLOPS_DEBUG=true), the /api/v1 prefix is omitted for easier local development (e.g., /predict instead of /api/v1/predict).
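
A typical client of the asynchronous batch endpoints submits a job, polls its status, and then fetches the results. The sketch below illustrates that flow in Python; the request and response field names (texts, job_id, status) are assumptions for illustration, so consult the service's OpenAPI schema for the exact payloads.

import time
import requests

BASE_URL = "http://localhost:8000/api/v1"  # production-style prefix

def run_batch(texts: list[str]) -> dict:
    # Field names ("texts", "job_id", "status") are illustrative assumptions;
    # check the service's OpenAPI schema for the exact request/response shapes.
    submitted = requests.post(f"{BASE_URL}/batch/predict", json={"texts": texts}, timeout=10)
    submitted.raise_for_status()
    job_id = submitted.json()["job_id"]

    # Poll the status endpoint until the job finishes, then fetch the results.
    while True:
        status = requests.get(f"{BASE_URL}/batch/status/{job_id}", timeout=10).json()
        if status.get("status") in ("completed", "failed"):
            break
        time.sleep(1)

    results = requests.get(f"{BASE_URL}/batch/results/{job_id}", timeout=10)
    results.raise_for_status()
    return results.json()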

Configuration

The application uses a profile-based configuration system (see ADR-009) with comprehensive documentation. For complete configuration setup, see docs/configuration/.

Note: .env files are intentionally not baked into the container image. Provide any sensitive or environment-specific settings at runtime using the mechanisms supported by your orchestrator.

Quick Configuration

# Set profile (applies 50+ defaults automatically)
export MLOPS_PROFILE=development  # or local, staging, production

# Override specific settings as needed
export MLOPS_REDIS_HOST=my-redis-server
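
Conceptually, a profile is a named bundle of defaults that explicit MLOPS_* variables can override. The snippet below illustrates that precedence in plain Python; it is a simplified sketch of the pattern, not the repository's actual configuration code (see ADR-009 and docs/configuration/ for the real implementation).

import os

# Simplified illustration of profile-based precedence; the real settings module
# in this repository (per ADR-009) applies far more defaults per profile.
PROFILE_DEFAULTS = {
    "development": {"MLOPS_DEBUG": "true", "MLOPS_REDIS_HOST": "localhost"},
    "production": {"MLOPS_DEBUG": "false", "MLOPS_REDIS_HOST": "redis"},
}

def resolve_config() -> dict:
    profile = os.environ.get("MLOPS_PROFILE", "development")
    config = dict(PROFILE_DEFAULTS.get(profile, {}))
    # Any MLOPS_* variable set in the environment overrides the profile default.
    for key, value in os.environ.items():
        if key.startswith("MLOPS_"):
            config[key] = value
    return config

if __name__ == "__main__":
    print(resolve_config())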

Supplying configuration at runtime

  • Docker / Docker Compose: pass variables with --env-file or individual -e flags when running the container.
  • Kubernetes: mount configuration with envFrom or env entries sourced from a ConfigMap or Secret. Sensitive values (API keys, credentials) should be stored in a Secret, while non-sensitive defaults can live in a ConfigMap.

Example Kubernetes manifest snippet:

envFrom:
  - configMapRef:
      name: mlops-sentiment-config
  - secretRef:
      name: mlops-sentiment-secrets

This approach keeps secrets out of the image and allows you to tailor configuration per environment (dev/staging/prod) without rebuilding the container.

πŸ“Š Benchmarking

A key part of this project is its ability to measure performance and cost-effectiveness. The benchmarking/ directory contains a powerful suite for running load tests against the service on different hardware configurations.

This allows you to answer questions like:

  • "Is a GPU instance more cost-effective than a CPU instance for my workload?"
  • "What is the maximum RPS our current configuration can handle?"

The benchmarking scripts generate a comprehensive HTML report with performance comparisons, cost analysis, and resource utilization charts. See the Benchmarking README for more details.

Latest Kafka Consumer Benchmark Snapshot

Scenario                     Synthetic workload executed with the in-repo MockModel using the high-throughput Kafka consumer (no external brokers)
Batch size                   1,000
Messages processed           1,000
Wall time (s)                0.0095
Approx. throughput (msg/s)   ~105,700

How it was measured: a Python benchmarking helper script (see commit history) instantiates HighThroughputKafkaConsumer with the same MockModel used in the test suite and processes 1,000 synthetic messages in a single batch. This mirrors the high-throughput path without requiring a live Kafka cluster, making the measurement reproducible in constrained environments.

Tip: For end-to-end benchmarks against a running Kafka cluster, use benchmarking/kafka_performance_test.py. It exercises producer/consumer I/O, DLQ handling, and Prometheus instrumentation at scale.
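
To reproduce a similar single-process measurement yourself, a generic timing harness along the following lines is sufficient. The model class and predict method here are placeholders for illustration, not the actual MockModel or HighThroughputKafkaConsumer APIs from this repository.

import time

class PlaceholderModel:
    # Stand-in for illustration only; the repo's MockModel has its own interface.
    def predict(self, text: str) -> dict:
        return {"label": "POSITIVE", "score": 0.99}

def benchmark(model: PlaceholderModel, n_messages: int = 1_000) -> None:
    messages = [f"synthetic message {i}" for i in range(n_messages)]
    start = time.perf_counter()
    for message in messages:  # one in-process batch, no external broker involved
        model.predict(message)
    elapsed = time.perf_counter() - start
    print(f"{n_messages} messages in {elapsed:.4f}s (~{n_messages / elapsed:,.0f} msg/s)")

if __name__ == "__main__":
    benchmark(PlaceholderModel())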

🀝 Contributing

We welcome contributions of all kinds! Whether it's reporting a bug, improving documentation, or submitting a new feature, your help is appreciated.

Please read our Contributing Guide to get started with the development setup, code quality standards, and pull request process.

πŸ“œ License

This project is licensed under the MIT License. See the LICENSE file for details.