🔍 LGTM Stack for Kubernetes

Introduction

The LGTM stack, by Grafana Labs, combines best-in-class open-source tools to provide comprehensive system visibility, consisting of:

  • Loki: log aggregation
  • Grafana: visualization and dashboards
  • Tempo: distributed tracing
  • Mimir: scalable long-term metrics storage

This stack provides a complete observability solution covering logs, metrics, and traces, with support for high availability and scalability. All data is available in a single place (Grafana), making it easier to analyze and correlate events, and because object storage is used as the backend, the solution is much more economical than alternatives that require dedicated databases or persistent disks.


Architecture

(LGTM architecture diagram)

The architecture of the LGTM stack in a Kubernetes environment follows a well-defined flow of data collection, processing, and visualization:

  1. Applications send telemetry data to an agent, in this case, the OpenTelemetry Collector.

  2. The OpenTelemetry Collector acts as a central hub, routing each type of data to its specific backend:

  • Loki: for log processing
  • Mimir: for metrics storage
  • Tempo: for trace analysis

  3. Data is stored in object storage, with dedicated buckets for each tool.

  4. Grafana is the interface where all data is queried, allowing for unified dashboards and alerts.

The architecture also includes three optional components:

  • Prometheus: collects custom metrics from apps and the cluster and remote-writes them to Mimir
  • Kube-state-metrics: exposes metrics about the state of Kubernetes objects (pods, deployments, nodes) via the API server, which Prometheus scrapes
  • Promtail: an agent that captures container logs and ships them to Loki

Hardware Requirements

Local:

  • 2-4 CPUs
  • 8 GB RAM

Production setup:

  • Requirements vary widely with the amount of data and traffic; it's recommended to start with a small setup and scale as needed. For small-to-mid environments, the following minimum is recommended:
    • 8 CPUs
    • 24 GB RAM
    • 100 GB disk space (SSD; not counting the object storage backends)

🚀 Getting Started

✨ Prerequisites

  • A running Kubernetes cluster (local, e.g. k3s or minikube, or a managed cluster such as GKE)
  • kubectl and Helm 3 installed and configured to access the cluster

Note: This guide uses the official lgtm-distributed Helm chart from Grafana Labs for deployment.

Installation

Option 1: Makefile

To simplify the installation process, you can use the Makefile commands:

# Clone repository
git clone git@github.com:daviaraujocc/lgtm-stack.git
cd lgtm-stack
make install-local # For local testing; for GCP cloud storage, use make install-gcp and set PROJECT_ID

This installs the LGTM stack with the default local configuration along with its dependencies (Promtail, dashboards, Prometheus, MinIO). If you want to customize the installation, you can edit the helm/values-lgtm.local.yaml file.
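For example, to change the Grafana admin credentials you could override the grafana subchart values. The keys below come from the upstream Grafana Helm chart and are only a hedged sketch, not the actual contents of helm/values-lgtm.local.yaml:

# Hedged sketch: overriding Grafana admin credentials in helm/values-lgtm.local.yaml
grafana:
  adminUser: admin
  adminPassword: change-me   # placeholder; prefer an existing secret in real setups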

Option 2: Manual Installation

Setup

# Clone repository
git clone git@github.com:daviaraujocc/lgtm-stack.git
cd lgtm-stack

# Add repositories & create namespace
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create ns monitoring

# Install prometheus operator for metrics collection and CRDs
helm install prometheus-operator --version 66.3.1 -n monitoring \
  prometheus-community/kube-prometheus-stack -f helm/values-prometheus.yaml

Choose Your Environment

Local (k3s, minikube)

For local testing scenarios. Uses local storage via MinIO.

helm install lgtm --version 2.1.0 -n monitoring \
  grafana/lgtm-distributed -f helm/values-lgtm.local.yaml

GCP Production Setup

For production environments, using GCP Cloud Storage buckets as the storage backend.

  1. Set up GCP resources:
# Set your project ID
export PROJECT_ID=your-project-id

# Create buckets with random suffix
export BUCKET_SUFFIX=$(openssl rand -hex 4 | tr -d "\n")
for bucket in logs traces metrics metrics-admin; do
  gsutil mb -p ${PROJECT_ID} -c standard -l us-east1 gs://lgtm-${bucket}-${BUCKET_SUFFIX}
done

# Update bucket names in config
sed -i -E "s/(bucket_name:\s*lgtm-[^[:space:]]+)/\1-${BUCKET_SUFFIX}/g" helm/values-lgtm.gcp.yaml

# Create and configure service account
gcloud iam service-accounts create lgtm-monitoring \
    --display-name "LGTM Monitoring" \
    --project ${PROJECT_ID}

# Set permissions
for bucket in logs traces metrics metrics-admin; do 
  gsutil iam ch serviceAccount:lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com:admin \
    gs://lgtm-${bucket}-${BUCKET_SUFFIX}
done

# Create service account key and secret
gcloud iam service-accounts keys create key.json \
    --iam-account lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com
kubectl create secret generic lgtm-sa --from-file=key.json -n monitoring
  2. Install LGTM stack:

You can adjust helm/values-lgtm.gcp.yaml to fit your environment, for example to expose Grafana through an ingress (see the sketch below).
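To expose Grafana through an ingress, something like the following could be added to the values file. The grafana subchart keys are taken from the upstream Grafana Helm chart; the hostname and ingress class are placeholders:

# Hedged example: enabling an ingress for Grafana in helm/values-lgtm.gcp.yaml
grafana:
  ingress:
    enabled: true
    ingressClassName: nginx       # placeholder; use your cluster's ingress class
    hosts:
      - grafana.example.com       # placeholder hostname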

helm install lgtm --version 2.1.0 -n monitoring \
  grafana/lgtm-distributed -f helm/values-lgtm.gcp.yaml

Install dependencies

# Install Promtail for collecting container logs
# Check if you are using Docker or CRI-O runtime
## Docker runtime
kubectl apply -f manifests/promtail.docker.yaml
## CRI-O runtime 
## kubectl apply -f manifests/promtail.cri.yaml

Testing

After installation you can check components by running:

# Check if all pods are running
kubectl get pods -n monitoring

# To check logs

# Loki
kubectl logs -l app.kubernetes.io/name=loki -n monitoring

# Tempo
kubectl logs -l app.kubernetes.io/name=tempo -n monitoring

# Mimir
kubectl logs -l app.kubernetes.io/name=mimir -n monitoring

Follow the steps below to test each component:

Access Grafana

# Access dashboard
kubectl port-forward svc/lgtm-grafana 3000:80 -n monitoring

# Get password credentials
kubectl get secret --namespace monitoring lgtm-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
  • Default username: admin
  • Access URL: http://localhost:3000
  • Check default Grafana dashboards and Explore tab

Sending Data

After installation, verify each component is working correctly:

Loki (Logs)

Test log ingestion and querying:

# Forward Loki port
kubectl port-forward svc/lgtm-loki-distributor 3100:3100 -n monitoring

# Send test log with timestamp and labels
curl -XPOST http://localhost:3100/loki/api/v1/push -H "Content-Type: application/json" -d '{
  "streams": [{
    "stream": { "app": "test", "level": "info" },
    "values": [[ "'$(date +%s)000000000'", "Test log message" ]]
  }]
}'
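Loki's push endpoint returns HTTP 204 on success, so you can confirm ingestion from the CLI before opening Grafana; this repeats the push above and only prints the status code:

# Confirm the push is accepted (expects 204)
curl -s -o /dev/null -w "%{http_code}\n" -XPOST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" -d '{
  "streams": [{
    "stream": { "app": "test", "level": "info" },
    "values": [[ "'$(date +%s)000000000'", "Another test log message" ]]
  }]
}'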

To verify:

  1. Open Grafana (http://localhost:3000)
  2. Go to Explore > Select Loki datasource
  3. Query using labels: {app="test", level="info"}
  4. You should see your test message in the results

If you have installed Promtail, you can also check container logs in the Explore tab.

Tempo (Traces)

Since Tempo is compatible with the OpenTelemetry OTLP protocol, we will use the Jaeger Trace Generator, a tool that generates example traces and sends the data using OTLP.

# Forward Tempo port
kubectl port-forward svc/lgtm-tempo-distributor 4318:4318 -n monitoring

# Generate sample traces with service name 'test'
docker run --add-host=host.docker.internal:host-gateway --env=OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318 jaegertracing/jaeger-tracegen -service test -traces 10

To verify:

  1. Go to Explore > Select Tempo datasource
  2. Search by Service Name: 'test'
  3. You should see 10 traces with different spans

Mimir (Metrics)

Since we have a Prometheus instance running inside the cluster sending basic metrics (CPU/Memory) to Mimir, you can already check the metrics in Grafana:

  1. Access Grafana
  2. Go to Explore > Select Mimir datasource
  3. Try these example queries:
    • rate(container_cpu_usage_seconds_total[5m]) - CPU usage
    • container_memory_usage_bytes - Container memory usage

You can also push custom metrics to Mimir via Prometheus remote write, using the endpoint http://lgtm-mimir-nginx.monitoring:80/api/v1/push.
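For example, any Prometheus-compatible agent can remote-write to that endpoint with a configuration stanza like the following (a minimal sketch; adjust it if you enable authentication or multi-tenancy):

# Minimal Prometheus remote_write sketch targeting Mimir's push endpoint
remote_write:
  - url: http://lgtm-mimir-nginx.monitoring:80/api/v1/push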

OpenTelemetry

OpenTelemetry is a set of APIs, libraries, agents, and instrumentation to provide observability for cloud-native software. It consists of three main components:

  • OpenTelemetry SDK: Libraries for instrumenting applications to collect telemetry data (traces, metrics, logs).
  • OpenTelemetry Collector: A vendor-agnostic agent that collects, processes, and exports telemetry data to backends.
  • OpenTelemetry Protocol (OTLP): A standard for telemetry data exchange between applications and backends.

In this setup, we will use the OpenTelemetry Collector to route telemetry data to the appropriate backends (Loki, Tempo, Mimir).
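The collector's actual configuration ships in manifests/otel-collector.yaml; the sketch below only illustrates the kind of pipeline it defines, with backend service names and ports assumed from the rest of this guide rather than taken from the manifest:

# Rough sketch of an OTLP -> Loki/Tempo/Mimir pipeline (service names/ports assumed)
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  loki:                              # logs -> Loki
    endpoint: http://lgtm-loki-distributor.monitoring:3100/loki/api/v1/push
  otlp/tempo:                        # traces -> Tempo over OTLP gRPC
    endpoint: lgtm-tempo-distributor.monitoring:4317
    tls:
      insecure: true
  prometheusremotewrite:             # metrics -> Mimir
    endpoint: http://lgtm-mimir-nginx.monitoring:80/api/v1/push
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]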

OpenTelemetry Collector

To install the OpenTelemetry Collector:

# Install OpenTelemetry Collector
kubectl apply -f manifests/otel-collector.yaml

Check if the collector is up and running:

kubectl get pods -l app=otel-collector
kubectl logs -l app=otel-collector
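You can also do a quick smoke test of the OTLP HTTP receiver before connecting an application. The service name and namespace are assumptions based on the default endpoint used later in this guide (http://otel-collector:4318); adjust them to match the manifest:

# Optional smoke test of the collector's OTLP HTTP receiver
kubectl port-forward svc/otel-collector 4318:4318
curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" -d '{"resourceSpans":[]}'
# An HTTP 200 response means the receiver is reachable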

Flask App Integration

We'll use a pre-instrumented Flask application (source code at flask-app/) that generates traces, metrics, and logs using OpenTelemetry.

The application exposes a /random endpoint that returns random numbers and generates telemetry data. By default, telemetry is exported to http://otel-collector:4318.
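If your collector runs under a different name or namespace, the export endpoint can be overridden with the standard OTLP environment variable. The snippet below is only a hedged sketch of such an override (container name and structure assumed), not the repository's manifest:

# Sketch: overriding the OTLP endpoint via the standard SDK env var
spec:
  containers:
    - name: flask-app               # assumed container name
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://otel-collector.monitoring:4318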

  1. Deploy the sample application:
# Deploy sample app
kubectl apply -f manifests/app/flask-app.yaml
  2. Verify application deployment:
kubectl get pods -l app=flask-app
kubectl get svc flask-app-service
  3. Apply the PodMonitor for metrics scraping (a rough sketch of the resource is shown below):
kubectl apply -f manifests/app/podmonitor.yaml
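The actual definition lives in manifests/app/podmonitor.yaml; the sketch below only shows what such a PodMonitor roughly looks like, with the label selector and port name assumed:

# Hedged sketch of a PodMonitor for the sample app
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flask-app
spec:
  selector:
    matchLabels:
      app: flask-app          # assumed pod label (matches the -l selector above)
  podMetricsEndpoints:
    - port: http              # assumed container port name
      path: /metrics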

Testing the integration

  1. Generate traffic to the application:
# Port-forward the application
kubectl port-forward svc/flask-app-service 8000:8000 -n monitoring

# Send requests to generate telemetry data
for i in {1..50}; do
  curl http://localhost:8000/random
  sleep 0.5
done
  2. Check the generated telemetry data in Grafana:

Traces (Tempo):

  1. Go to Explore > Select Tempo datasource

  2. Search for Service Name: flask-app

  3. You should see traces with GET /random operations

Metrics (Mimir):

  1. Go to Explore > Select Mimir datasource

  2. Try these queries:

# Total requests count
rate(request_count_total[5m])

Logs (Loki):

  1. Go to Explore > Select Loki datasource

  2. Query using labels:

{job="flask-app"}

You should see structured logs from the application.

Extra Configuration

Loki Labels Customization

If you want to add new labels to the logs sent to Loki through the OpenTelemetry Collector, apply the following configuration:

  1. Edit the ConfigMap otel-collector-config
  2. Locate the processors.attributes/loki section
  3. Add your custom labels to the loki.attribute.labels list:
processors:
  attributes/loki:
    actions:
      - action: insert
        key: loki.format
        value: raw
      - action: insert
        key: loki.attribute.labels
        value: facility, level, source, host, app, namespace, pod, container, job, your_label

After modifying the ConfigMap, restart the collector pod to apply the changes:

kubectl rollout restart daemonset/otel-collector -n monitoring

Uninstall

# Using Makefile
make uninstall

# or manual

# Remove LGTM stack
helm uninstall lgtm -n monitoring

# Remove prometheus operator 
helm uninstall prometheus-operator -n monitoring

# Remove namespace
kubectl delete ns monitoring

# Remove promtail & otel-collector 
kubectl delete -f manifests/promtail.docker.yaml # or manifests/promtail.cri.yaml, matching the one you applied
kubectl delete -f manifests/otel-collector.yaml

# For GCP setup, cleanup:
for bucket in logs traces metrics metrics-admin; do
  gsutil rm -r gs://lgtm-${bucket}-${BUCKET_SUFFIX}
done

gcloud iam service-accounts delete lgtm-monitoring@${PROJECT_ID}.iam.gserviceaccount.com
