FFAPM - A Comprehensive APM System for Go

中文文档 (Chinese) | English

Overview

FFAPM is a comprehensive Application Performance Monitoring (APM) system built with Go, designed to provide full-stack observability for microservices. It integrates modern observability tools including OpenTelemetry, Prometheus, Grafana, Jaeger, and the ELK stack to deliver distributed tracing, metrics collection, log aggregation, and alerting capabilities.

The project demonstrates best practices for building observable microservices in Go, featuring automatic instrumentation for HTTP and gRPC services, database queries, and Redis operations.

Features

Core APM Capabilities

  • Distributed Tracing: Full distributed tracing support using OpenTelemetry and Jaeger

    • Automatic trace context propagation across HTTP and gRPC services
    • SQL query tracing with automatic instrumentation
    • Redis operation tracing with custom hooks
    • Cross-service trace correlation
  • Metrics Collection: Comprehensive metrics collection and visualization

    • Prometheus-native metrics for Go runtime (memory, goroutines, GC, etc.)
    • Custom business metrics (request count, latency histograms, error rates)
    • OpenTelemetry metrics export to OTLP collectors
    • Real-time metrics visualization in Grafana
  • Log Aggregation: Centralized logging with ELK stack

    • Structured logging with automatic trace correlation
    • Log rotation and retention management
    • Filebeat-based log shipping
    • Logstash processing and filtering
    • Kibana-based log search and analysis
  • Alerting System: Intelligent alerting and notification

    • Prometheus AlertManager integration
    • Grafana alert rules for metrics
    • Logstash alert webhooks for log-based alerts
    • Feishu (Lark) notification integration
    • Live probe monitoring for service health
  • Automatic Profiling: Performance profiling with Holmes

    • Automatic CPU profiling on high CPU usage
    • Memory profiling on memory leaks or spikes
    • Goroutine leak detection and profiling
    • Integration with gops for runtime inspection

Infrastructure Features

  • Database Instrumentation: Automatic MySQL query tracking

    • SQL query parsing and normalization
    • Query execution time tracking
    • Trace context injection into database operations
    • Connection pool metrics
  • Cache Instrumentation: Redis operation monitoring

    • Automatic Redis command tracing
    • Operation latency tracking
    • Rate limiting with Redis
  • HTTP Server: Production-ready HTTP server with built-in observability

    • Automatic request tracing
    • Request/response metrics
    • Panic recovery with trace context
    • Health check endpoints
    • Prometheus metrics endpoint
  • gRPC Server/Client: Full gRPC observability

    • Automatic client and server-side tracing
    • Metadata propagation for trace context
    • Peer service identification
    • Error tracking with stack traces

Architecture

The system follows a microservices architecture with the following components:

┌─────────────────────────────────────────────────────────────────┐
│                          Client Layer                             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                            │
│                      (ordersvc HTTP)                              │
└─────────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   usersvc        │  │    skusvc        │  │   ordersvc       │
│   (gRPC:8082)    │  │   (gRPC:8081)    │  │   (HTTP:8080)    │
│   User Service   │  │   SKU Service    │  │   Order Service  │
└──────────────────┘  └──────────────────┘  └──────────────────┘
          │                   │                   │
          └───────────────────┴───────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   MySQL          │  │    Redis         │  │   ffalarm        │
│   (3306)         │  │    (6379)        │  │   (HTTP:8083)    │
└──────────────────┘  └──────────────────┘  └──────────────────┘
                                                      │
┌─────────────────────────────────────────────────────────────────┐
│                    Observability Stack                            │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐   │
│  │  Jaeger     │  │ Prometheus  │  │  Elasticsearch       │   │
│  │  (16686)    │  │  (9090)     │  │  (9200)              │   │
│  └─────────────┘  └─────────────┘  └──────────────────────┘   │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────┐   │
│  │  Grafana    │  │ OTEL        │  │  Kibana              │   │
│  │  (3000)     │  │ Collector   │  │  (5601)              │   │
│  └─────────────┘  │ (4317/4318) │  └──────────────────────┘   │
│                   └─────────────┘  ┌──────────────────────┐   │
│                                     │  Logstash (5044)     │   │
│                                     └──────────────────────┘   │
│                                     ┌──────────────────────┐   │
│                                     │  Filebeat            │   │
│                                     └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

Data Flow

  1. Request Flow: Client → ordersvc (HTTP) → usersvc/skusvc (gRPC) → Database/Redis
  2. Trace Flow: All Services → OpenTelemetry Collector → Jaeger
  3. Metrics Flow: All Services → Prometheus ← Grafana
  4. Logs Flow: All Services → Filebeat → Logstash → Elasticsearch → Kibana
  5. Alerts Flow: Prometheus/Logstash → ffalarm → Feishu
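
The thread that ties these flows together is context propagation: every handler passes its incoming context to its downstream calls, so the collector sees one connected trace. A minimal, self-contained sketch using the upstream otelhttp instrumentation (the ffapm wrappers described later wire up the equivalent for HTTP and gRPC; the ports and downstream URL here are purely illustrative):

package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	// Outgoing requests made through this client carry the current trace
	// context as W3C traceparent headers.
	client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// r.Context() holds the server span started by the otelhttp wrapper;
		// reusing it downstream keeps everything in a single trace.
		req, err := http.NewRequestWithContext(r.Context(), http.MethodGet,
			"http://127.0.0.1:8083/health", nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		resp.Body.Close()
		w.Write([]byte("ok"))
	})

	// Every inbound request starts (or joins) a trace named "demo-handler".
	http.ListenAndServe(":9000", otelhttp.NewHandler(handler, "demo-handler"))
}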

Project Structure

ffapm/
├── ffapm/                    # Core APM library
│   ├── apm.go               # OpenTelemetry setup (tracer & meter providers)
│   ├── infra.go             # Infrastructure initialization (DB, Redis, APM)
│   ├── http.go              # HTTP server with auto-instrumentation
│   ├── grpcserver.go        # gRPC server with interceptors
│   ├── grpcclient.go        # gRPC client with interceptors
│   ├── sql_driver_wrapper.go # SQL driver wrapper for tracing
│   ├── redis_hook.go        # Redis hook for operation tracing
│   ├── metrics.go           # Custom metrics definitions
│   ├── log.go               # Structured logging with trace context
│   └── build_info.go        # Build and deployment information
│
├── ordersvc/                # Order microservice
│   ├── main.go              # Service entry point
│   ├── api/                 # HTTP API handlers
│   │   └── order.go         # Order creation endpoint
│   ├── grpcclient/          # gRPC client connections
│   │   └── client.go        # User and SKU service clients
│   └── metric/              # Custom business metrics
│       └── metric.go        # Order-specific metrics
│
├── usersvc/                 # User microservice
│   ├── main.go              # Service entry point
│   ├── dao/                 # Data access layer
│   │   └── user.go          # User database operations
│   └── grpc/                # gRPC service implementation
│       └── userserver.go    # User service gRPC handlers
│
├── skusvc/                  # SKU microservice
│   ├── main.go              # Service entry point
│   ├── dao/                 # Data access layer
│   │   └── sku.go           # SKU database operations
│   └── grpc/                # gRPC service implementation
│       └── skuserver.go     # SKU service gRPC handlers
│
├── ffalarm/                 # Alert service
│   ├── main.go              # Service entry point
│   ├── api/                 # Webhook handlers
│   │   └── alarm.go         # Metric and log alert endpoints
│   ├── model/               # Alert data models
│   │   ├── grafana.go       # Grafana alert model
│   │   └── logstash.go      # Logstash alert model
│   ├── notice/              # Notification handlers
│   │   └── notice.go        # Notification dispatch logic
│   ├── thirdparty/          # Third-party integrations
│   │   └── feishu/          # Feishu (Lark) integration
│   ├── liveprobe/           # Service health monitoring
│   │   └── live_probe.go    # HTTP probe checker
│   └── dao/                 # Deployment info data access
│       └── deploy_info.go   # Service deployment configuration
│
├── protos/                  # Protocol Buffer definitions
│   ├── hello.proto          # Hello service definition
│   ├── user.proto           # User service definition
│   ├── sku.proto            # SKU service definition
│   └── *.pb.go              # Generated Go code
│
├── conf/                    # Configuration files
│   ├── docker-compose.yml   # Docker Compose orchestration
│   ├── otel-collector.yaml  # OpenTelemetry Collector config
│   ├── prometheus.yaml      # Prometheus scrape config
│   ├── logstash.conf        # Logstash pipeline config
│   ├── filebeat.yml         # Filebeat input config
│   ├── init.sql             # Database initialization
│   ├── script/              # Service startup scripts
│   └── build/               # Compiled binaries
│
├── pprofdemo/               # Performance profiling examples
│   ├── cpu/                 # CPU profiling example
│   ├── memory/              # Memory profiling example
│   ├── goroutine/           # Goroutine profiling example
│   ├── block/               # Block profiling example
│   ├── mutex/               # Mutex profiling example
│   └── httppprof/           # HTTP pprof server example
│
├── tracedemo/               # Distributed tracing example
│   └── main.go              # Standalone tracing demo
│
├── holmesdemo/              # Holmes auto-profiling example
│   └── main.go              # Auto-profiling demo
│
├── http/                    # HTTP request examples
│   ├── order/               # Order API requests
│   └── alarm/               # Alarm API requests
│
├── Makefile                 # Build and deployment commands
├── go.work                  # Go workspace configuration
└── README.md                # This file

Prerequisites

System Requirements

  • Operating System: Linux, macOS, or Windows with WSL2
  • Go: Version 1.24 or higher
  • Docker: Version 20.x or higher
  • Docker Compose: Version 2.x or higher
  • Storage: At least 10GB free disk space for Docker volumes
  • Memory: Minimum 8GB RAM (16GB recommended)

Development Tools

  • Protocol Buffer Compiler (protoc) for gRPC code generation
  • Make utility for build commands
  • curl or HTTPie for API testing

Quick Start

1. Clone the Repository

git clone https://github.com/hedon-go-road/go-apm.git ffapm
cd ffapm

2. Initialize Database

The database will be automatically initialized when starting Docker Compose, but you can also manually initialize it:

# The init.sql will be executed automatically by docker-compose
# Or manually:
mysql -h 127.0.0.1 -P 3306 -u root -p123456 < conf/init.sql

3. Start All Services

Start the infrastructure and all microservices with Docker Compose:

make docker-up

This command will:

  1. Clean previous builds
  2. Build all services for Linux ARM64
  3. Start all Docker containers including:
    • MySQL database
    • Redis cache
    • Elasticsearch cluster (3 nodes)
    • Kibana
    • Logstash
    • Filebeat
    • Prometheus
    • Grafana
    • Jaeger
    • OpenTelemetry Collector
    • All microservices (usersvc, skusvc, ordersvc, ffalarm)

4. Verify Services

Check if all services are running:

docker compose -f conf/docker-compose.yml ps

All services should be in "Up" state.

5. Send Test Request

Test the order service:

curl "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"

Expected response:

{
  "code": 0,
  "msg": "success",
  "data": {
    "order_id": "..."
  }
}

6. Access Monitoring Dashboards

Open these URLs in your browser:

  • Jaeger UI: http://127.0.0.1:16686
  • Prometheus: http://127.0.0.1:9090
  • Grafana: http://127.0.0.1:3000
  • Kibana: http://127.0.0.1:5601

7. Stop Services

make docker-down

Services

Core Microservices

Order Service (ordersvc)

Purpose: Handles order creation and management, serves as the API gateway.

Endpoints:

  • POST /order/add?uid={uid}&sku_id={sku_id}&num={num} - Create a new order

Dependencies:

  • usersvc (gRPC) - User information validation
  • skusvc (gRPC) - SKU information and inventory check
  • MySQL - Order persistence
  • Redis - Caching (optional)

Ports:

  • HTTP: 8080
  • Metrics: 8080/metrics
  • Health: 8080/health

Features:

  • Full distributed tracing
  • Custom business metrics (order count, order value)
  • Automatic pprof profiling
  • Panic recovery with logging

User Service (usersvc)

Purpose: Manages user information and authentication.

Endpoints:

  • gRPC: GetUser(id) - Retrieve user information

Dependencies:

  • MySQL - User data storage
  • Redis - User session cache

Ports:

  • gRPC: 8082
  • HTTP: 30002 (metrics & health)

Features:

  • High-performance gRPC server
  • Redis caching with trace context
  • Database query optimization
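
The Redis caching noted above is a standard cache-aside read; propagating ctx is what lets a tracing hook attach each command to the live span. A minimal sketch (the go-redis/v9 import path, key format, and DB fallback are assumptions, not the actual usersvc code):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// getUserName checks Redis first and falls back to the database on a miss.
// Passing ctx through lets a Redis hook (such as ffapm's) tie the command to
// the current trace span.
func getUserName(ctx context.Context, rdb *redis.Client, id int64) (string, error) {
	key := fmt.Sprintf("user:%d:name", id)

	name, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return name, nil // cache hit
	}
	if err != redis.Nil {
		return "", err // a real Redis error, not just a miss
	}

	// Cache miss: loadUserFromDB stands in for the actual DAO query.
	name, err = loadUserFromDB(ctx, id)
	if err != nil {
		return "", err
	}
	_ = rdb.Set(ctx, key, name, time.Minute).Err()
	return name, nil
}

func loadUserFromDB(ctx context.Context, id int64) (string, error) {
	return fmt.Sprintf("user-%d", id), nil // placeholder for a SQL lookup
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})
	name, err := getUserName(context.Background(), rdb, 1)
	fmt.Println(name, err)
}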

SKU Service (skusvc)

Purpose: Manages product SKU information and inventory.

Endpoints:

  • gRPC: GetSku(id) - Retrieve SKU information
  • gRPC: UpdateSkuNum(id, num) - Update SKU inventory

Dependencies:

  • MySQL - SKU data storage

Ports:

  • gRPC: 8081
  • HTTP: 30001 (metrics & health)

Features:

  • Inventory management
  • Transaction support for inventory updates
  • Real-time inventory tracking
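
The inventory update mentioned above is naturally expressed as a single transaction whose WHERE clause guards against overselling. A minimal sketch with database/sql (table, column, and database names are assumptions, not the actual skusvc schema):

package main

import (
	"context"
	"database/sql"
	"errors"

	_ "github.com/go-sql-driver/mysql"
)

// decrementStock sketches a transactional inventory update: the "num >= ?"
// guard prevents overselling, and zero affected rows means there was not
// enough stock to fulfill the order.
func decrementStock(ctx context.Context, db *sql.DB, skuID, num int64) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	res, err := tx.ExecContext(ctx,
		"UPDATE sku SET num = num - ? WHERE id = ? AND num >= ?", num, skuID, num)
	if err != nil {
		return err
	}
	affected, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if affected == 0 {
		return errors.New("insufficient stock")
	}
	return tx.Commit()
}

func main() {
	// The DSN and database name are illustrative only.
	db, err := sql.Open("mysql", "root:123456@tcp(127.0.0.1:3306)/ffapm?parseTime=True")
	if err != nil {
		panic(err)
	}
	_ = decrementStock(context.Background(), db, 3, 1)
}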

Alert Service (ffalarm)

Purpose: Centralized alerting and notification service.

Endpoints:

  • POST /metric_webhook - Receive metric-based alerts from Grafana
  • POST /log_webhook - Receive log-based alerts from Logstash

Dependencies:

  • MySQL - Deployment information storage
  • Feishu API - Notification delivery

Ports:

  • HTTP: 8083

Features:

  • Grafana alert webhook handler
  • Logstash alert webhook handler
  • Feishu (Lark) rich notification
  • Service live probe monitoring
  • Alert deduplication and throttling

Monitoring Stack

Distributed Tracing (Jaeger)

Access: http://127.0.0.1:16686

Features:

  • View request traces across all services
  • Analyze service dependencies and call graphs
  • Identify slow operations and bottlenecks
  • Error tracking with stack traces

Usage:

  1. Open Jaeger UI
  2. Select service: "ordersvc", "usersvc", or "skusvc"
  3. Click "Find Traces"
  4. Inspect individual traces to see the complete request flow

Metrics & Dashboards (Prometheus + Grafana)

Prometheus: http://127.0.0.1:9090
Grafana: http://127.0.0.1:3000

Available Metrics:

  • System Metrics:

    • go_goroutines - Number of goroutines
    • go_memstats_alloc_bytes - Memory allocation
    • go_gc_duration_seconds - GC duration
  • HTTP Metrics:

    • ffapm_server_handle_total - Request count by endpoint
    • ffapm_server_handle_seconds - Request latency histogram
    • process_cpu_seconds_total - CPU usage
  • gRPC Metrics:

    • ffapm_server_handle_total{type="grpc"} - RPC call count
    • ffapm_server_handle_seconds{type="grpc"} - RPC latency
  • Business Metrics:

    • Custom metrics defined in each service's metric/ package
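
For illustration, a business counter of this kind could be declared with prometheus/client_golang roughly as follows (the metric and package names are hypothetical, not the actual ordersvc definitions):

package metric

import (
	"github.com/prometheus/client_golang/prometheus"
)

// orderCreatedTotal is a hypothetical business counter; the real metrics live
// in each service's metric/ package.
var orderCreatedTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "ordersvc_order_created_total",
		Help: "Number of orders created, labeled by result.",
	},
	[]string{"result"},
)

func init() {
	prometheus.MustRegister(orderCreatedTotal)
}

// IncOrderCreated records one order creation attempt.
func IncOrderCreated(result string) {
	orderCreatedTotal.WithLabelValues(result).Inc()
}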

Grafana Setup:

  1. Login with default credentials (admin/admin)
  2. Add Prometheus data source: http://ffapm-prometheus:9090
  3. Add Jaeger data source: http://ffamp-jaeger:16686
  4. Import dashboards or create custom ones

Recommended Dashboards:

  • Go Processes (ID: 6671)
  • Node Exporter Full (ID: 1860)
  • gRPC Dashboard (ID: 14827)

Log Aggregation (ELK Stack)

Kibana: http://127.0.0.1:5601
Elasticsearch: http://127.0.0.1:9200

Log Pipeline:

  1. Services write logs to /logs/*.log
  2. Filebeat collects and ships logs to Logstash
  3. Logstash parses and enriches logs
  4. Logs stored in Elasticsearch
  5. Kibana provides search and visualization

Log Format: All logs include trace context for correlation:

  • trace_id - Distributed trace ID
  • span_id - Current span ID
  • app - Service name
  • level - Log level (info, error, warn)
  • msg - Log message
  • Additional custom fields
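
A representative (hypothetical) log line in this format might look like the following; the timestamp field name and the order_id field are assumptions for illustration:

{"level":"info","ts":"2024-01-01T12:00:00Z","app":"ordersvc","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7","msg":"order created","order_id":"1234567890"}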

Kibana Usage:

  1. Open Kibana
  2. Go to "Discover"
  3. Create index pattern: logstash-*
  4. Search logs using KQL (Kibana Query Language)
  5. Filter by app, trace_id, or any custom field

Example Queries:

# Find all errors in ordersvc
app: "ordersvc" AND level: "error"

# Find all logs for a specific trace
trace_id: "abc123..."

# Find slow queries
duration > 1000

Alert Configuration

Prometheus Alerts: Configure in conf/prometheus.yaml and Grafana UI.

Example Alert Rules:

# High error rate
- alert: HighErrorRate
  expr: rate(ffapm_server_handle_total{status=~"5.."}[5m]) > 0.05
  for: 5m
  annotations:
    summary: "High error rate detected"
    
# High latency
- alert: HighLatency
  expr: histogram_quantile(0.95, rate(ffapm_server_handle_seconds_bucket[5m])) > 1
  for: 5m
  annotations:
    summary: "High latency detected"

Logstash Alerts: Configure in conf/logstash.conf using filters and HTTP output to ffalarm.

Development

Local Development Setup

For local development without Docker:

# 1. Start infrastructure with Docker
docker compose -f conf/docker-compose.yml up -d ffapm-db ffapm-redis ffapm-otel-collector ffamp-jaeger ffapm-prometheus

# 2. Initialize database
mysql -h 127.0.0.1 -P 3306 -u root -p123456 < conf/init.sql

# 3. Run services locally
make setup

# This will start all services in background
# Check status
make status

# View logs
tail -f logs/ordersvc.log
tail -f logs/usersvc.log
tail -f logs/skusvc.log
tail -f logs/ffalarm.log

# Stop services
make stop

Building Services

# Build for current OS
go build -o bin/ordersvc ordersvc/main.go
go build -o bin/usersvc usersvc/main.go
go build -o bin/skusvc skusvc/main.go
go build -o bin/ffalarm ffalarm/main.go

# Build for Linux ARM64 (for Docker)
make build-ubuntu

# Clean build artifacts
make clear-build

Generating Protocol Buffers

# Install protoc compiler
# macOS: brew install protobuf
# Linux: apt-get install protobuf-compiler

# Install Go plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Generate code
make genpb

Running Tests

# Run all tests
go test ./...

# Run tests for specific package
go test ./ffapm/...

# Run with coverage
go test -cover ./...

# Run with race detector
go test -race ./...

Load Testing

# Run Apache Bench load test
make ab

# Or manually with ab
ab -n 10000 -c 100 "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"

# Or use hey (a modern alternative to ab)
hey -n 10000 -c 100 "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"

Performance Profiling

Using pprof (built-in):

All services expose pprof endpoints:

# CPU profiling
go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30

# Memory profiling
go tool pprof http://localhost:8080/debug/pprof/heap

# Goroutine profiling
go tool pprof http://localhost:8080/debug/pprof/goroutine

# Interactive web UI
go tool pprof -http=:8081 http://localhost:8080/debug/pprof/profile?seconds=30
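
These endpoints are Go's standard net/http/pprof handlers; the ffapm services already expose them, but for reference a minimal standalone server looks like this:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Browse http://localhost:6060/debug/pprof/ to see the available profiles.
	log.Fatal(http.ListenAndServe(":6060", nil))
}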

Using gops (runtime inspection):

All services with auto-profiling enabled support gops:

# Install gops
go install github.com/google/gops@latest

# List all Go processes
gops

# Attach to process
gops stack <PID>
gops memstats <PID>
gops gc <PID>

Using Holmes (automatic profiling):

Holmes automatically generates profiles when:

  • CPU usage > 80% for 10 seconds
  • Memory usage > 80% of limit
  • Goroutine count > 10,000

Profiles are stored in the service working directory and logged via the APM system.

Configuration

Environment Variables

Services support configuration via environment variables:

# Service name (required)
export APP_NAME=ordersvc

# Database connection
export MYSQL_HOST=127.0.0.1
export MYSQL_PORT=3306
export MYSQL_USER=root
export MYSQL_PASSWORD=123456

# Redis connection
export REDIS_HOST=127.0.0.1
export REDIS_PORT=6379

# OpenTelemetry Collector
export OTEL_ENDPOINT=localhost:4317

# Log configuration
export LOG_PATH=/logs
export LOG_MAX_AGE=7
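
A sketch of how a service might consume these variables with defaults (the helper function and database name are illustrative; the actual configuration loading may differ):

package main

import (
	"fmt"
	"os"
)

// getenv returns the value of key, or def when the variable is unset.
func getenv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

func main() {
	appName := getenv("APP_NAME", "ordersvc")
	otelEndpoint := getenv("OTEL_ENDPOINT", "localhost:4317")
	dsn := fmt.Sprintf("%s:%s@tcp(%s:%s)/ffapm?charset=utf8mb4&parseTime=True&loc=Local",
		getenv("MYSQL_USER", "root"), getenv("MYSQL_PASSWORD", "123456"),
		getenv("MYSQL_HOST", "127.0.0.1"), getenv("MYSQL_PORT", "3306"))
	fmt.Println(appName, otelEndpoint, dsn)
}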

APM Initialization Options

The ffapm library supports various initialization options:

ffapm.Infra.Init(
    // MySQL connection
    ffapm.WithMysql("root:password@tcp(host:3306)/dbname?charset=utf8mb4&parseTime=True&loc=Local"),
    
    // Redis connection
    ffapm.WithRedis("host:6379"),
    
    // Enable APM (tracing, metrics, logging)
    ffapm.WithEnableApm(
        "otel-collector:4317",  // OTLP endpoint
        "/logs",                 // Log path prefix
        7,                       // Log retention days
        customMetricFunc,        // Optional: custom metric init
    ),
    
    // Enable automatic profiling
    ffapm.WithAutoPProf(&ffapm.AutoPProfOpt{
        EnableCPU:       true,
        EnableMem:       true,
        EnableGoroutine: true,
    }),
)

Service Configuration

HTTP Server:

server := ffapm.NewHTTPServer(":8080",
    ffapm.WithHttpMetrics(),    // Enable Prometheus metrics endpoint
    ffapm.WithHeartbeat(),      // Enable health check endpoint
)
server.HandleFunc("/api/endpoint", handlerFunc)

gRPC Server:

server := ffapm.NewGrpcServer(":8081")
protos.RegisterYourServiceServer(server.Server, &YourServiceImpl{})

gRPC Client:

conn := ffapm.NewGrpcClient("target-service:8081", "target-service-name")
client := protos.NewYourServiceClient(conn)

API Endpoints

Order Service

Create Order

POST /order/add?uid={uid}&sku_id={sku_id}&num={num}

# Example
curl "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=2"

# Response
{
  "code": 0,
  "msg": "success",
  "data": {
    "order_id": "1234567890"
  }
}

Alert Service

Metric Webhook (Grafana)

POST /metric_webhook
Content-Type: application/json

{
  "alerts": [...]
}

Log Webhook (Logstash)

POST /log_webhook
Content-Type: application/json

{
  "message": "...",
  "level": "error"
}

Health Checks

All services expose health check endpoints:

# Check service health
curl http://localhost:8080/health
# Response: ok

# Check metrics
curl http://localhost:8080/metrics

Demo Projects

The project includes several demonstration projects showcasing different observability features:

pprofdemo - Performance Profiling Examples

Located in /pprofdemo, this package demonstrates various profiling techniques:

  • cpu: CPU profiling with pprof.StartCPUProfile()
  • memory: Memory profiling with pprof.WriteHeapProfile()
  • goroutine: Goroutine profiling to detect leaks
  • block: Block profiling for lock contention
  • mutex: Mutex profiling for lock hold time
  • thread: Thread creation profiling
  • httppprof: HTTP server with /debug/pprof endpoints

Running Examples:

cd pprofdemo

# CPU profiling example
go run cpu/main.go

# Memory profiling example
go run memory/main.go

# HTTP pprof server
go run httppprof/main.go
# Then access: http://localhost:6060/debug/pprof

tracedemo - Distributed Tracing Example

Located in /tracedemo, demonstrates standalone OpenTelemetry tracing:

cd tracedemo

# Start infrastructure
docker compose -f compose/docker-compose.yml up -d

# Run demo
go run main.go

# View traces in Jaeger: http://localhost:16686

holmesdemo - Automatic Profiling Example

Located in /holmesdemo, demonstrates Holmes automatic profiling:

cd holmesdemo

# Run demo (will auto-profile on high resource usage)
go run main.go

Holmes will automatically generate profiles when thresholds are exceeded.

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Code Style

  • Follow standard Go conventions and use gofmt
  • Add comments for exported functions and types
  • Write tests for new features
  • Update documentation for API changes

Testing

Ensure all tests pass before submitting PR:

go test ./...
go vet ./...
golangci-lint run

License

This project is for educational and demonstration purposes. Please check the original repository for license information.

Support

For questions and support:

  • Open an issue on GitHub
  • Check existing documentation
  • Review demo projects for examples

Built with ❤️ using Go and OpenTelemetry
