Chinese Documentation | English
- Overview
- Features
- Architecture
- Project Structure
- Prerequisites
- Quick Start
- Services
- Monitoring Stack
- Development
- Configuration
- API Endpoints
- Demo Projects
- Contributing
- References
FFAPM is a comprehensive Application Performance Monitoring (APM) system built with Go, designed to provide full-stack observability for microservices. It integrates modern observability tools including OpenTelemetry, Prometheus, Grafana, Jaeger, and the ELK stack to deliver distributed tracing, metrics collection, log aggregation, and alerting capabilities.
The project demonstrates best practices for building observable microservices in Go, featuring automatic instrumentation for HTTP and gRPC services, database queries, and Redis operations.
- Distributed Tracing: Full distributed tracing support using OpenTelemetry and Jaeger
  - Automatic trace context propagation across HTTP and gRPC services
  - SQL query tracing with automatic instrumentation
  - Redis operation tracing with custom hooks
  - Cross-service trace correlation
- Metrics Collection: Comprehensive metrics collection and visualization
  - Prometheus-native metrics for the Go runtime (memory, goroutines, GC, etc.)
  - Custom business metrics (request count, latency histograms, error rates)
  - OpenTelemetry metrics export to OTLP collectors
  - Real-time metrics visualization in Grafana
- Log Aggregation: Centralized logging with the ELK stack
  - Structured logging with automatic trace correlation
  - Log rotation and retention management
  - Filebeat-based log shipping
  - Logstash processing and filtering
  - Kibana-based log search and analysis
- Alerting System: Intelligent alerting and notification
  - Prometheus AlertManager integration
  - Grafana alert rules for metrics
  - Logstash alert webhooks for log-based alerts
  - Feishu (Lark) notification integration
  - Live probe monitoring for service health
- Automatic Profiling: Performance profiling with Holmes
  - Automatic CPU profiling on high CPU usage
  - Memory profiling on memory leaks or spikes
  - Goroutine leak detection and profiling
  - Integration with gops for runtime inspection
- Database Instrumentation: Automatic MySQL query tracking
  - SQL query parsing and normalization
  - Query execution time tracking
  - Trace context injection into database operations
  - Connection pool metrics
- Cache Instrumentation: Redis operation monitoring
  - Automatic Redis command tracing
  - Operation latency tracking
  - Rate limiting with Redis
- HTTP Server: Production-ready HTTP server with built-in observability
  - Automatic request tracing
  - Request/response metrics
  - Panic recovery with trace context
  - Health check endpoints
  - Prometheus metrics endpoint
- gRPC Server/Client: Full gRPC observability
  - Automatic client and server-side tracing
  - Metadata propagation for trace context
  - Peer service identification
  - Error tracking with stack traces
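To give a flavor of what this kind of automatic HTTP instrumentation looks like in Go, here is a minimal sketch using the upstream otelhttp contrib package (an illustration of the technique, not ffapm's internal wrapper):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// By the time this runs, the wrapper has already started a server
		// span and extracted any incoming trace context from the headers.
		fmt.Fprintln(w, "ok")
	})

	// Wrapping the handler is all it takes to trace every request.
	http.Handle("/hello", otelhttp.NewHandler(hello, "hello"))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

ffapm's NewHTTPServer (shown in the Configuration section below) bundles this style of middleware together with metrics and panic recovery.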
The system follows a microservices architecture with the following components:
┌─────────────────────────────────────────────────────────────────┐
│ Client Layer │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ API Gateway Layer │
│ (ordersvc HTTP) │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ usersvc │ │ skusvc │ │ ordersvc │
│ (gRPC:8082) │ │ (gRPC:8081) │ │ (HTTP:8080) │
│ User Service │ │ SKU Service │ │ Order Service │
└──────────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└───────────────────┴───────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ MySQL │ │ Redis │ │ ffalarm │
│ (3306) │ │ (6379) │ │ (HTTP:8083) │
└──────────────────┘ └──────────────────┘ └──────────────────┘
│
┌─────────────────────────────────────────────────────────────────┐
│ Observability Stack │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────────┐ │
│ │ Jaeger │ │ Prometheus │ │ Elasticsearch │ │
│ │ (16686) │ │ (9090) │ │ (9200) │ │
│ └─────────────┘ └─────────────┘ └──────────────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────────────┐ │
│ │ Grafana │ │ OTEL │ │ Kibana │ │
│ │ (3000) │ │ Collector │ │ (5601) │ │
│ └─────────────┘ │ (4317/4318) │ └──────────────────────┘ │
│ └─────────────┘ ┌──────────────────────┐ │
│ │ Logstash (5044) │ │
│ └──────────────────────┘ │
│ ┌──────────────────────┐ │
│ │ Filebeat │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
- Request Flow: Client → ordersvc (HTTP) → usersvc/skusvc (gRPC) → Database/Redis
- Trace Flow: All Services → OpenTelemetry Collector → Jaeger
- Metrics Flow: All Services → Prometheus ← Grafana
- Logs Flow: All Services → Filebeat → Logstash → Elasticsearch → Kibana
- Alerts Flow: Prometheus/Logstash → ffalarm → Feishu
ffapm/
├── ffapm/ # Core APM library
│ ├── apm.go # OpenTelemetry setup (tracer & meter providers)
│ ├── infra.go # Infrastructure initialization (DB, Redis, APM)
│ ├── http.go # HTTP server with auto-instrumentation
│ ├── grpcserver.go # gRPC server with interceptors
│ ├── grpcclient.go # gRPC client with interceptors
│ ├── sql_driver_wrapper.go # SQL driver wrapper for tracing
│ ├── redis_hook.go # Redis hook for operation tracing
│ ├── metrics.go # Custom metrics definitions
│ ├── log.go # Structured logging with trace context
│ └── build_info.go # Build and deployment information
│
├── ordersvc/ # Order microservice
│ ├── main.go # Service entry point
│ ├── api/ # HTTP API handlers
│ │ └── order.go # Order creation endpoint
│ ├── grpcclient/ # gRPC client connections
│ │ └── client.go # User and SKU service clients
│ └── metric/ # Custom business metrics
│ └── metric.go # Order-specific metrics
│
├── usersvc/ # User microservice
│ ├── main.go # Service entry point
│ ├── dao/ # Data access layer
│ │ └── user.go # User database operations
│ └── grpc/ # gRPC service implementation
│ └── userserver.go # User service gRPC handlers
│
├── skusvc/ # SKU microservice
│ ├── main.go # Service entry point
│ ├── dao/ # Data access layer
│ │ └── sku.go # SKU database operations
│ └── grpc/ # gRPC service implementation
│ └── skuserver.go # SKU service gRPC handlers
│
├── ffalarm/ # Alert service
│ ├── main.go # Service entry point
│ ├── api/ # Webhook handlers
│ │ └── alarm.go # Metric and log alert endpoints
│ ├── model/ # Alert data models
│ │ ├── grafana.go # Grafana alert model
│ │ └── logstash.go # Logstash alert model
│ ├── notice/ # Notification handlers
│ │ └── notice.go # Notification dispatch logic
│ ├── thirdparty/ # Third-party integrations
│ │ └── feishu/ # Feishu (Lark) integration
│ ├── liveprobe/ # Service health monitoring
│ │ └── live_probe.go # HTTP probe checker
│ └── dao/ # Deployment info data access
│ └── deploy_info.go # Service deployment configuration
│
├── protos/ # Protocol Buffer definitions
│ ├── hello.proto # Hello service definition
│ ├── user.proto # User service definition
│ ├── sku.proto # SKU service definition
│ └── *.pb.go # Generated Go code
│
├── conf/ # Configuration files
│ ├── docker-compose.yml # Docker Compose orchestration
│ ├── otel-collector.yaml # OpenTelemetry Collector config
│ ├── prometheus.yaml # Prometheus scrape config
│ ├── logstash.conf # Logstash pipeline config
│ ├── filebeat.yml # Filebeat input config
│ ├── init.sql # Database initialization
│ ├── script/ # Service startup scripts
│ └── build/ # Compiled binaries
│
├── pprofdemo/ # Performance profiling examples
│ ├── cpu/ # CPU profiling example
│ ├── memory/ # Memory profiling example
│ ├── goroutine/ # Goroutine profiling example
│ ├── block/ # Block profiling example
│ ├── mutex/ # Mutex profiling example
│ └── httppprof/ # HTTP pprof server example
│
├── tracedemo/ # Distributed tracing example
│ └── main.go # Standalone tracing demo
│
├── holmesdemo/ # Holmes auto-profiling example
│ └── main.go # Auto-profiling demo
│
├── http/ # HTTP request examples
│ ├── order/ # Order API requests
│ └── alarm/ # Alarm API requests
│
├── Makefile # Build and deployment commands
├── go.work # Go workspace configuration
└── README.md # This file
- Operating System: Linux, macOS, or Windows with WSL2
- Go: Version 1.24 or higher
- Docker: Version 20.x or higher
- Docker Compose: Version 2.x or higher
- Storage: At least 10GB free disk space for Docker volumes
- Memory: Minimum 8GB RAM (16GB recommended)
- Protocol Buffer Compiler (protoc) for gRPC code generation
- Make utility for build commands
- curl or HTTPie for API testing
git clone https://github.com/hedon-go-road/go-apm.git ffapm
cd ffapm

The database will be automatically initialized when starting Docker Compose, but you can also manually initialize it:
# The init.sql will be executed automatically by docker-compose
# Or manually:
mysql -h 127.0.0.1 -P 3306 -u root -p123456 < conf/init.sql

Start the infrastructure and all microservices with Docker Compose:
make docker-up

This command will:
- Clean previous builds
- Build all services for Linux ARM64
- Start all Docker containers, including:
  - MySQL database
  - Redis cache
  - Elasticsearch cluster (3 nodes)
  - Kibana
  - Logstash
  - Filebeat
  - Prometheus
  - Grafana
  - Jaeger
  - OpenTelemetry Collector
  - All microservices (usersvc, skusvc, ordersvc, ffalarm)
Check if all services are running:
docker compose -f conf/docker-compose.yml ps

All services should be in "Up" state.
Test the order service:
curl "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"Expected response:
{
"code": 0,
"msg": "success",
"data": {
"order_id": "..."
}
}

Open these URLs in your browser:
- Jaeger UI (Distributed Tracing): http://127.0.0.1:16686
- Grafana (Metrics & Dashboards): http://127.0.0.1:3000
- Prometheus (Metrics Storage): http://127.0.0.1:9090
- Kibana (Log Analysis): http://127.0.0.1:5601
make docker-down

ordersvc

Purpose: Handles order creation and management, and serves as the API gateway.
Endpoints:
- POST /order/add?uid={uid}&sku_id={sku_id}&num={num} - Create a new order
Dependencies:
- usersvc (gRPC) - User information validation
- skusvc (gRPC) - SKU information and inventory check
- MySQL - Order persistence
- Redis - Caching (optional)
Ports:
- HTTP: 8080
- Metrics: 8080/metrics
- Health: 8080/health
Features:
- Full distributed tracing
- Custom business metrics (order count, order value)
- Automatic pprof profiling
- Panic recovery with logging
usersvc

Purpose: Manages user information and authentication.
Endpoints:
- gRPC: GetUser(id) - Retrieve user information
Dependencies:
- MySQL - User data storage
- Redis - User session cache
Ports:
- gRPC: 8082
- HTTP: 30002 (metrics & health)
Features:
- High-performance gRPC server
- Redis caching with trace context
- Database query optimization
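For ad-hoc testing, the gRPC endpoint can be exercised with a tool such as grpcurl. The fully qualified method name below is a guess (check protos/user.proto for the real package and service names), and the server must expose gRPC reflection for this exact invocation to work:

```sh
# Hypothetical invocation -- adjust user.UserService/GetUser to match user.proto
grpcurl -plaintext -d '{"id": 1}' 127.0.0.1:8082 user.UserService/GetUser
```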
skusvc

Purpose: Manages product SKU information and inventory.
Endpoints:
- gRPC: GetSku(id) - Retrieve SKU information
- gRPC: UpdateSkuNum(id, num) - Update SKU inventory
Dependencies:
- MySQL - SKU data storage
Ports:
- gRPC: 8081
- HTTP: 30001 (metrics & health)
Features:
- Inventory management
- Transaction support for inventory updates
- Real-time inventory tracking
ffalarm

Purpose: Centralized alerting and notification service.
Endpoints:
- POST /metric_webhook - Receive metric-based alerts from Grafana
- POST /log_webhook - Receive log-based alerts from Logstash
Dependencies:
- MySQL - Deployment information storage
- Feishu API - Notification delivery
Ports:
- HTTP: 8083
Features:
- Grafana alert webhook handler
- Logstash alert webhook handler
- Feishu (Lark) rich notification
- Service live probe monitoring
- Alert deduplication and throttling
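Deduplication and throttling can be as simple as enforcing a cooldown per alert fingerprint. A minimal Go sketch of the idea (illustrative only, not ffalarm's actual implementation):

```go
package notice // hypothetical package, mirroring ffalarm/notice

import (
	"sync"
	"time"
)

// Throttler suppresses repeat alerts that share a fingerprint
// within a cooldown window.
type Throttler struct {
	mu       sync.Mutex
	lastSent map[string]time.Time
	cooldown time.Duration
}

func NewThrottler(cooldown time.Duration) *Throttler {
	return &Throttler{lastSent: make(map[string]time.Time), cooldown: cooldown}
}

// Allow reports whether an alert with this fingerprint may be sent now,
// recording the send time when it is allowed.
func (t *Throttler) Allow(fingerprint string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if last, ok := t.lastSent[fingerprint]; ok && time.Since(last) < t.cooldown {
		return false // duplicate inside the cooldown window: drop it
	}
	t.lastSent[fingerprint] = time.Now()
	return true
}
```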
Jaeger

Access: http://127.0.0.1:16686
Features:
- View request traces across all services
- Analyze service dependencies and call graphs
- Identify slow operations and bottlenecks
- Error tracking with stack traces
Usage:
- Open Jaeger UI
- Select service: "ordersvc", "usersvc", or "skusvc"
- Click "Find Traces"
- Inspect individual traces to see the complete request flow
Prometheus: http://127.0.0.1:9090
Grafana: http://127.0.0.1:3000
Available Metrics:
- System Metrics:
  - go_goroutines - Number of goroutines
  - go_memstats_alloc_bytes - Memory allocation
  - go_gc_duration_seconds - GC duration
- HTTP Metrics:
  - ffapm_server_handle_total - Request count by endpoint
  - ffapm_server_handle_seconds - Request latency histogram
  - process_cpu_seconds_total - CPU usage
- gRPC Metrics:
  - ffapm_server_handle_total{type="grpc"} - RPC call count
  - ffapm_server_handle_seconds{type="grpc"} - RPC latency
- Business Metrics:
  - Custom metrics defined in each service's metric/ package
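These metrics can be explored directly in Prometheus. A few illustrative PromQL queries (the label names follow the conventions used in the alert rules later in this README):

```promql
# Per-endpoint request rate over the last 5 minutes
rate(ffapm_server_handle_total[5m])

# 95th-percentile request latency
histogram_quantile(0.95, rate(ffapm_server_handle_seconds_bucket[5m]))

# gRPC-only request rate
rate(ffapm_server_handle_total{type="grpc"}[5m])
```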
Grafana Setup:
- Login with default credentials (admin/admin)
- Add Prometheus data source: http://ffapm-prometheus:9090
- Add Jaeger data source: http://ffamp-jaeger:16686
- Import dashboards or create custom ones
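If you prefer provisioning over clicking through the UI, Grafana can load data sources from a YAML file at startup. A hypothetical provisioning file (the path and file name are assumptions; the URLs match the docker-compose service names above):

```yaml
# e.g. mounted at /etc/grafana/provisioning/datasources/ffapm.yaml (hypothetical)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://ffapm-prometheus:9090
  - name: Jaeger
    type: jaeger
    access: proxy
    url: http://ffamp-jaeger:16686
```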
Recommended Dashboards:
- Go Processes (ID: 6671)
- Node Exporter Full (ID: 1860)
- gRPC Dashboard (ID: 14827)
Kibana: http://127.0.0.1:5601
Elasticsearch: http://127.0.0.1:9200
Log Pipeline:
- Services write logs to /logs/*.log
- Filebeat collects and ships logs to Logstash
- Logstash parses and enriches logs
- Logs stored in Elasticsearch
- Kibana provides search and visualization
Log Format: All logs include trace context for correlation:
- trace_id - Distributed trace ID
- span_id - Current span ID
- app - Service name
- level - Log level (info, error, warn)
- msg - Log message
- Additional custom fields
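A hypothetical log entry in this format (field values are illustrative; the exact timestamp field depends on the logger configuration):

```json
{
  "time": "2024-01-01T12:00:00Z",
  "level": "info",
  "app": "ordersvc",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "msg": "order created",
  "order_id": "1234567890"
}
```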
Kibana Usage:
- Open Kibana
- Go to "Discover"
- Create index pattern: logstash-*
- Search logs using KQL (Kibana Query Language)
- Filter by app, trace_id, or any custom field
Example Queries:
# Find all errors in ordersvc
app: "ordersvc" AND level: "error"
# Find all logs for a specific trace
trace_id: "abc123..."
# Find slow queries
duration > 1000
Prometheus Alerts:
Configure in conf/prometheus.yaml and Grafana UI.
Example Alert Rules:
# High error rate
- alert: HighErrorRate
expr: rate(ffapm_server_handle_total{status=~"5.."}[5m]) > 0.05
for: 5m
annotations:
summary: "High error rate detected"
# High latency
- alert: HighLatency
expr: histogram_quantile(0.95, rate(ffapm_server_handle_seconds_bucket[5m])) > 1
for: 5m
annotations:
summary: "High latency detected"Logstash Alerts:
Configure in conf/logstash.conf using filters and HTTP output to ffalarm.
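As a sketch, such a rule could look like the following stanza (the level check and the ffalarm hostname are assumptions; the real rules live in conf/logstash.conf):

```conf
output {
  if [level] == "error" {
    http {
      url         => "http://ffapm-ffalarm:8083/log_webhook"
      http_method => "post"
      format      => "json"
    }
  }
}
```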
For local development without Docker:
# 1. Start infrastructure with Docker
docker compose -f conf/docker-compose.yml up -d ffapm-db ffapm-redis ffapm-otel-collector ffamp-jaeger ffapm-prometheus
# 2. Initialize database
mysql -h 127.0.0.1 -P 3306 -u root -p123456 < conf/init.sql
# 3. Run services locally
make setup
# This will start all services in background
# Check status
make status
# View logs
tail -f logs/ordersvc.log
tail -f logs/usersvc.log
tail -f logs/skusvc.log
tail -f logs/ffalarm.log
# Stop services
make stop

# Build for current OS
go build -o bin/ordersvc ordersvc/main.go
go build -o bin/usersvc usersvc/main.go
go build -o bin/skusvc skusvc/main.go
go build -o bin/ffalarm ffalarm/main.go
# Build for Linux ARM64 (for Docker)
make build-ubuntu
# Clean build artifacts
make clear-build

# Install protoc compiler
# macOS: brew install protobuf
# Linux: apt-get install protobuf-compiler
# Install Go plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
# Generate code
make genpb

# Run all tests
go test ./...
# Run tests for specific package
go test ./ffapm/...
# Run with coverage
go test -cover ./...
# Run with race detector
go test -race ./...

# Run Apache Bench load test
make ab
# Or manually with ab
ab -n 10000 -c 100 "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"
# Or use hey (a more modern load-testing tool)
hey -n 10000 -c 100 "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=1"

Using pprof (built-in):
All services expose pprof endpoints:
# CPU profiling
go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30
# Memory profiling
go tool pprof http://localhost:8080/debug/pprof/heap
# Goroutine profiling
go tool pprof http://localhost:8080/debug/pprof/goroutine
# Interactive web UI
go tool pprof -http=:8081 http://localhost:8080/debug/pprof/profile?seconds=30

Using gops (runtime inspection):
All services with auto-profiling enabled support gops:
# Install gops
go install github.com/google/gops@latest
# List all Go processes
gops
# Attach to process
gops stack <PID>
gops memstats <PID>
gops gc <PID>

Using Holmes (automatic profiling):
Holmes automatically generates profiles when:
- CPU usage > 80% for 10 seconds
- Memory usage > 80% of limit
- Goroutine count > 10,000
Profiles are stored in the service working directory and logged via the APM system.
Services support configuration via environment variables:
# Service name (required)
export APP_NAME=ordersvc
# Database connection
export MYSQL_HOST=127.0.0.1
export MYSQL_PORT=3306
export MYSQL_USER=root
export MYSQL_PASSWORD=123456
# Redis connection
export REDIS_HOST=127.0.0.1
export REDIS_PORT=6379
# OpenTelemetry Collector
export OTEL_ENDPOINT=localhost:4317
# Log configuration
export LOG_PATH=/logs
export LOG_MAX_AGE=7

The ffapm library supports various initialization options:
ffapm.Infra.Init(
// MySQL connection
ffapm.WithMysql("root:password@tcp(host:3306)/dbname?charset=utf8mb4&parseTime=True&loc=Local"),
// Redis connection
ffapm.WithRedis("host:6379"),
// Enable APM (tracing, metrics, logging)
ffapm.WithEnableApm(
"otel-collector:4317", // OTLP endpoint
"/logs", // Log path prefix
7, // Log retention days
customMetricFunc, // Optional: custom metric init
),
// Enable automatic profiling
ffapm.WithAutoPProf(&ffapm.AutoPProfOpt{
EnableCPU: true,
EnableMem: true,
EnableGoroutine: true,
}),
)

HTTP Server:
server := ffapm.NewHTTPServer(":8080",
ffapm.WithHttpMetrics(), // Enable Prometheus metrics endpoint
ffapm.WithHeartbeat(), // Enable health check endpoint
)
server.HandleFunc("/api/endpoint", handlerFunc)

gRPC Server:
server := ffapm.NewGrpcServer(":8081")
protos.RegisterYourServiceServer(server.Server, &YourServiceImpl{})

gRPC Client:
conn := ffapm.NewGrpcClient("target-service:8081", "target-service-name")
client := protos.NewYourServiceClient(conn)

POST /order/add?uid={uid}&sku_id={sku_id}&num={num}
# Example
curl "http://127.0.0.1:8080/order/add?uid=1&sku_id=3&num=2"
# Response
{
"code": 0,
"msg": "success",
"data": {
"order_id": "1234567890"
}
}

POST /metric_webhook
Content-Type: application/json
{
"alerts": [...]
}

POST /log_webhook
Content-Type: application/json
{
"message": "...",
"level": "error"
}

All services expose health check endpoints:
# Check service health
curl http://localhost:8080/health
# Response: ok
# Check metrics
curl http://localhost:8080/metrics

The project includes several demonstration projects showcasing different observability features:
Located in /pprofdemo, this package demonstrates various profiling techniques:
- cpu: CPU profiling with pprof.StartCPUProfile()
- memory: Memory profiling with pprof.WriteHeapProfile()
- goroutine: Goroutine profiling to detect leaks
- block: Block profiling for lock contention
- mutex: Mutex profiling for lock hold time
- thread: Thread creation profiling
- httppprof: HTTP server with /debug/pprof endpoints
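The core pattern in the cpu example is only a few lines. A self-contained sketch (not the demo's exact code):

```go
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// Write a CPU profile covering the duration of the workload.
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Workload to profile.
	sum := 0
	for i := 0; i < 100_000_000; i++ {
		sum += i
	}
	_ = sum
}
```

Inspect the result with go tool pprof cpu.prof.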
Running Examples:
cd pprofdemo
# CPU profiling example
go run cpu/main.go
# Memory profiling example
go run memory/main.go
# HTTP pprof server
go run httppprof/main.go
# Then access: http://localhost:6060/debug/pprof

Located in /tracedemo, demonstrates standalone OpenTelemetry tracing:
cd tracedemo
# Start infrastructure
docker compose -f compose/docker-compose.yml up -d
# Run demo
go run main.go
# View traces in Jaeger: http://localhost:16686
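A minimal version of such a standalone demo might look like this (a sketch, not the demo's exact code; the OTLP endpoint must match the collector started above):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Export spans to the local OTLP collector over gRPC.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer tp.Shutdown(ctx) // flush remaining spans on exit
	otel.SetTracerProvider(tp)

	// Record a parent span with one child.
	tr := otel.Tracer("tracedemo")
	ctx, parent := tr.Start(ctx, "parent-operation")
	_, child := tr.Start(ctx, "child-operation")
	time.Sleep(10 * time.Millisecond)
	child.End()
	parent.End()
}
```

Located in /holmesdemo, demonstrates Holmes automatic profiling: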
cd holmesdemo
# Run demo (will auto-profile on high resource usage)
go run main.go

Holmes will automatically generate profiles when thresholds are exceeded.
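The Holmes referenced here is presumably mosn/holmes. A hypothetical configuration sketch follows; the option names and argument order (min%, diff%, abs%[, max], cooldown) are taken from the mosn/holmes README and have varied between releases, so verify them against the version you depend on:

```go
package main

import (
	"time"

	"mosn.io/holmes"
)

func main() {
	h, err := holmes.New(
		holmes.WithCollectInterval("5s"), // sample runtime stats every 5s
		holmes.WithDumpPath("/tmp"),      // where profile files are written
		holmes.WithCPUDump(20, 25, 80, time.Minute),
		holmes.WithMemDump(30, 25, 80, time.Minute),
		holmes.WithGoroutineDump(500, 25, 10000, 100_000, time.Minute),
	)
	if err != nil {
		panic(err)
	}
	h.EnableCPUDump().EnableMemDump().EnableGoroutineDump()
	h.Start() // a real service would call h.Stop() on shutdown

	select {} // keep the process alive so Holmes can watch it
}
```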
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
- Follow standard Go conventions and use gofmt
- Add comments for exported functions and types
- Write tests for new features
- Update documentation for API changes
Ensure all tests pass before submitting PR:
go test ./...
go vet ./...
golangci-lint run

References

- OpenTelemetry Official Site
- OpenTelemetry Go SDK
- OpenTelemetry Go Contrib
- OpenTelemetry Demo
- Ecosystem Registry
This project is for educational and demonstration purposes. Please check the original repository for license information.
For questions and support:
- Open an issue on GitHub
- Check existing documentation
- Review demo projects for examples
Built with ❤️ using Go and OpenTelemetry