[features] Integrate Prometheus-based monitoring for FlexKV #106

Merged
peaceforeverCN merged 1 commit into taco-project:main from zhou-yukun:monitor-apply
Feb 24, 2026

@zhou-yukun

This PR adds native Prometheus metrics integration to FlexKV, enabling real-time performance monitoring and visualization of KV cache operations. The monitoring framework is embedded in both the Python and C++ runtime layers with a zero-intrusion design: setting environment variables is enough to activate metrics collection at runtime, with no code changes required.

Key features:

  • Enable metrics collection by setting FLEXKV_ENABLE_METRICS=1; no code changes required
  • Prometheus metrics at both the Python (prometheus_client) and C++ (prometheus-cpp) runtime layers
  • Core metrics cover cache hit/miss, memory pool status, block allocation/eviction, and data transfer statistics
  • Docker Compose-based monitoring stack (Prometheus + Grafana) with pre-configured dashboards in the monitoring/ directory
  • Comprehensive monitoring documentation at docs/monitoring/README_en.md (English) and docs/monitoring/README_zh.md (Chinese)
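
As a rough illustration of the opt-in behaviour described above, the sketch below gates metric collection on `FLEXKV_ENABLE_METRICS` and falls back to no-ops when it is unset. It is not taken from the PR: the `CacheMetrics` class, the metric names, the default port 8000, and the choice to treat only the value `1` as "on" are all assumptions made for the example. Only `Counter` and `start_http_server` from `prometheus_client` are real library calls.

```python
import os


def metrics_enabled(environ=None) -> bool:
    """Return True when FLEXKV_ENABLE_METRICS is set to "1".

    Treating every other value as "off" is an assumption of this sketch.
    """
    env = os.environ if environ is None else environ
    return env.get("FLEXKV_ENABLE_METRICS", "0") == "1"


class CacheMetrics:
    """Hypothetical wrapper: real counters when enabled, no-ops otherwise."""

    def __init__(self, enabled: bool, port: int = 8000):
        self.enabled = enabled
        self._hits = None
        self._misses = None
        if enabled:
            # Imported lazily so prometheus_client is only needed when metrics are on.
            from prometheus_client import Counter, start_http_server

            # Metric names are illustrative, not the ones defined by FlexKV.
            self._hits = Counter("flexkv_cache_hits", "KV cache lookups that hit")
            self._misses = Counter("flexkv_cache_misses", "KV cache lookups that missed")
            # Expose /metrics for Prometheus to scrape (port is an assumption).
            start_http_server(port)

    def record_lookup(self, hit: bool) -> None:
        """Count one cache lookup; does nothing when metrics are disabled."""
        if not self.enabled:
            return
        (self._hits if hit else self._misses).inc()


if __name__ == "__main__":
    metrics = CacheMetrics(enabled=metrics_enabled())
    metrics.record_lookup(hit=True)
```

With the variable unset, calling code pays only for a boolean check, which is one plausible way to keep instrumentation from intruding on the normal path.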

Changes include:

  • New MetricsManager singleton in C++ for thread-safe metrics management
  • New FlexKVMetrics module in Python with automatic server lifecycle
  • Unified configuration flow: Python reads env vars and passes to C++ layer
  • Updated CMakeLists.txt to link prometheus-cpp library
  • Added monitoring/ directory with docker-compose, Prometheus config, Grafana provisioning and pre-built dashboard JSON
  • Documentation at docs/monitoring/README_en.md (English) and docs/monitoring/README_zh.md (Chinese)
  • Updated project README and README_zh with monitoring overview section
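
The unified configuration flow listed above (Python reads the environment, then hands the settings to the C++ layer) could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: `MetricsConfig`, `FakeNativeLayer`, `configure_metrics`, and the port default are hypothetical names introduced here, and the stand-in class takes the place of the real C++ extension so the example runs on its own.

```python
import os
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MetricsConfig:
    """Settings resolved once on the Python side (hypothetical structure)."""
    enabled: bool
    port: int


def load_metrics_config(environ=None, port: int = 8000) -> MetricsConfig:
    """Build the config from the environment; the port default is an assumption."""
    env = os.environ if environ is None else environ
    return MetricsConfig(
        enabled=env.get("FLEXKV_ENABLE_METRICS", "0") == "1",
        port=port,
    )


class FakeNativeLayer:
    """Stand-in for the C++ extension; it only records what Python passes in."""

    def __init__(self):
        self.received = None

    def configure_metrics(self, enabled: bool, port: int) -> None:
        self.received = {"enabled": enabled, "port": port}


def init_metrics(native, environ=None) -> MetricsConfig:
    """Read env vars in Python, then forward the same values to the native layer."""
    cfg = load_metrics_config(environ)
    native.configure_metrics(**asdict(cfg))
    return cfg


if __name__ == "__main__":
    native = FakeNativeLayer()
    print(init_metrics(native), native.received)
```

Resolving the environment in one place means both layers see identical settings, rather than each parsing the variables independently.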

Collaborator

@peaceforeverCN left a comment

LGTM

@zhou-yukun
Author

zhou-yukun commented Feb 24, 2026 via email

@peaceforeverCN merged commit bd60d80 into taco-project:main on Feb 24, 2026
1 check passed