-
Notifications
You must be signed in to change notification settings - Fork 193
feat: configurable cardinality limit for otel #2009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce a configurable cardinality limit for metrics collection by adding a new Changes
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (11)
💤 Files with no reviewable changes (1)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (9)
⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Router image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
router/pkg/config/config.go (1)
125-130: Expose env-var name in struct tag commentMinor nit: for new environment variables we typically keep a brief comment for discoverability (several other fields have this). Adding one keeps docs consistent.
CardinalityLimit int `yaml:"cardinality_limit" envDefault:"2000" env:"METRICS_CARDINALITY_LIMIT"` // max unique metric labels per scrape cyclerouter/pkg/config/config.schema.json (1)
975-980: Add an explicit upper bound and example to prevent accidental OOMsGreat to see the limit exposed in the schema. Given that very high cardinality values can blow up memory usage quickly, consider:
- Adding a
"maximum"(e.g. 100 000) to make mis-configurations fail fast.- Providing an
"examples"array so users understand the expected magnitude.- Double-checking that the default
2000is wired through allMetricConfigFromTelemetrycallers; a quick grep should confirm no hard-coded fall-backs remain.These small tweaks tighten validation and improve UX.
router/bench-limited-cardinality.js (1)
22-24: Consider making cardinality configurable.The comment mentions testing both under and over the cardinality limit (300 vs 500 names). Consider making
distinctNamesconfigurable via environment variables to easily test different cardinality scenarios without code changes.-const distinctNames = 300; +const distinctNames = parseInt(__ENV.DISTINCT_NAMES || '300');
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
docker-compose.yml(6 hunks)router/bench-limited-cardinality.js(1 hunks)router/core/graph_server.go(1 hunks)router/core/router.go(1 hunks)router/pkg/config/config.go(1 hunks)router/pkg/config/config.schema.json(1 hunks)router/pkg/config/testdata/config_defaults.json(1 hunks)router/pkg/config/testdata/config_full.json(1 hunks)router/pkg/metric/config.go(3 hunks)router/pkg/metric/metric_store.go(0 hunks)router/pkg/metric/prometheus_server.go(3 hunks)
💤 Files with no reviewable changes (1)
- router/pkg/metric/metric_store.go
🧰 Additional context used
🧠 Learnings (1)
router/pkg/config/config.schema.json (1)
undefined
<retrieved_learning>
Learnt from: SkArchon
PR: #1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
</retrieved_learning>
🧬 Code Graph Analysis (1)
router/core/graph_server.go (1)
router/pkg/metric/metric_store.go (1)
WithCardinalityLimit(435-439)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: build-router
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image
- GitHub Check: image_scan
- GitHub Check: build_test
🔇 Additional comments (13)
router/pkg/config/testdata/config_defaults.json (1)
62-64: 👍 Addition looks correct and JSON remains syntactically validThe new
CardinalityLimitproperty is placed at the correct hierarchy (as a sibling ofPrometheus) and preserves valid JSON formatting (comma after the closing brace ofPrometheus, no trailing comma after the last property).
No further concerns here.router/pkg/config/testdata/config_full.json (1)
92-94: Consistent update with default configSame observation as for the defaults file – the property is correctly located and keeps the JSON valid.
Thanks for keeping the full-blown example in sync.router/core/graph_server.go (1)
746-746: LGTM! Good refactoring to make cardinality limit configurable.This change successfully replaces the hardcoded default cardinality limit with a configurable value from the metric configuration, improving system flexibility and following configuration-driven design principles.
router/core/router.go (1)
2178-2178: LGTM - Configuration propagation implemented correctly.The addition of
CardinalityLimit: cfg.Metrics.CardinalityLimitproperly propagates the configurable cardinality limit from the telemetry configuration to the metric configuration, following the established pattern for other configuration fields.router/pkg/metric/config.go (3)
17-18: LGTM - Well-documented constant definition.The
DefaultCardinalityLimitconstant is properly defined with clear documentation explaining its purpose as a hard limit on metric streams per instrument.
121-122: LGTM - Configuration field properly added.The
CardinalityLimitfield is correctly added to theConfigstruct with appropriate documentation that clearly explains its purpose.
140-140: LGTM - Default initialization follows pattern.The
CardinalityLimitfield is properly initialized with theDefaultCardinalityLimitconstant in theDefaultConfigfunction, consistent with the initialization pattern of other configuration fields.docker-compose.yml (3)
66-66: Verify Prometheus v3.4.2 compatibility.The upgrade from v3.0.0-beta.1 to v3.4.2 is a significant version jump. Please verify that all Prometheus configurations and queries are compatible with this version.
88-88: Verify Grafana v12.0.2 compatibility.The upgrade from Grafana 11.3.1 to 12.0.2 is a major version change that could introduce breaking changes. Please verify that all dashboard configurations and plugins are compatible with Grafana v12.
265-265: LGTM - Healthcheck syntax standardization.The change from double-quoted JSON array style to single-quoted YAML array style for Redis cluster healthcheck commands is a good standardization. The actual command (
redis-cli -p 6379 ping) remains unchanged.Also applies to: 283-283, 301-301
router/pkg/metric/prometheus_server.go (2)
28-44: LGTM - Improved error handling and logging structure.The refactoring to create separate component-specific loggers (
prometheus_handlerandprometheus_server) with proper error handling is an excellent improvement that will enhance debugging capabilities and prevent potential nil pointer issues.
50-50: Verify timeout increase impact.The timeout increase from 10 seconds to 60 seconds is significant and could impact client expectations. While this may be necessary for complex queries, ensure this change aligns with expected use cases and doesn't introduce performance issues.
router/bench-limited-cardinality.js (1)
1-185: LGTM - Well-structured cardinality benchmarking script.This k6 script effectively tests the cardinality limit feature by:
- Using a fixed pool of 300 operation names to control cardinality
- Implementing realistic load testing stages with proper ramp-up
- Using a complex GraphQL query that exercises multiple subgraphs and fragments
- Including proper validation checks for response status and errors
The script design aligns well with testing the configurable cardinality limit (default 2000) introduced in this PR, where 300 operations × ~5 series per metric should stay under the limit.
af515e5 to
d4d10c8
Compare
StarpTech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
d4d10c8 to
682c92b
Compare
Summary by CodeRabbit
New Features
Improvements
Other Changes
Checklist