Skip to content

Conversation

@endigma
Copy link
Member

@endigma endigma commented Jul 3, 2025

Summary by CodeRabbit

  • New Features

    • Added support for configuring a cardinality limit for metrics collection, with a new default value of 2000 and schema/configuration updates to support this.
    • Introduced a new benchmarking script to simulate GraphQL queries with controlled operation name cardinality for load testing.
  • Improvements

    • Upgraded Prometheus and Grafana service images to newer versions.
    • Enabled anonymous admin access in Grafana for easier dashboard viewing.
    • Increased Prometheus handler timeout from 10 to 60 seconds.
    • Improved error logging for Prometheus components.
  • Other Changes

    • Minor adjustments to Docker Compose healthcheck syntax for Redis cluster nodes.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@endigma endigma requested review from a team, Noroth, StarpTech and devsergiy as code owners July 3, 2025 10:23
@coderabbitai
Copy link

coderabbitai bot commented Jul 3, 2025

Walkthrough

The changes introduce a configurable cardinality limit for metrics collection by adding a new CardinalityLimit field to relevant configuration structs, schemas, and default values. The metric store and server logic are updated to use this configurable limit. Benchmarking and Docker Compose files are updated, including new k6 scripts and service image/version changes.

Changes

Files/Groups Change Summary
docker-compose.yml Updated Prometheus and Grafana images; removed Grafana container_name; added Grafana anonymous access env vars; adjusted Redis healthcheck syntax format.
router/bench-limited-cardinality.js Added k6 benchmarking script simulating GraphQL queries with limited operation name cardinality; exports load test options, setup, and default test function.
router/core/graph_server.go Metric store creation now uses configurable cardinality limit from server metric config instead of hardcoded default.
router/core/router.go Added CardinalityLimit field to returned metric config struct from telemetry config.
router/pkg/config/config.go,
router/pkg/metric/config.go
Added CardinalityLimit int field to metrics config structs; set default to 2000; updated default config initialization.
router/pkg/config/config.schema.json Added experiment_cardinality_limit integer property (min 1, default 2000) to metrics config schema with descriptive text.
router/pkg/config/testdata/config_defaults.json,
router/pkg/config/testdata/config_full.json
Added "CardinalityLimit": 2000 to Prometheus metrics config in test data JSON files.
router/pkg/metric/metric_store.go Removed unexported constant DefaultCardinalityLimit (value 2000).
router/pkg/metric/prometheus_server.go Refactored Prometheus error logger creation for handler and server with scoped error-level loggers; increased Prometheus handler timeout from 10s to 60s.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4d10c8 and 682c92b.

📒 Files selected for processing (11)
  • docker-compose.yml (6 hunks)
  • router/bench-limited-cardinality.js (1 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/router.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/metric/config.go (3 hunks)
  • router/pkg/metric/metric_store.go (0 hunks)
  • router/pkg/metric/prometheus_server.go (3 hunks)
💤 Files with no reviewable changes (1)
  • router/pkg/metric/metric_store.go
✅ Files skipped from review due to trivial changes (1)
  • router/pkg/config/config.schema.json
🚧 Files skipped from review as they are similar to previous changes (9)
  • router/pkg/config/testdata/config_full.json
  • router/pkg/config/config.go
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/metric/config.go
  • router/pkg/metric/prometheus_server.go
  • router/core/router.go
  • docker-compose.yml
  • router/bench-limited-cardinality.js
  • router/core/graph_server.go
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: build-router
  • GitHub Check: image_scan
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the router label Jul 3, 2025
@github-actions
Copy link

github-actions bot commented Jul 3, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-7fb2c28c088aa7e376ed42f677ea818665c2e1e4

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
router/pkg/config/config.go (1)

125-130: Expose env-var name in struct tag comment

Minor nit: for new environment variables we typically keep a brief comment for discoverability (several other fields have this). Adding one keeps docs consistent.

CardinalityLimit int `yaml:"cardinality_limit" envDefault:"2000" env:"METRICS_CARDINALITY_LIMIT"` // max unique metric labels per scrape cycle
router/pkg/config/config.schema.json (1)

975-980: Add an explicit upper bound and example to prevent accidental OOMs

Great to see the limit exposed in the schema. Given that very high cardinality values can blow up memory usage quickly, consider:

  1. Adding a "maximum" (e.g. 100 000) to make mis-configurations fail fast.
  2. Providing an "examples" array so users understand the expected magnitude.
  3. Double-checking that the default 2000 is wired through all MetricConfigFromTelemetry callers; a quick grep should confirm no hard-coded fall-backs remain.

These small tweaks tighten validation and improve UX.

router/bench-limited-cardinality.js (1)

22-24: Consider making cardinality configurable.

The comment mentions testing both under and over the cardinality limit (300 vs 500 names). Consider making distinctNames configurable via environment variables to easily test different cardinality scenarios without code changes.

-const distinctNames = 300;
+const distinctNames = parseInt(__ENV.DISTINCT_NAMES || '300');
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd9ddb2 and def6104.

📒 Files selected for processing (11)
  • docker-compose.yml (6 hunks)
  • router/bench-limited-cardinality.js (1 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/router.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/metric/config.go (3 hunks)
  • router/pkg/metric/metric_store.go (0 hunks)
  • router/pkg/metric/prometheus_server.go (3 hunks)
💤 Files with no reviewable changes (1)
  • router/pkg/metric/metric_store.go
🧰 Additional context used
🧠 Learnings (1)
router/pkg/config/config.schema.json (1)

undefined

<retrieved_learning>
Learnt from: SkArchon
PR: #1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.376Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.
</retrieved_learning>

🧬 Code Graph Analysis (1)
router/core/graph_server.go (1)
router/pkg/metric/metric_store.go (1)
  • WithCardinalityLimit (435-439)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: build-router
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_test
🔇 Additional comments (13)
router/pkg/config/testdata/config_defaults.json (1)

62-64: 👍 Addition looks correct and JSON remains syntactically valid

The new CardinalityLimit property is placed at the correct hierarchy (as a sibling of Prometheus) and preserves valid JSON formatting (comma after the closing brace of Prometheus, no trailing comma after the last property).
No further concerns here.

router/pkg/config/testdata/config_full.json (1)

92-94: Consistent update with default config

Same observation as for the defaults file – the property is correctly located and keeps the JSON valid.
Thanks for keeping the full-blown example in sync.

router/core/graph_server.go (1)

746-746: LGTM! Good refactoring to make cardinality limit configurable.

This change successfully replaces the hardcoded default cardinality limit with a configurable value from the metric configuration, improving system flexibility and following configuration-driven design principles.

router/core/router.go (1)

2178-2178: LGTM - Configuration propagation implemented correctly.

The addition of CardinalityLimit: cfg.Metrics.CardinalityLimit properly propagates the configurable cardinality limit from the telemetry configuration to the metric configuration, following the established pattern for other configuration fields.

router/pkg/metric/config.go (3)

17-18: LGTM - Well-documented constant definition.

The DefaultCardinalityLimit constant is properly defined with clear documentation explaining its purpose as a hard limit on metric streams per instrument.


121-122: LGTM - Configuration field properly added.

The CardinalityLimit field is correctly added to the Config struct with appropriate documentation that clearly explains its purpose.


140-140: LGTM - Default initialization follows pattern.

The CardinalityLimit field is properly initialized with the DefaultCardinalityLimit constant in the DefaultConfig function, consistent with the initialization pattern of other configuration fields.

docker-compose.yml (3)

66-66: Verify Prometheus v3.4.2 compatibility.

The upgrade from v3.0.0-beta.1 to v3.4.2 is a significant version jump. Please verify that all Prometheus configurations and queries are compatible with this version.


88-88: Verify Grafana v12.0.2 compatibility.

The upgrade from Grafana 11.3.1 to 12.0.2 is a major version change that could introduce breaking changes. Please verify that all dashboard configurations and plugins are compatible with Grafana v12.


265-265: LGTM - Healthcheck syntax standardization.

The change from double-quoted JSON array style to single-quoted YAML array style for Redis cluster healthcheck commands is a good standardization. The actual command (redis-cli -p 6379 ping) remains unchanged.

Also applies to: 283-283, 301-301

router/pkg/metric/prometheus_server.go (2)

28-44: LGTM - Improved error handling and logging structure.

The refactoring to create separate component-specific loggers (prometheus_handler and prometheus_server) with proper error handling is an excellent improvement that will enhance debugging capabilities and prevent potential nil pointer issues.


50-50: Verify timeout increase impact.

The timeout increase from 10 seconds to 60 seconds is significant and could impact client expectations. While this may be necessary for complex queries, ensure this change aligns with expected use cases and doesn't introduce performance issues.

router/bench-limited-cardinality.js (1)

1-185: LGTM - Well-structured cardinality benchmarking script.

This k6 script effectively tests the cardinality limit feature by:

  • Using a fixed pool of 300 operation names to control cardinality
  • Implementing realistic load testing stages with proper ramp-up
  • Using a complex GraphQL query that exercises multiple subgraphs and fragments
  • Including proper validation checks for response status and errors

The script design aligns well with testing the configurable cardinality limit (default 2000) introduced in this PR, where 300 operations × ~5 series per metric should stay under the limit.

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants