@anivar anivar commented Sep 23, 2025

Fixes #6730

Users were struggling to understand OPA metrics, especially builtin function metrics like http.send(). The existing docs didn't explain what these metrics capture or how they behave with caching.

This PR creates a comprehensive metrics registry that documents all currently discovered OPA metrics. Each metric now has a clear description and units.

Building on PR #7851 (which added the http.send network request counter), this completes the documentation for all http.send metrics.

Beyond fixing the immediate documentation gap, I've added a generator tool in cmd/metrics-docs/ to keep the registry maintainable. Run make generate-metrics-docs to regenerate when new metrics are added to OPA. The generator works from a manually curated list to ensure accurate descriptions.

I also enhanced the existing monitoring and policy-performance docs with operational metrics sections and fixed a broken link in the REST API docs.

Files changed:

  • cmd/metrics-docs/main.go and README.md - Generator tool
  • docs/docs/metrics-registry.md - The complete registry (generated)
  • docs/docs/monitoring.md - Added operational metrics sections
  • docs/docs/policy-performance.md - Enhanced performance metrics
  • docs/docs/rest-api.md - Fixed broken reference

This should help users interpret metrics without diving into source code.
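
As a rough illustration of the per-query metrics this PR documents, the sketch below groups the `metrics` object of an OPA REST API response by its `counter_` or `timer_` prefix. The sample payload is invented for illustration (the values are not real measurements), `group_metrics` is a hypothetical helper, and the `http.send` timer name is an assumption based on OPA's naming pattern rather than something stated in this thread.

```python
import json

# Illustrative response body, shaped like a reply to a query made with
# ?metrics=true. Values are made up; the http.send timer name is an assumption.
SAMPLE_RESPONSE = """
{
  "result": true,
  "metrics": {
    "timer_rego_query_eval_ns": 48210,
    "timer_rego_builtin_http_send_ns": 1530000,
    "counter_rego_builtin_regex_interquery_value_cache_hits": 3
  }
}
"""


def group_metrics(response_body: str) -> dict:
    """Split per-query metrics into counters and timers by name prefix."""
    metrics = json.loads(response_body).get("metrics", {})
    grouped = {"counter": {}, "timer": {}, "other": {}}
    for name, value in metrics.items():
        kind = name.split("_", 1)[0]
        # Anything that is neither a counter nor a timer lands in "other".
        bucket = grouped[kind] if kind in ("counter", "timer") else grouped["other"]
        bucket[name] = value
    return grouped


if __name__ == "__main__":
    for kind, values in group_metrics(SAMPLE_RESPONSE).items():
        print(kind, sorted(values))
```

Grouping by prefix is only a convenience for reading the output; the units (for example nanoseconds for names ending in `_ns`) still come from the metric descriptions.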

netlify bot commented Sep 23, 2025

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 1221915
🔍 Latest deploy log https://app.netlify.com/projects/openpolicyagent/deploys/68d640126af3da00088c8d99
😎 Deploy Preview https://deploy-preview-7929--openpolicyagent.netlify.app

@anivar anivar force-pushed the feat/issue-6730-metrics-documentation branch 3 times, most recently from 82d1b81 to d3352e7 Compare September 23, 2025 04:20
@anivar anivar force-pushed the feat/issue-6730-metrics-documentation branch from a03a2b6 to 928e84b Compare September 23, 2025 19:37
@charlieegan3 charlieegan3 left a comment

Hey, I have left a few more comments here for you to have a think about. I noticed some metrics like counter_rego_builtin_regex_interquery_value_cache_hits seem to be missing.

While I appreciate the effort, it might be best to work on a smaller change set here. Perhaps we could look to focus on regex and http.send metrics since they are likely some of the more common ones used? Wdyt?

anivar added a commit to anivar/opa that referenced this pull request Sep 25, 2025
- Fix misleading 'aggregated' terminology - use 'instance-level' instead
- Remove per-query metrics section from monitoring.md, add cross-references
- Focus metrics documentation on commonly used regex and http.send built-ins
- Add missing counter_rego_builtin_regex_interquery_value_cache_hits metric
- Move admonition to after example in REST API documentation
- Simplify and reduce scope of metrics documentation per reviewer guidance
anivar added a commit to anivar/opa that referenced this pull request Sep 25, 2025
Per reviewer feedback, removing blank line changes that were
unintentionally included from merging PR open-policy-agent#7929.
@anivar anivar force-pushed the feat/issue-6730-metrics-documentation branch from 7304b49 to dc63b7f Compare September 25, 2025 13:40
anivar added a commit to anivar/opa that referenced this pull request Sep 26, 2025
- Move metrics overview into Prometheus section for better flow
- Add explicit /metrics path mention in Prometheus intro
- Add links to Status API and Decision Logs documentation
- Fix CLI tools to include proper documentation links
- Clarify which metrics are enabled with instrument=true parameter
- Remove inaccurate 'subset' terminology for Status API

Addresses review comments from charlieegan3 on September 25, 2025

Signed-off-by: Anivar A Aravind <ping@anivar.net>
Generate and document all OPA metrics in a central registry.
Add operational metrics sections to monitoring docs.

Fixes: open-policy-agent#6730
Signed-off-by: Anivar A Aravind <ping@anivar.net>
Based on PR feedback, this commit:
- Clearly distinguishes between /metrics endpoint (system-wide) and ?metrics=true (per-query)
- Removes duplicate metrics listings to avoid maintenance burden
- Adds cross-references between monitoring and REST API docs
- Simplifies the documentation structure without automation

Addresses review feedback from @charlieegan3

Signed-off-by: Anivar A Aravind <ping@anivar.net>
As requested by @charlieegan3:
- Remove auto-generation tooling (cmd/metrics-docs/main.go)
- Remove auto-generated metrics registry file
- Remove Makefile target for metrics generation

The reviewer indicated metrics don't change frequently enough
to warrant automation, and prefers avoiding duplicate lists.

Signed-off-by: Anivar A Aravind <ping@anivar.net>
@anivar anivar force-pushed the feat/issue-6730-metrics-documentation branch from 15b9ec8 to 1221915 Compare September 26, 2025 07:26
@charlieegan3 charlieegan3 left a comment

Hi @anivar, I've left two comments about the built-in-specific metrics that I think would be good to document, and where I think they are best documented. I am not sure we're really going in the right direction with the rest of the PR, so if it's OK with you, I think we can keep these notes on the suggested pages and get this in. We don't need to update monitoring and policy-performance this time around; for now, let's just focus on the metrics for the specific built-ins you've documented here.


### Common Built-in Function Metrics

#### HTTP Built-ins

I think we can move these http.send metrics to under the table on the https://www.openpolicyagent.org/docs/policy-reference/builtins/http page with a note about where they appear (per query metrics).


High cache hit ratios indicate effective caching and reduced network overhead.
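
One way to make "cache hit ratio" concrete is the small calculation below. It is a sketch under stated assumptions: the two metric names are guesses based on OPA's naming pattern (only the network request counter is mentioned in this thread, and not by its exact name), `cache_hit_ratio` is a hypothetical helper, and it assumes every lookup is counted either as an inter-query cache hit or as a network request.

```python
from typing import Optional

# Assumed metric names; check them against the metrics your OPA version reports.
HITS_KEY = "counter_rego_builtin_http_send_interquery_cache_hits"
REQUESTS_KEY = "counter_rego_builtin_http_send_network_requests"


def cache_hit_ratio(
    metrics: dict,
    hits_key: str = HITS_KEY,
    requests_key: str = REQUESTS_KEY,
) -> Optional[float]:
    """Return hits / (hits + network requests), or None if nothing was counted."""
    hits = metrics.get(hits_key, 0)
    requests = metrics.get(requests_key, 0)
    total = hits + requests
    if total == 0:
        return None  # no http.send activity recorded, so the ratio is undefined
    return hits / total


if __name__ == "__main__":
    # Made-up counts: 9 cache hits and 1 request that went to the network.
    print(cache_hit_ratio({HITS_KEY: 9, REQUESTS_KEY: 1}))  # 0.9
```

Returning `None` when both counters are absent keeps "no traffic" distinct from "zero hits", which would otherwise both read as a poor ratio.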

#### Regex Built-ins

I think we can move these regex metrics to under the table on the https://www.openpolicyagent.org/docs/policy-reference/builtins/regex page with a note about where they appear (per query metrics).

Successfully merging this pull request may close these issues.

[Documentation] No documentation for metric definitions
2 participants