Skip to content

Conversation

@onelapahead
Copy link
Contributor

@onelapahead onelapahead commented Feb 15, 2025

NOTE: this a breaking change in the APIServerOptions struct. Clients will have to used the renamed MonitoringConfig option rather than MetricsConfig. However, bc its up to the client to have assembled their config sections/subsections and names however they wish, this does not mean they have to rename a section called say metrics to monitoring. Meaning their configuration files can remain backwards compatible.

Often, the metrics server has been enabled with an internal, non-TLS port so that it can be easily monitoring by Prometheus servers.

Meanwhile, servers often need to provide health or liveness/readiness endpoints for probes when running within containerized environments like Kubernetes.

The framework incentivizes putting those health routes on the core API server, but this has two disadvantages:

  1. the health routes have the same security (TLS, basic auth, etc.)
  2. the health routes risk being externally exposed

With 1., we've encountered issues like kubelet probes' httpGet not supporting (m)TLS connections. If one works around that using an exec probe and say curl, we've observed this to have performance implications on kubelet and the underlying worker host.

With 2., the health routes can available on Ingress's lacking path-aware filtering which can lead to information leaks.

Instead, we want to make the metrics monitoring server more multi-purpose by allowing additional routes to be defined on it. This means clients can move their health routes off of their core API servers, and onto the monitoring server - allowing for less external security risk and giving the user control over whether they want different internal security for their container probes.

… Routes

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
ConfMetricsServerEnabled = "enabled"
ConfMetricsServerPath = "/metrics"
ConfMonitoringServerEnabled = "enabled"
ConfMonitoringServerMetricsPath = "metricsPath"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just flat out bug

Signed-off-by: hfuss <hayden.fuss@kaleido.io>
Copy link
Contributor

@peterbroadhurst peterbroadhurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear justification.

You're correct this will generate some churn in upstream components (FFTM, FireFly core, etc.), but the ability to register additional endpoints for monitoring seems very useful and I agree having metrics forced as the same would be confusing.

@peterbroadhurst peterbroadhurst merged commit df129bf into hyperledger:main Feb 17, 2025
3 checks passed
@peterbroadhurst peterbroadhurst deleted the monitoringserver branch February 17, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants