stevencrake-nscale commented Sep 22, 2025

This PR introduces observability agent support for Kubernetes clusters, enabling telemetry collection and forwarding to the centralised observability platform.

The approach taken is a temporary™️ way of getting the agent onto customer clusters. An ADR detailing this decision has been written, and we will review the approach in future.

Caveat for testing: until mTLS is added in the coming week, the aggregator basic auth secret has to be applied to the cluster manually for the chart to work.

Summary

  • Adds new observabilityAgent feature flag to KubernetesClusterSpec
  • Creates observability-agent HelmApplication definition for k8s-deploy-observability-agent chart
  • Adds new ApplicationBundle version 1.4.0 with observability agent included
  • Integrates observability agent provisioning into cluster add-ons workflow
  • Provides conditional deployment (opt-in) based on cluster spec features provided in the request
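
The opt-in behaviour in the last two bullets can be sketched roughly as follows. This is an illustration, not the code from the PR: only the `observabilityAgent` flag name and the `observability-agent` application name come from the summary above, while the struct layout, the `observabilityAgentEnabled` and `addOns` helpers, and the `example-base-addon` placeholder are assumptions.

```go
package main

import "fmt"

// ClusterFeatures is a hypothetical stand-in for the features section of
// KubernetesClusterSpec. Only the flag name comes from the PR summary.
type ClusterFeatures struct {
	// ObservabilityAgent opts the cluster in to the agent. A nil pointer
	// means the request did not set the flag at all.
	ObservabilityAgent *bool
}

// KubernetesClusterSpec is trimmed to the single field this sketch needs.
type KubernetesClusterSpec struct {
	Features *ClusterFeatures
}

// observabilityAgentEnabled reports whether the agent should be provisioned.
// Anything other than an explicit true is treated as off (opt-in).
func observabilityAgentEnabled(spec KubernetesClusterSpec) bool {
	return spec.Features != nil &&
		spec.Features.ObservabilityAgent != nil &&
		*spec.Features.ObservabilityAgent
}

// addOns returns the add-on applications to provision for a cluster.
// "example-base-addon" is a placeholder, not a real bundle member.
func addOns(spec KubernetesClusterSpec) []string {
	apps := []string{"example-base-addon"}
	if observabilityAgentEnabled(spec) {
		apps = append(apps, "observability-agent")
	}
	return apps
}

func main() {
	enabled := true
	optedIn := KubernetesClusterSpec{
		Features: &ClusterFeatures{ObservabilityAgent: &enabled},
	}
	fmt.Println(addOns(KubernetesClusterSpec{})) // flag unset: agent skipped
	fmt.Println(addOns(optedIn))                 // flag true: agent included
}
```

Using a pointer for the flag keeps "not specified" distinct from "explicitly false", so existing cluster requests that never mention the feature stay unchanged.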

stevencrake-nscale changed the title from "observability wip" to "O11y agent application for clusters" on Sep 22, 2025
stevencrake-nscale force-pushed the steven/obs-71 branch 5 times, most recently from 14ada43 to b7c70c5, on September 22, 2025 11:32
versions:
- version: 0.1.0
  repo: https://github.com/nscaledev/k8s-deploy-observability-agent
  branch: main
stevencrake-nscale (author) commented:

The chart will be tagged when an RC is ready; until then this points at the main branch.

stevencrake-nscale force-pushed the steven/obs-71 branch 4 times, most recently from d46c01e to 1964d54, on September 22, 2025 12:22
stevencrake-nscale changed the title from "O11y agent application for clusters" to "Add observability agent application for clusters" on Sep 22, 2025
values := map[string]any{
	"clusterName": cluster.Name,
	"region":      cluster.Labels[RegionNameLabel],
	"environment": "nks",
stevencrake-nscale (author) commented:

We don't have a clear way to obtain the values currently expected for this required key.

It expects values such as:

  • glo1
  • sta1

The argument for setting it to nks is that it clearly identifies telemetry as coming from an NKS cluster via the observability agent.

stevencrake-nscale (author) commented:

If we standardise all of our clusters on the canonical region name (no-glo1-dev, no-glo1, no-sta1, etc.), we could remove the environment key altogether.
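
One way to picture that follow-up is a values builder in which `environment` becomes optional. This is only a sketch under assumptions: the `cluster` type, the `agentValues` helper, and the label key behind `regionNameLabel` are hypothetical, and whether the chart would accept a missing `environment` key is not established in this thread.

```go
package main

import "fmt"

// regionNameLabel is a placeholder for the RegionNameLabel constant in the
// diff; its real value is not shown in the PR.
const regionNameLabel = "example.invalid/region-name"

// cluster is a minimal stand-in for the cluster resource used in the diff.
type cluster struct {
	Name   string
	Labels map[string]string
}

// agentValues builds the Helm values for the agent chart. The environment
// key is only set when a non-empty value is passed, which models dropping
// it once the region label alone identifies the deployment.
func agentValues(c cluster, environment string) map[string]any {
	values := map[string]any{
		"clusterName": c.Name,
		"region":      c.Labels[regionNameLabel],
	}
	if environment != "" {
		values["environment"] = environment
	}
	return values
}

func main() {
	c := cluster{
		Name:   "demo",
		Labels: map[string]string{regionNameLabel: "no-glo1-dev"},
	}
	fmt.Println(agentValues(c, "nks")) // current behaviour: fixed "nks"
	fmt.Println(agentValues(c, ""))    // canonical region only, no environment
}
```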

stevencrake-nscale marked this pull request as ready for review on September 22, 2025 12:29
squaremo (Contributor) commented:

> This PR introduces observability agent support for Kubernetes clusters, enabling telemetry collection and forwarding to the centralised observability platform.

I am on board with the idea of optionally setting up telemetry for my clusters. Say I have a Prometheus running centrally: how do I tell this agent to send metrics to it?

stevencrake-nscale (author) commented Oct 2, 2025

> > This PR introduces observability agent support for Kubernetes clusters, enabling telemetry collection and forwarding to the centralised observability platform.
>
> I am on board with the idea of optionally setting up telemetry for my clusters. Say I have a Prometheus running centrally, how do I tell this agent to send metrics to it?

@squaremo Right now the chart takes a single OTel endpoint. We've discussed accepting an array instead, with our endpoint provided by default and users able to add their own:

https://github.com/nscaledev/k8s-deploy-observability-agent/blob/e8617da5033768d04dbd6ffac64edf1375744e5d/charts/observability-agent/templates/collector.yaml#L115-L119

squaremo (Contributor) commented Oct 3, 2025

> Right now the chart takes a single otel endpoint, we've discussed receiving an array, with our one provided by default and allowing users to provide additional

I don't think this is a good idea. It would mean the chart, and this component, are specialised to our deployment by default. That is wrong in principle (to wit: "don't give open things proprietary dependencies") and, in practice, an avoidable pain for local deployments, for example. Better to default it to empty and provide the deployment particulars in our own manifests.
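
As a rough sketch of what "default it to empty" could look like on the provisioning side: the component carries no built-in endpoint, and every endpoint comes from whatever the deployment configures. The `exporterEndpoints` helper and the `example.invalid` URLs are hypothetical and appear in neither the chart nor this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// exporterEndpoints normalises the endpoints supplied by a deployment's own
// configuration. There is deliberately no built-in default: with nothing
// configured the result is empty and no exporter would be set up.
func exporterEndpoints(configured []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, endpoint := range configured {
		endpoint = strings.TrimSpace(endpoint)
		// Skip blanks and duplicates, keeping the first occurrence's order.
		if endpoint == "" || seen[endpoint] {
			continue
		}
		seen[endpoint] = true
		out = append(out, endpoint)
	}
	return out
}

func main() {
	// A local deployment that configures nothing gets no exporters.
	fmt.Println(len(exporterEndpoints(nil)))

	// A hosted deployment lists its own endpoint in its own manifests, and
	// a user can add a second one alongside it.
	fmt.Println(exporterEndpoints([]string{
		"https://otel.platform.example.invalid:4317",
		"https://otel.customer.example.invalid:4317",
	}))
}
```

The trade-off is that a hosted deployment must always state its endpoint explicitly, in exchange for the chart and component staying usable anywhere without overrides.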
