feat(helm): implementation of integrations chart #7356

Closed · wants to merge 11 commits

Conversation

pkoutsovasilis
Contributor

I'm opening this draft PR to capture the footprint of the changes required to satisfy elastic/elastic-agent#3848 and to kick off further discussion of elastic/elastic-agent#3847. Please don't merge this PR, as nothing is concrete yet. That said, any comments are more than welcome.

@pkoutsovasilis added the "discuss" (We need to figure this out) label on Nov 30, 2023
@botelastic bot added the triage label on Nov 30, 2023

cla-checker-service bot commented Nov 30, 2023

💚 CLA has been signed

@thbkrkr added the ">feature" (Adds or discusses adding a feature to the product) label on Nov 30, 2023
@botelastic bot removed the triage label on Nov 30, 2023
@joshdover

Overall this is looking great.

@joshdover left a comment

Tested and found the following issues on my cluster:

  • Many errors from metricbeat in the form of Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get \"https://jophia-01:10250/stats/summary\": lookup jophia-01 on 10.152.183.10:53: no such host
  • host.name on logs-elastic_agent.* was being reported as the pod name instead of the node name
  • No kubectl or kube-state metrics were being ingested

I added these to the pod spec for the daemonset agent and it resolved most of the issues above:

hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
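
For context, a rough sketch of where these two fields would end up, assuming the chart renders an ECK Agent resource with a daemonSet.podTemplate (the resource name and version are illustrative; hostNetwork and dnsPolicy are standard Kubernetes PodSpec fields):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: agent-pernode            # illustrative name for the per-node (DaemonSet) agent
spec:
  version: 8.13.2                # illustrative version
  daemonSet:
    podTemplate:
      spec:
        hostNetwork: true                    # share the node's network namespace so node hostnames (e.g. jophia-01) resolve
        dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS lookups working while hostNetwork is enabled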

What remains after that fix:

  • No container logs are ingested - it seems the /var/log volume mounts weren't applied due to a bug in the if condition. I've included a fix for that in the PR linked below. Even with that fix, though, I'm still not getting any files mounted under /var/log/containers, for a reason I can't yet figure out.
  • System metrics are missing, but I think this just isn't in the helm chart yet.

Here's a PR with some of the fixes: #7363

@gizas

gizas commented Dec 5, 2023

@pkoutsovasilis really nice start here. For me everything worked right away. Adding some general comments here for further discussion:

- CCing https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml as an example. Shall we start using this naming convention and schema from day one? I know it's a big change at this point, but it may be worth the effort in order to be compliant with OTel from the start.

@pkoutsovasilis changed the title from "feat: (#3848) POC implementation of eck-integrations Helm chart" to "feat: POC implementation of eck-integrations Helm chart" on Dec 5, 2023

### Installing Kubernetes ECK Integration

// TODO(panosk)

Some possible examples to put here:

Suggested change (replacing the "// TODO(panosk)" placeholder):

To enable all Kubernetes monitoring capabilities for metrics and logs:

helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set kubernetes.enabled=true

To only enable logs:

helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set kubernetes.enabled=true,kubernetes.metrics.enabled=false

To enable Cloud Defend:

helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set cloudDefend.enabled=true

Collaborator

I think these examples need to mention the secret or the elasticsearchRef setup required.
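
For illustration, a minimal sketch of the kind of setup the examples above leave implicit, assuming the ECK convention for referencing an external Elasticsearch cluster via a secret (the url/username/password keys follow that convention; the secret name es-ref-secret and the namespace are illustrative and should match whatever the chart's elasticsearchRef values point at):

kubectl create secret generic es-ref-secret -n elastic-system \
  --from-literal=url=https://elasticsearch:9200 \
  --from-literal=username=elastic \
  --from-literal=password=changeme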

deploy/eck-integrations/values.yaml (outdated review thread, resolved)
Member

@cmacknz left a comment

I tried this following the multiple integration example and it worked as expected.

I think it would be nice to have an example of how to set up an equivalent set of Fleet-managed agents with the separate DaemonSet and Deployment, with the initial agent container configuration overridden (e.g. with leader election disabled everywhere). Perhaps this is intended as a follow-up, but users regularly ask for this and it would have simplified several recent support cases.

The only trouble I ran into was when experimenting with the es-ref-secret. If you omit the expected values (for example trying to use the not yet implemented api-key instead of password) the operator doesn't create anything and there's no obvious error.

You can figure out what happened by looking at the operator logs, but users coming from the current deployment pattern aren't going to expect this or know to look for it. Is there a way to make these failures more immediately obvious? Or is this something we have to solve with documentation?

For example this is the error message the controller gave me, which was very clear, but you have to know to look at the controller logs to get it.

{"log.level":"error","@timestamp":"2024-04-05T19:23:47.829Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.12.1+a56215f7","service.type":"eck","ecs.version":"1.4.0","controller":"agent-es-association-controller","object":{"name":"agent-pernode","namespace":"default"},"namespace":"default","name":"agent-pernode","reconcileID":"5ee3a9cb-cafb-4aaa-af40-d8a2e221711f","error":"username secret key doesn't exist in secret es-ref-secret","errorCauses":[{"error":"username secret key doesn't exist in secret es-ref-secret"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}

Comment on lines +490 to +491
kubernetes_leaderelection:
enabled: false
Member

This works to disable leader election properly via the agent CRD's config section? Does this work to override the initial configurations for Fleet managed agents as well?

If so it is significantly easier than the hoops users are jumping through to do this without ECK today, see https://www.elastic.co/guide/en/fleet/current/advanced-kubernetes-managed-by-fleet.html.
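
If it helps, a minimal sketch of what that override looks like when written directly in the Agent resource's config section, assuming the kubernetes_leaderelection key sits under the standalone agent's providers block as in the suggested values above (the chart itself goes through a generated configRef secret, per the reply below; resource name and version are illustrative):

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: agent-pernode    # illustrative
spec:
  version: 8.13.2        # illustrative
  config:
    providers:
      kubernetes_leaderelection:
        enabled: false   # no leader election needed when a dedicated Deployment handles cluster-wide datasets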

Contributor Author

Essentially yes, via configRef, which is a secret that the chart populates accordingly; the operator then parses it and constructs the agent.yml properly. But I think we mean the same thing here 🙂 I'm not fully aware of how something similar could look for Fleet-managed agents, but we can definitely have a look.

Member

What I hope happens is the same mechanism overrides the initial configuration of the agent to seed it with the providers disabled, and then Fleet takes over from that starting base configuration.

The providers can't be configured from Fleet right now so this is a pain point.

Member

Also, with the Helm chart being able to create both a Deployment and a DaemonSet, I don't think we need leader election at all anymore, so it should be disabled everywhere by default (pending a closer review of the configurations to make sure we did this right).

The only reason leader election exists is so you can just deploy a DaemonSet and emulate having a dedicated deployment to simplify the configuration needed (AFAIK).

@pkoutsovasilis
Contributor Author

> [@cmacknz's review comment above, quoted in full]

Thanks for trying this out @cmacknz. The case you stumbled upon comes from the operator itself. You are right that, in terms of plain k8s objects, nothing appears to be created, but the Agent CRDs (which is what we deploy here) should be there, and their status should indicate that something is happening but not becoming ready. With that said, we will of course fortify this through documentation and look into how we could make such cases more "visible".

@cmacknz
Member

cmacknz commented Apr 9, 2024

Agent CRDs (which is what we deploy here) should be there, and their status should indicate that something is happening but not becoming ready.

Yes, I actually forgot to check the Agent CRDs and was just doing kubectl get pods, since working with ECK isn't second nature to me yet.
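
For anyone else debugging this, a quick sketch of commands that would surface that state without going to the operator logs (the resource group and names are taken from the error message above; exact output columns depend on the ECK version):

kubectl get agents.agent.k8s.elastic.co -n default
kubectl describe agents.agent.k8s.elastic.co agent-pernode -n default   # the status (and any events) should point at the es-ref-secret problem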

@pkoutsovasilis force-pushed the pkoutsovasilis/eck_integrations branch 2 times, most recently from 4daedf8 to 41fbe2c on April 10, 2024
@pkoutsovasilis changed the title from "feat: POC implementation of eck-integrations Helm chart" to "feat: implementation of integrations Helm chart" on Apr 10, 2024
@pkoutsovasilis changed the title from "feat: implementation of integrations Helm chart" to "feat(helm): implementation of integrations chart" on Apr 10, 2024
@pkoutsovasilis force-pushed the pkoutsovasilis/eck_integrations branch from 41fbe2c to ec4db68 on April 10, 2024
@pkoutsovasilis force-pushed the pkoutsovasilis/eck_integrations branch from ec4db68 to b2da0a9 on April 18, 2024
@pkoutsovasilis marked this pull request as ready for review on April 18, 2024
Collaborator

@pebrc left a comment

I was not able to review everything, as this is a big PR, but what I tested looks good and worked 👍 including the e2e test. We have a feature freeze coming up at the end of this month, so if we want to ship this in the next ECK version we need to get it into a mergeable state soon. There are a few open TODOs, mostly around the READMEs etc., that we should address.

@@ -0,0 +1,493 @@
## @section elasticsearchRef define outputs of elasticsearch type and connection parameters
Collaborator

I am not familiar with this syntax, what is it?

Contributor Author

This was inspired by the Bitnami charts, e.g. this one. In theory, having such comment metadata should help in auto-generating the documentation for the values.yaml. That said, I am not sure it adds value for us here.
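
For context, a rough sketch of the Bitnami-style comment metadata referenced here, as consumed by their readme-generator-for-helm tooling (the parameter names and values below are purely illustrative, not this chart's actual values.yaml):

## @section elasticsearchRef define outputs of elasticsearch type and connection parameters
##

## @param elasticsearch.host Elasticsearch endpoint to ship data to
## @param elasticsearch.api_key API key used to authenticate against Elasticsearch
##
elasticsearch:
  host: ""
  api_key: ""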

deploy/eck-integrations/values.yaml (outdated review thread, resolved)
test/e2e/helm/integrations/builder.go (outdated review thread, resolved)
test/e2e/helm/integrations/chart_test.go (outdated review thread, resolved)
@pkoutsovasilis force-pushed the pkoutsovasilis/eck_integrations branch from 8f2c159 to 80c62c5 on April 26, 2024
@thbkrkr
Contributor

thbkrkr commented Apr 26, 2024

buildkite test this -f p=gke,t=TestIntegrationsChartDefaultKubernetes,s=8.13.2

@pkoutsovasilis force-pushed the pkoutsovasilis/eck_integrations branch from b887313 to 6d886b2 on May 21, 2024
@pkoutsovasilis
Contributor Author

pkoutsovasilis commented Aug 16, 2024

Sorry for keeping this open for so long, but based on some internal discussions it was decided that this Helm chart will find its home in the elastic-agent repo. The ECK operator will still be supported by the "new" Helm chart there, but there will also be an option to deploy elastic-agent as plain k8s objects. Thank you for all the help @thbkrkr and @pebrc 🙏

Labels: discuss (We need to figure this out), >feature (Adds or discusses adding a feature to the product), :helm-charts