feat(helm): implementation of integrations chart #7356
Conversation
Force-pushed from 80be83e to 96720e7
Overall this is looking great.
Tested and found the following issues on my cluster:
- Many errors from metricbeat in the form of `Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get "https://jophia-01:10250/stats/summary": lookup jophia-01 on 10.152.183.10:53: no such host`
- `host.name` on `logs-elastic_agent.*` was being reported as the pod name instead of the node name
- No kubelet or kube-state metrics were being ingested
I added these to the pod spec for the daemonset agent and it resolved most of the issues above:

```yaml
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
```
What remains after that fix:
- No container logs are ingested. It seems the `/var/log` volume mounts were not applied due to a bug with the `if` condition; I've included a fix for that in the PR linked below. Even with that fix, though, I'm still not getting any files mounted on `/var/log/containers`, for a reason I can't yet understand.
- System metrics are missing, but I think this just isn't in the helm chart yet.

Here's a PR with some of the fixes: #7363
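For reference, a minimal sketch of what those adjustments could look like together on the DaemonSet pod template; the container and volume names are illustrative assumptions, not the chart's actual template:

```yaml
# Sketch only (assumed names): DaemonSet pod template combining the DNS fix above
# with a hostPath mount so container logs under /var/log are visible to the agent.
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: agent                 # illustrative container name
          volumeMounts:
            - name: var-log
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: var-log
          hostPath:
            path: /var/log
```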
@pkoutsovasilis really nice start here. For me everything worked right away. Adding some general comments here for further discussion:
- CCing https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml as an example. Shall we start using this naming convention and schema even from day one? I know it is a big change for now, but it may be worth the effort in order to be OTel-compliant from the start.
### Installing Kubernetes ECK Integration

// TODO(panosk)
Some possible examples to put here:

To enable all Kubernetes monitoring capabilities for metrics and logs:

```sh
helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set kubernetes.enabled=true
```

To only enable logs:

```sh
helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set kubernetes.enabled=true,kubernetes.metrics.enabled=false
```

To enable Cloud Defend:

```sh
helm install k8s-monitoring elastic/eck-integrations -n elastic-system --create-namespace \
  --set elasticsearch.host=https://elasticsearch:9200,elasticsearch.api_key=XXXX \
  --set cloudDefend.enabled=true
```
I think these examples need to mention the secret or the elasticsearchRef setup required.
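For illustration, a hedged sketch of that setup, assuming the chart consumes an ECK-style secret reference with `url`, `username`, and `password` keys (the secret name matches the one used later in this thread; the credentials are placeholders):

```sh
# Sketch: create the secret an unmanaged Elasticsearch reference would point at
# (placeholder credentials; key names follow the ECK secret-reference convention)
kubectl create secret generic es-ref-secret -n elastic-system \
  --from-literal=url=https://elasticsearch:9200 \
  --from-literal=username=elastic \
  --from-literal=password=changeme
```

On the chart side this would then be referenced through the `elasticsearchRef` section of the values; the exact key names are not shown in this thread.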
Force-pushed from e80fcce to ef1aa8a
I tried this following the multiple integration example and it worked as expected.
I think it would be nice to have an example of how to set up an equivalent set of Fleet-managed agents, with the separate DaemonSet and Deployment and the initial agent container configuration overridden (e.g. with leader election disabled everywhere). Perhaps this is intended as a follow-up, but users regularly ask for this and it would have simplified several recent support cases.
The only trouble I ran into was when experimenting with the `es-ref-secret`. If you omit the expected values (for example, trying to use the not-yet-implemented `api-key` instead of `password`), the operator doesn't create anything and there's no obvious error.
You can figure out what happened by looking at the operator logs, but users coming from the current deployment pattern aren't going to expect this or know to look for it. Is there a way to make these failures more immediately obvious? Or is this something we have to solve with documentation?
For example, this is the error message the controller gave me, which was very clear, but you have to know to look at the controller logs to get it:

```json
{"log.level":"error","@timestamp":"2024-04-05T19:23:47.829Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.12.1+a56215f7","service.type":"eck","ecs.version":"1.4.0","controller":"agent-es-association-controller","object":{"name":"agent-pernode","namespace":"default"},"namespace":"default","name":"agent-pernode","reconcileID":"5ee3a9cb-cafb-4aaa-af40-d8a2e221711f","error":"username secret key doesn't exist in secret es-ref-secret","errorCauses":[{"error":"username secret key doesn't exist in secret es-ref-secret"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.2/pkg/internal/controller/controller.go:227"}
```
```yaml
kubernetes_leaderelection:
  enabled: false
```
This works to disable leader election properly via the agent CRD's config section? Does this work to override the initial configurations for Fleet-managed agents as well?
If so, it is significantly easier than the hoops users are jumping through to do this without ECK today, see https://www.elastic.co/guide/en/fleet/current/advanced-kubernetes-managed-by-fleet.html.
Essentially via `configRef`, which is a secret that the chart populates accordingly; the operator then parses it and constructs the agent.yml properly. I think we mean the same thing here 🙂 I am not fully aware of how something similar could look for Fleet-managed agents, but we can definitely have a look.
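To make that concrete, a minimal sketch of the kind of content the chart could render into the `configRef` secret, assuming standalone agent configuration syntax for providers (a sketch, not the chart's actual template):

```yaml
# Sketch: providers section of the rendered agent.yml with the
# kubernetes_leaderelection provider disabled
providers:
  kubernetes_leaderelection:
    enabled: false
```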
What I hope happens is the same mechanism overrides the initial configuration of the agent to seed it with the providers disabled, and then Fleet takes over from that starting base configuration.
The providers can't be configured from Fleet right now so this is a pain point.
Also, with the Helm chart being able to create both a Deployment and a DaemonSet, I don't think we need leader election at all anymore, so it should be disabled everywhere by default (pending a closer review of the configurations to make sure we did this right).
The only reason leader election exists is so you can just deploy a DaemonSet and emulate having a dedicated deployment, to simplify the configuration needed (AFAIK).
Thanks for trying this out @cmacknz. The case you stumbled upon comes from the operator itself. You are right that in terms of plain k8s objects nothing appears to be created, but the Agent CRDs (which is what we deploy here) should be there, and their status should indicate that something is happening but not getting ready. With that said, of course we will fortify this through documentation and have a look at how we could make such cases more "visible".
Yes, I actually forgot to check the agent CRDs and was just doing
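As a hedged sketch of checking the Agent resources directly (using the name and namespace from the error above), something like the following should surface the problem without reading the operator logs:

```sh
# List the Agent custom resources and their health
kubectl get agents.agent.k8s.elastic.co -n default
# Status conditions and events on the resource may indicate why it never becomes ready
kubectl describe agents.agent.k8s.elastic.co agent-pernode -n default
```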
Force-pushed from 4daedf8 to 41fbe2c
Force-pushed from 41fbe2c to ec4db68
Force-pushed from ec4db68 to b2da0a9
I was not able to review everything as this is a big PR, but what I tested looks good and worked 👍, including the e2e test. We have a feature freeze coming up at the end of this month, so if we want to ship this in the next ECK version we would need to get this into a mergeable state soon. There are a few open TODOs, mostly around the READMEs etc., that we should address.
```yaml
## @section elasticsearchRef define outputs of elasticsearch type and connection parameters
```
I am not familiar with this syntax, what is it?
This was inspired by Bitnami charts, e.g. this. In theory, having such comment metadata should help in autogenerating the documentation for the values.yaml. That said, I am not sure if it adds value for us here.
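For context, a small sketch of that Bitnami-style annotation convention; the parameter name shown here is illustrative and not taken from this chart's values.yaml:

```yaml
## @section elasticsearchRef define outputs of elasticsearch type and connection parameters
## @param elasticsearchRef.secretName Name of the secret holding the connection details (illustrative)
elasticsearchRef:
  secretName: ""
```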
Force-pushed from 8f2c159 to 80c62c5
buildkite test this -f p=gke,t=TestIntegrationsChartDefaultKubernetes,s=8.13.2
Force-pushed from 80c62c5 to 398af5c
Force-pushed from b887313 to 6d886b2
Sorry for keeping this open for so long, but based on some internal discussions it was decided that this Helm chart will find its home in
I opened this draft PR to capture the footprint of the changes required to satisfy elastic/elastic-agent#3848 and to kick off further discussions for elastic/elastic-agent#3847. Please don't merge this PR, as nothing is concrete yet. With that said, any comments are more than welcome.