Skip to content
This repository was archived by the owner on Feb 22, 2022. It is now read-only.

Add Telegraf Chart #269

Merged
merged 12 commits into from
Dec 19, 2016
Merged

Add Telegraf Chart #269

merged 12 commits into from
Dec 19, 2016

Conversation

jackzampolin
Copy link
Contributor

Signed CLA and fulfilled all the requirements. Any feedback is more than welcome!

@k8s-ci-robot
Copy link
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands will still work. Regular contributors should join the org to skip this step.

If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 1, 2016
@viglesiasce
Copy link
Contributor

Helm install failed with:

helm install .
Error: render error in "telegraf/templates/NOTES.txt": template: telegraf/templates/NOTES.txt:1:61: executing "telegraf/templates/NOTES.txt" at <.Values.inputs.stats...>: can't evaluate field statsd in type interface {}

I dont think that is defined in the values.yaml


To trail the logs for the Telegraf pod run the following:

- kubectl logs -f --namespace {{ .Release.Namespace }} $(kubectl get pods --namespace {{ .Release.Namespace }} -l app={{ template "fullname" . }} -o jsonpath='{ .items[0].metadata.name }')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command failed for me:

$  kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app=honorary-peahe-telegraf -o jsonpath='{ .items[0].metadata.name }')
error executing jsonpath "{ .items[0].metadata.name }": array index out of bounds: index 0, length 0

error: expected 'logs POD_NAME [CONTAINER_NAME]'.
POD_NAME is a required argument for the logs command
See 'kubectl logs -h' for help and examples.


- kubectl exec -i -t --namespace {{ .Release.Namespace }} $(kubectl get pods --namespace {{ .Release.Namespace }} -l app={{ template "fullname" . }} -o jsonpath='{.items[0].metadata.name}') /bin/sh

To trail the logs for the Telegraf pod run the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same typo as other chart

outputs:
influxdb:
urls:
- "http://influxdb-influxdb.tick:8086"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a required value I would leave it unset and not deploy the daemonset or other deployment until it gets set explicitly by the user. Check out the Ghost chart for an example of that pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In here? I'm not seeing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! Thanks @prydonius

@viglesiasce
Copy link
Contributor

Should single and Daemonset both be enabled by default? When should you use one vs the other? Can the configs be unified?

swap:
system:
kubernetes:
url: "http://$NODE_IP:10255"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't work for me.

Is this looking for the heapster/cadvisor port? Where are you expecting these to be evaluated? You may need to do this swap with an init container as the pod starts up. Example in the Jenkins chart here.

Copy link
Contributor Author

@jackzampolin jackzampolin Dec 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reference ENV in the telegraf configuration file. That would get evaluated when telegraf starts up. I haven't had an issue with this approach before. Are you seeing:

2016-12-05T20:56:40Z E! ERROR in input [inputs.kubernetes]: Errors encountered: [error making HTTP request to http://$NODE_IP:10255/stats/summary: dial tcp [::1]:10255: getsockopt: connection refused]```

@viglesiasce
Copy link
Contributor

The daemonset is using the pod name as the hostname for the daemonset when it should use the hostname of the node. Otherwise every pod restart causes a new "node" in telegraf/influxdb parlance which makes things confusing. From what I can tell telegraf is monitoring the node not the pod but I could be wrong.

@jackzampolin
Copy link
Contributor Author

jackzampolin commented Dec 5, 2016

Should single and Daemonset both be enabled by default? When should you use one vs the other? Can the configs be unified?

The telegraf daemonset will pull any host level metrics, cpu, mem, disk, docker. Also the kubernetes telegraf plugin works by polling each of the kubelets and needs to be run on each host.

For the services or applications running on the cluster they can be polled by or push data to the telegraf single instance. The configuration for the individual by default only polls an InfluxDB instance (for monitoring the database), uses the prometheus plugin to poll the Kubernetes API server, and exposes a statsd endpoint.

The configs can't really be unified, however we might be able to leave off the daemonset configuration. I would like to leave it so that adding support for other host-level plugins is easy.

The daemonset is using the pod name as the hostname for the daemonset when it should use the hostname of the node. Otherwise every pod restart causes a new "node" in telegraf/influxdb parlance which makes things confusing. From what I can tell telegraf is monitoring the node not the pod but I could be wrong.

I think I fixed this last week.

@jackzampolin
Copy link
Contributor Author

I've gone ahead and made all the requested changes. @viglesiasce @prydonius anything else you need from me here?

@viglesiasce
Copy link
Contributor

@k8s-bot ok to test

@k8s-ci-robot
Copy link
Contributor

Jenkins Charts e2e failed for commit 7194576. Full PR test history.

The magic incantation to run this job again is @k8s-bot e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@viglesiasce
Copy link
Contributor

@jackzampolin this no longer installs on the first pass. You need to leave (at least) the service as installing. Just leave off the deployment and daemonset.

@viglesiasce
Copy link
Contributor

Still seeing the DS coming up with the pod IP name in Chronograf, it should be the node name.

@viglesiasce
Copy link
Contributor

In chronograf im still not seeing anything in the Kubernetes dashboard. How does that get populated?

@jackzampolin
Copy link
Contributor Author

@viglesiasce:

@jackzampolin this no longer installs on the first pass. You need to leave (at least) the service as installing. Just leave off the deployment and daemonset.

Done! Should just deploy the service by default with nothing backing it unless the single.config.outputs.influxdb.urls[0] (for single instance) or daemonset.config.outputs.influxdb.urls[0] (for demonset) is set to an InfluxDB server.

Still seeing the DS coming up with the pod IP name in Chronograf, it should be the node name.

🤦 sorry about that! That is also related to the kubernetes dashboard not showing up. I've pushed the fix.

@viglesiasce
Copy link
Contributor

Can you add a note to the NOTES.txt that indicates that Telegraf is not yet properly deployed and point them at how they can rectify it? Check out the Ghost chart for an example.

@viglesiasce
Copy link
Contributor

This worked for me. In a follow up PR can you add the syntax for the upgrading the infludb values?

Here is what worked for me (since the value is a list):

helm upgrade romping-zebra stable/telegraf --set daemonset.config.outputs.influxdb.urls={http://quelling-quetzal-influxd:8086}

@viglesiasce viglesiasce added code reviewed lgtm Indicates that a PR is ready to be merged. UX reviewed and removed awaiting review labels Dec 19, 2016
@prydonius prydonius merged commit bcbd8cd into helm:master Dec 19, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. code reviewed lgtm Indicates that a PR is ready to be merged. UX reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants