[inputs.prometheus] SIGSEGV on startup with Kubernetes 1.20 #10085
Comments
Hi,
That trace is from this bit of code:

    func (p *Prometheus) watchPod(ctx context.Context, client *kubernetes.Clientset) error {
        watcher, err := client.CoreV1().Pods(p.PodNamespace).Watch(ctx, metav1.ListOptions{
            LabelSelector: p.KubernetesLabelSelector,
            FieldSelector: p.KubernetesFieldSelector,
        })
        defer watcher.Stop()
        if err != nil {
            return err
        }

The nil pointer happens when trying to |
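In the snippet quoted above, `defer watcher.Stop()` is registered before `err` is checked. If the `Watch` call fails and hands back a nil watcher, the deferred `Stop` would most likely be what dereferences nil. The following is a minimal, self-contained sketch of the usual ordering (check the error first, then defer the cleanup). It is not taken from the Telegraf source or from #10091: `Watcher`, `fakeWatcher`, `startWatch`, and `watchPods` are hypothetical stand-ins for the client-go types, and the error text is made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Watcher is a hypothetical stand-in for the watch interface returned by
// the Kubernetes client; only Stop matters for this sketch.
type Watcher interface {
	Stop()
}

// fakeWatcher is a hypothetical watcher that records whether Stop ran.
type fakeWatcher struct{ stopped bool }

func (w *fakeWatcher) Stop() { w.stopped = true }

// startWatch is a hypothetical stand-in for the Watch call. When the
// request is not allowed it returns a nil Watcher together with an error,
// which is the situation assumed to trigger the crash in this issue.
func startWatch(allowed bool) (Watcher, error) {
	if !allowed {
		// Made-up error text, for illustration only.
		return nil, errors.New("watch request denied")
	}
	return &fakeWatcher{}, nil
}

// watchPods checks err before deferring Stop, so a failed watch surfaces
// as an ordinary error instead of a nil-pointer panic.
func watchPods(allowed bool) error {
	watcher, err := startWatch(allowed)
	if err != nil {
		return fmt.Errorf("starting pod watch: %w", err)
	}
	defer watcher.Stop()

	// A real implementation would consume watch events here.
	return nil
}
```

Calling `watchPods(false)` returns a wrapped error that a caller can log, while `watchPods(true)` returns nil and stops the watcher on exit. With this ordering, a permissions problem would show up in the log as an error rather than as a SIGSEGV at startup.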
@powersj -- yep, not in a position to build the code, but should be able to modify the existing Docker image with a new executable for use in our K8s cluster. |
Alright, #10091 has artifacts now attached to it. Can you please give those a shot? Thanks! |
New output is
|
Tried to work around this by using the node level scrape, but I can't get that working either. I don't see any errors in the log, but it also isn't scraping from any pods (turning on debug tracing does not emit the "will scrape metrics" message from |
This is likely caused by a permissions issue with your Telegraf pod. The error message is not very helpful.
|
We had assumed it was probably something along those lines, but we have yet to find the incantation which allows it to work. Currently we have the following definition for the SA, Role, RoleBinding. Note that this configuration works fine on our K8s 1.18 based clusters.
On our 1.20 based EKS cluster the example you give above fails to apply with:
Removing that key from the It does seem to be permission related, but we've yet to find the grants that will make it happy (at least starting with K8s 1.20). |
Sorry, the apiGroup line in the entry of kind ServiceAccount in my example above needs to be removed.
(ClusterRole/Binding instead of Role/Binding is needed in case of cluster level pods watch) |
In your config, rules is empty (null). Have you tried adding the same rules as those defined in my config above? |
Yes, sorry I was unclear about that -- I replaced our definitions with the ones you originally provided and got the same error. However, we haven't tried the ones you just provided in your most recent update. But once we do we'll let you know the result. Thanks! |
Confirmed that the configuration above works. Looks like K8s actually started applying some security to the watch API (which is good). Thanks for the help in finding the correct permissions to use. |
Address documentation gap
cool! thanks for verifying. |
Relevant telegraf.conf
System info
Telegraf 1.20.3, Kubernetes 1.20.7
Docker
No response
Steps to reproduce
Expected behavior
Telegraf should start and the prometheus input should start scraping the pods it finds via discovery. This works just fine in Kubernetes 1.19, but fails as described above in K8s 1.20.
Actual behavior
Telegraf pod dies immediately with the following error:
Additional info
No response