[Metrics] Kubernetes Prometheus metrics address #687
Comments
There is a function for |
Thank you for the quick answer. Indeed, the
At the same time I am using kubectl proxy with Do you have any idea where the problem could be? |
Seems like the address is trying to access the port 1390 (deployment port) although our service port is 9090? |
Going to |
Hmm.. so when port is included in the url, it fails. Could you access Prometheus in this way? Also, @RehanSD @simon-mo @withsmilo, have you guys encountered similar issues? |
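For anyone comparing URLs while debugging this, here is a minimal sketch of how the API-server proxy path for a Service is usually put together when going through `kubectl proxy`. The Service name `metrics` and the `default` namespace are placeholders, not values taken from Clipper; 8001 is the default `kubectl proxy` port and 9090 is the service port mentioned above.

```python
def service_proxy_url(service, port, namespace="default",
                      proxy_host="localhost", proxy_port=8001):
    """Build the API-server proxy URL for a Service reached via `kubectl proxy`."""
    return (
        f"http://{proxy_host}:{proxy_port}/api/v1/namespaces/{namespace}"
        f"/services/{service}:{port}/proxy/"
    )


# "metrics" is a hypothetical Service name; substitute the one shown by `kubectl get svc`.
print(service_proxy_url("metrics", 9090))
```

The port in this path is the Service port (or its name), not the container port of the deployment, which may be relevant to the port mismatch discussed above.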
No, it shows loading for a few minutes and then that Unfortunately after stopping Clipper completely and rerunning it with:
It gets stuck at:
After a few tries and ~25 seconds it continues with the warning (error more likely) below
I don't know exactly how I fixed that before, but I think that |
Because model deployment is ok, querying is ok and I cannot get to the Prometheus address in the browser with or without proxy I have to suspect that there is something wrong with the Prometheus pod. Both ways get stuck for a long time and try to redirect to This is the pod log: (level=info messages removed), the error starts immediately after the launch
|
Are you using any cloud service with RBAC? Seems like the log is related to RBAC configuration. RBAC has been problems for We don't natively support RBAC config now. #564. This can be a temporary solution. Also, there is one PR that tried to tackle this problem https://github.com/ucbrise/clipper/pull/605/files. Feel free to submit a PR if you resolve it! |
I checked and yes, Kubespray by default launches RBAC with Kubernetes. I will try to create my cluster from scratch without it and I will update the issue here |
@jacekwachowiak
|
Yes, I've seen how the urls are created, but the problem is not there. |
The PR is not verified yet. Based on the fact that he created a PR, I believe it should work though. He was going through a problem similar to yours and asked a question here: #564. @simon-mo said, "All we need is to add RBAC support in our kubernetes config files." You can probably also try creating proper RBAC config files and applying them manually to your cluster with kubectl. |
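As a rough illustration of what "proper RBAC config files" could look like, the sketch below generates a ServiceAccount, a ClusterRole and a ClusterRoleBinding as JSON, which `kubectl apply` accepts. The rules follow the read-only permission set commonly granted to Prometheus for Kubernetes service discovery; the name `prometheus` and the `default` namespace are assumptions, and the config that #605 or #694 actually ships may differ.

```python
import json


def prometheus_rbac_manifest(name="prometheus", namespace="default"):
    """Return a Kubernetes List with a ServiceAccount, ClusterRole and binding."""
    rbac_api = "rbac.authorization.k8s.io"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    cluster_role = {
        "apiVersion": f"{rbac_api}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # Read-only access to the objects Prometheus discovers targets from.
                "apiGroups": [""],
                "resources": ["nodes", "nodes/proxy", "services", "endpoints", "pods"],
                "verbs": ["get", "list", "watch"],
            },
            {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
        ],
    }
    binding = {
        "apiVersion": f"{rbac_api}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": rbac_api, "kind": "ClusterRole", "name": name},
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
    }
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [service_account, cluster_role, binding],
    }


print(json.dumps(prometheus_rbac_manifest(), indent=2))
```

If saved as a script (the file name is up to you), the output can be piped straight into the cluster, for example `python rbac_sketch.py | kubectl apply -f -`.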
Also, did you get the same pod logs when you created your cluster without RBAC? |
Apparently using Kubespray without RBAC is problematic (looking at how it creates the cluster). I got some pods stuck in the pending state, and the same happened to the Clipper deployment - it crashed after 5 min, as programmed when there is no progress. Since I cannot limit myself to one Kubernetes cluster creation tool (for now Kubespray on OpenStack), I need to consider that on other clusters RBAC might not be optional (e.g. EKS on AWS), and right now I think it's more important to check whether Clipper can work both with and without it. I will let you know about everything I find. |
I used the PR and to no avail :(
the error is in the Prometheus pod from the beginning, as it was. |
@simon-mo created a new PR on top of #564: #694. If this passes If the PR doesn't work, there are two other things you can try. Also, please make sure 1.
to the Prometheus deployment https://github.com/ucbrise/clipper/blob/develop/clipper_admin/clipper_admin/kubernetes/prom_deployment.yaml under 2. Another solution can be to bind |
Thank you for the fast and extensive reply, I'll take a look at the PR asap! |
I saw that the PR #693 about the metric address method was merged, I updated the repo and tried it - it returns no error now, but it does not return the address either:
|
Lol, I forgot to add the return statement. I will create a PR real soon. Please use cm.get_metric_addr until then. Also, can you access the metrics through the proxy now? |
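Until the fix lands, the workaround above can be wrapped in a small helper. This is only a sketch: it assumes `clipper_conn` is an already connected `ClipperConnection` whose container manager is exposed as `clipper_conn.cm`, as the comment above implies, and `resolve_metric_addr` is a hypothetical name, not part of clipper_admin.

```python
def resolve_metric_addr(clipper_conn):
    """Return the Prometheus address, falling back to the container manager.

    Tries clipper_conn.get_metric_addr() first; if that method is missing or
    returns nothing (the missing-return bug), uses clipper_conn.cm instead.
    """
    addr = None
    getter = getattr(clipper_conn, "get_metric_addr", None)
    if callable(getter):
        addr = getter()
    if not addr:
        addr = clipper_conn.cm.get_metric_addr()
    return addr
```

Calling `resolve_metric_addr(clipper_conn)` after `clipper_conn.connect()` should then behave the same before and after the fix.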
Yes, no problem about that :)
so maybe the step about the service account can fix that. Going to the address in the browser still ends in loading forever |
Can you try this Also, @simon-mo You might need to see this for debugging. |
Can you give me some more details on how to run this? I get this and I cannot find a good example of how to use
Does this help?
|
To use I found how to access Here is the important part. To configure kube-apiserver command line arguments you need to modify /etc/kubernetes/manifests/kube-apiserver.yaml on your master. |
so it seems it is not |
Okay. Seems like Can you try to add ServiceAccount in the yaml file? I think you can add If it doesn't work, you can find the |
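To double-check which admission plugins the API server is actually started with, a small script can read them out of the manifest text. This is a sketch, not something Clipper provides: it looks for the `--enable-admission-plugins` flag (or the older, deprecated `--admission-control` flag) and uses a plain regex so that no YAML library is needed. The sample manifest is made up for illustration.

```python
import re


def admission_plugins(manifest_text):
    """Return the admission plugins listed in a kube-apiserver manifest."""
    match = re.search(
        r"--(?:enable-admission-plugins|admission-control)=([^\s\"']+)",
        manifest_text,
    )
    return match.group(1).split(",") if match else []


# Hypothetical excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml
sample = """
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-admission-plugins=NodeRestriction,ServiceAccount
"""

print("ServiceAccount" in admission_plugins(sample))
```

On the master node, the real file content can be passed in with `open("/etc/kubernetes/manifests/kube-apiserver.yaml").read()`.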
On it, I'll update soon |
BTW, this is the source explaining why we need the ServiceAccount plugin |
Do I have to restart the pod/container to be sure that the change was applied? |
It says the kubelet periodically scans all the manifest files for updates. So I believe it should be fine? I am not 100% sure about this. https://stackoverflow.com/questions/50007654/how-does-kube-apiserver-restart-after-editing-etc-kubernetes-manifests-kube-api |
@jacekwachowiak How's the result? Are you able to access prometheus now? |
Sorry for the delay, but Murphy's law got involved - the cloud I am using currently has network problems and I have to wait until they are fixed! It stopped working the moment I changed the manifest, but that's surely not the cause 😅 |
Haha, gotcha! Let me know if it resolves the issue! |
I'm back but without good news, nothing changed. After adding |
Sorry to hear that. Let's try 3 different things, and if it is not solved, we will work on it shortly. You should probably wait until it is resolved. 1. In So it should look like
2. I think it is a bad solution. Don't try this.
3. Restart Clipper with a different service type. Refer to this PR: #667. Set the metric's serviceType to LoadBalancer. You can see the external IP address using (You can also just change the I am pretty sure this will work. In this case, I guess you don't need to use the proxy to access the service (because the load balancer will expose the external IP). |
For step 1: I changed both of |
Hmm, so there are no more error logs, but you still cannot see the metrics from the browser? @withsmilo How did you guys resolve the RBAC problem for metrics? |
So sorry, we are using the default Prometheus in our in-house Kubernetes, which is managed by another team, not Clipper's. |
Regarding 3, I have restarted Clipper with the additional argument:
I have reverted 1 but it only changed one thing - the log error returned (I tried 3 with both reverted and not reverted 1). I will try to set it manually to |
I guess it is a problem with the code itself. Yeah, can you try setting the load balancer manually? If it doesn't work, I will try to resolve it shortly |
And also:
I have put the
It seems none of the places work. I am still getting |
I think you should change the prom_service not deployment (not sure if you already did). |
Also, it seems like there is a load balancer service. Is external IP still pending? |
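One way to check whether the external IP is still pending is to look at the `status.loadBalancer.ingress` field of the Service object, for example from the output of `kubectl get svc <name> -o json`. The helper below is a sketch that works on the parsed JSON; both sample objects are invented for illustration.

```python
def external_addresses(service):
    """Return the external IPs or hostnames of a Service; empty while pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [entry.get("ip") or entry.get("hostname") for entry in ingress]


# Hypothetical Service objects: one still pending, one with an assigned address.
pending = {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}
ready = {
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
}

print(external_addresses(pending))
print(external_addresses(ready))
```

An empty list that never fills in usually means the cluster has no load balancer provider to assign an address, in which case a NodePort service or the proxy is the remaining option.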
Oh, yes, I was wondering if there is another file for the service, I'm trying it right now! |
Makes sense. If so, using RBAC with NodePort will be the only option. We will debug the RBAC PR real soon and let you know once it is successfully tested and merged! Sorry for the inconvenience! |
Thanks! I'll see what I can do anyway and follow all the changes in the repo :) |
Hi, @jacekwachowiak! @simon-mo will handle this issue shortly and this fix is going to be included in the next release coming soon! |
Hi, @jacekwachowiak I had a new update on #694, and it seems like it can read metrics within the test. Could you merge that PR locally and see if it works on your side? |
Ok, thank you, I'll take a look and let you know when I get something |
@jacekwachowiak Would you mind creating a new issue, since this is a different problem from the title? (So that other people can find it later.) Also, I will look into it as soon as possible. |
Ok, on it |
Resolved with #694 |
Following the documentation on http://clipper.ai/tutorials/metrics/ I should be able to find the Prometheus address to see some metrics of the Clipper deployments, but the suggested method
clipper_conn.get_metric_addr()
seems not to exist (anymore?). It is also not listed here: http://docs.clipper.ai/en/v0.3.0/clipper_connection.html. My Kubernetes cluster looks OK and it has a pod for metrics, so the question is: how can I get the address to access Prometheus easily? I tried to reuse some parts of the output of
get_query_addr()
but without success.