docs: add prometheus + grafana deployment guide #1019
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
`site-src/guides/metrics.md` (Outdated)
```bash
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring --create-namespace \
  -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/prometheus/values.yaml
```
After running this step, I ran the command from the helm output to get the prometheus pod:

```bash
# Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9090
```

When running the `export POD_NAME=...`, I get the following error:

```
error: error executing jsonpath "{.items[0].metadata.name}": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
	template was:
		{.items[0].metadata.name}
	object given to jsonpath engine was:
		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":""}}
```

No pod matches the command. Maybe something is missing from values.yaml?
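A quick way to see what actually landed in the namespace and how the pods are labeled (a debugging sketch, not part of the guide):

```bash
# List everything in the monitoring namespace with its labels, to check
# whether the prometheus server pod exists and matches the selector above:
kubectl get pods -n monitoring --show-labels
```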
This relies on the merge to main; the guide currently points at the main branch even though the file isn't there yet. For now you can test using:
https://raw.githubusercontent.com/eyalpazz/gateway-api-inference-extension/add-prometheus-guide/config/manifests/prometheus/values.yaml
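That is, the same install as in the diff above, with the values file pulled from the PR branch until this merges:

```bash
# Temporary workaround: take values.yaml from the PR branch, not main.
helm install prometheus prometheus-community/prometheus \
  --namespace monitoring --create-namespace \
  -f https://raw.githubusercontent.com/eyalpazz/gateway-api-inference-extension/add-prometheus-guide/config/manifests/prometheus/values.yaml
```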
This is the error I get after copying values.yaml locally and using it.
Do you mind trying the instructions on a fresh kind cluster?
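For reference, a throwaway cluster is quick to spin up (assuming the kind CLI is installed; the cluster name below is arbitrary):

```bash
# Create a clean cluster to retest the guide from scratch;
# kind also switches kubectl's current context to the new cluster.
kind create cluster --name metrics-test
kubectl config use-context kind-metrics-test
```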
Of course, I'll try to do it later tonight.
@EyalPazz pls rebase.
*force-pushed d9739ce → 2dc2f1a*
*force-pushed 2dc2f1a → c4fd7e6*
*force-pushed 7508fbe → eb6ab1e*
Rebased, sry for the delay
@nirrozenbaum did it work for you after we talked in private?
@EyalPazz sorry for the delayed response! We had a stressful milestone and I wasn't available.
Hi!! all good
@EyalPazz are you able to work with the release 0.5 branch? It seems like the main quickstart is broken.
*force-pushed eb6ab1e → 69bbbfd*
The 0.5 branch was also kinda sketchy, but I fixed the problem.
@nirrozenbaum the tests are failing because of kubectl-validate, which is run on the manifests dir.
I think the values.yaml file should be pulled out of the manifests dir since it's not a k8s manifest. We can probably put it under its own directory within config; as for the name of the dir, we can do something like…
Should be good to go @nirrozenbaum
```yaml
kind: ClusterRoleBinding
metadata:
  name: inference-gateway-sa-metrics-reader-role-binding
  namespace: monitoring
```
ClusterRoleBinding is a cluster-scoped resource, so remove the namespace field (to avoid confusion).
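In other words, the metadata block would shrink to something like this (a sketch of the suggested fix):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: inference-gateway-sa-metrics-reader-role-binding
  # no namespace field: ClusterRoleBinding is cluster-scoped
```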
BTW, the current guide has a namespace there, so I just went along with it. I can open another PR to fix it, in order not to "complicate" this one.
> BTW, the current guide has a namespace there, so I just went along with it. I can open another PR to fix it, in order not to "complicate" this one.
@EyalPazz, that would be much appreciated.
```yaml
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
```
Most of these resources are already covered in https://gateway-api-inference-extension.sigs.k8s.io/guides/metrics-and-observability/#scrape-metrics-pprof-profiles. To avoid confusion and misconfiguration, this file should only contain the ServiceAccount and an updated ClusterRoleBinding that includes an additional `subjects` entry for the new ServiceAccount:

```yaml
- kind: ServiceAccount
  name: inference-gateway-sa-metrics-reader
  namespace: monitoring
```

Maybe use `kubectl` to patch the ClusterRoleBinding with this entry, so this file only contains the ServiceAccount.
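One possible shape for that patch, assuming the binding keeps the name shown in the diff above (a sketch, not part of the PR):

```bash
# Append a subject for the new ServiceAccount to the existing binding:
kubectl patch clusterrolebinding inference-gateway-sa-metrics-reader-role-binding \
  --type='json' \
  -p='[{"op": "add", "path": "/subjects/-", "value": {"kind": "ServiceAccount", "name": "inference-gateway-sa-metrics-reader", "namespace": "monitoring"}}]'
```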
```bash
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 3000
```
Simplify with a single command that port-forwards the grafana deployment:

```bash
kubectl -n monitoring port-forward deploy/grafana 3000
```

Since Grafana is not configured to use the default `admin/admin` login info, add a step to get the password from the secret. For example:

```bash
kubectl -n monitoring get secret grafana \
  -o go-template='{{ index .data "admin-password" | base64decode }}'
```

Add a note such as "You can now access the Grafana UI from http://127.0.0.1".
```
=== "Self-Hosted"

    Create Necessary ServiceAccount and RBAC Resources:
```
nit: s/Necessary/the necessary/ and s/Resources/resources/

Or, if you update `rbac.yaml` to only include the SA:

s/Create Necessary ServiceAccount and RBAC Resources:/Create the necessary ServiceAccount resource:/

and then include the `kubectl patch` command to patch the ClusterRoleBinding with a subject for the monitoring ServiceAccount.
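Under that suggestion, rbac.yaml would reduce to just the ServiceAccount, something like (a sketch using the name and namespace from the snippet above):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: inference-gateway-sa-metrics-reader
  namespace: monitoring
```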
@EyalPazz PTAL at my review comments above. Let me know if you have any questions.
*force-pushed b1e3dd2 → 5d60f5d*
@danehans wdyt?
/lgtm @EyalPazz, thanks for your patience in getting this PR merged.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, EyalPazz

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/fixes #747