Description
Several tickets related to Grafana and Prometheus seem worth tracking together, so this is a meta-issue that gives an overview of them.
Grafana
- grafana, utoronto: deployer script unauthorized #2152
- pangeo-hubs, prometheus: server is crashing due to memory limits #2215
- grafana: ensure function of Node CPU and Memory utilization % #2191
- grafana: ensure functionality of OOMKiller dashboard #2213
- grafana: tighten grafana charts k8s RBAC permissions (ClusterRole -> Role) #2182
- grafana: learn if github auth supports either org or team authorization #2179
- Community facing information about grafana monitoring - access and monitoring guidance features#25
- Monitoring EKS/GKE spot instance pre-emption events #2369
- Monitor NFS servers - critical diagnostics to understand issues #2242
- Monitor node performance - network read/write speeds, ephemeral storage read/write speeds and capacity #2243
- Decide on a path for enabling GitHub auth to grafana for Community Reps of hubs on dedicated clusters #1850
- Consider exposing grafana via https://<hub domain>/services/grafana #535
- Create a few Grafana dashboards that we can use for reporting #1282
- Collect prometheus metrics from each notebook pod #53
- Tooling for engineers/admins/users to understand memory use #2107
- Example of asking users to authorize the GitHub application for GitHub auth in grafana: [Request deployment] New Hub: Smithsonian #2323 (comment), and another from the jupyterhub GitHub auth docs at https://infrastructure.2i2c.org/hub-deployment-guide/configure-auth/github-orgs/#granting-access-to-the-oauth-app
jupyterhub/grafana-dashboards
- Is it in scope for this project to document use for non-admin users? jupyterhub/grafana-dashboards#66
- Error when using last 7 days - RangeError: Invalid array length jupyterhub/grafana-dashboards#62
- Dashboard panel for node's available ephemeral storage space jupyterhub/grafana-dashboards#64
- Dashboard panel for pod evictions (out of memory, out of ephemeral space, manual node drains) jupyterhub/grafana-dashboards#65
Prometheus
- Ideas to upper bound prometheus-server's memory consumption #2222
- basehub: reduce prometheus-server's configured disk size (currently 100G) #2223
- Document how to programmatically access Grafana / Prometheus data #1785
- Tighten access to prometheus servers #1101
- 2i2c datasource showing up twice in central grafana #1510