Fine performance metrics: Prometheus and Grafana #7680

Closed
crusaderky opened this issue Mar 17, 2023 · 5 comments

Comments

@crusaderky
Collaborator

Export the metrics collected on the scheduler by #7666 to Prometheus.
For the sake of keeping everything additive, execute, gather-dep, get-data, and async spilling metrics should be kept separate.
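As a rough illustration of what "kept separate" could look like, here is a minimal, stdlib-only sketch that renders one Prometheus counter family per context (execute, gather-dep, get-data, spill) in the text exposition format. The `(context, activity, unit)` key layout, the sample values, the `dask_fine_metrics` prefix, and the `render_prometheus` helper are all assumptions made for this example; they are not taken from #7666 or from the existing exporter.

```python
from collections import defaultdict

# Hypothetical sample of cumulative fine metrics.
# Key layout (context, activity, unit) is an assumption for illustration.
SAMPLE = {
    ("execute", "thread-cpu", "seconds"): 12.5,
    ("execute", "disk-read", "seconds"): 0.75,
    ("gather-dep", "network", "seconds"): 3.0,
    ("get-data", "serialize", "seconds"): 0.5,
    ("spill", "disk-write", "seconds"): 1.25,
}


def render_prometheus(metrics, prefix="dask_fine_metrics"):
    """Render metrics as Prometheus text, one counter family per context.

    Each context gets its own metric name, so summing over the
    ``activity`` label never mixes execute time with transfer or
    spill time.
    """
    families = defaultdict(list)
    for (context, activity, unit), value in sorted(metrics.items()):
        # Prometheus metric names cannot contain dashes.
        name = f"{prefix}_{context.replace('-', '_')}_{unit}_total"
        families[name].append((activity, value))

    lines = []
    for name, samples in families.items():
        lines.append(f"# TYPE {name} counter")
        for activity, value in samples:
            lines.append(f'{name}{{activity="{activity}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_prometheus(SAMPLE), end="")
```

With separate families, a Grafana panel can sum over `activity` within one family and get a total that stays meaningful for that context alone.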

Coiled-specific deliverable: implement new Grafana plots that use the above metrics.

@fjetter
Member

fjetter commented Mar 31, 2023

It would help to have the current schema of the data we are storing ahead of time for this issue; essentially, what we are storing in #7666.

We are already documenting something like this for existing metrics, see https://distributed.dask.org/en/latest/prometheus.html

What is missing from the documentation page is information about:

  • Which labels are exported for each metric
  • Which data type each metric has (count, duration, ...)
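One way to capture the two missing pieces (labels and data type) is a small machine-readable schema from which documentation rows can be generated. The sketch below is only an illustration: the `MetricDoc` class, the `to_rows` helper, and the two example metric names are hypothetical and do not describe the metrics actually exported by distributed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MetricDoc:
    """Documentation record for one exported metric (hypothetical)."""

    name: str
    kind: str  # Prometheus metric type, e.g. "counter" or "gauge"
    unit: str  # what the value measures, e.g. "seconds" or "count"
    labels: Tuple[str, ...]  # label names attached to every sample


# Example entries; the names are placeholders, not real metrics.
DOCS = [
    MetricDoc("example_execute_seconds_total", "counter", "seconds", ("activity",)),
    MetricDoc("example_tasks_total", "counter", "count", ("state", "prefix")),
]


def to_rows(docs: List[MetricDoc]) -> List[str]:
    """Format the records as pipe-separated rows for a docs table."""
    rows = ["name | type | unit | labels"]
    for d in docs:
        rows.append(f"{d.name} | {d.kind} | {d.unit} | {', '.join(d.labels)}")
    return rows


if __name__ == "__main__":
    print("\n".join(to_rows(DOCS)))
```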

@fjetter
Member

fjetter commented Mar 31, 2023

Right now, it's not entirely certain whether we want to expose this as a time series, so there is also room to simply attach this information to a Computation object.

@fjetter
Member

fjetter commented Apr 6, 2023

@hendrikmakait @crusaderky I believe there was a meeting discussing this topic. Can either of you summarize the outcome of it here?

OK, I just realized there is a summary here: #7665 (comment)

Does this mean we will not put those metrics in Prometheus for now? If so, I suggest closing this issue.

@hendrikmakait
Member

I'd be fine with closing this ticket as not planned for now and reopening it at a later point in time. Providing E2E aggregates should be the priority for now.

@crusaderky
Collaborator Author

Agreed

@crusaderky reopened this Apr 6, 2023
@crusaderky closed this as not planned Apr 6, 2023