Fine performance metrics meta-issue #7665
Current data schema
#7701 (currently in main git tip; will be in release 2023.4.1) introduces a new mapping. Its entries are ever-increasing key->float amount pairs. The keys are as follows. Note that there are more keys than the ones listed below, and while all keys listed below are tuples, some keys may be bare strings.
Worker.execute() metrics
Format
All metrics with the same unit are additive.
Individual labels
Deserialize run_spec
Unspill inputs
Run task in thread
Spill output (will disappear after #4424)
Delta to end-to-end runtime as seen from the worker state machine
Future additions
Failed tasks
Time wasted on non-successful tasks.
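To illustrate what "all metrics with the same unit are additive" allows, here is a minimal sketch that sums a snapshot of the mapping per unit. The key layout used below (context, task prefix, activity, unit), the activity names, and the sample values are assumptions made up for this example, not the documented format. Bare-string keys are skipped because they carry no unit in this layout.

```python
from collections import defaultdict

# Hypothetical snapshot. The tuple layout (context, task prefix, activity, unit)
# and all values are assumptions for illustration only.
sample_metrics = {
    ("execute", "inc", "deserialize", "seconds"): 0.002,
    ("execute", "inc", "thread", "seconds"): 1.25,
    ("execute", "inc", "unspill", "seconds"): 0.5,
    ("execute", "inc", "unspill", "bytes"): 1024.0,
    "some-bare-string-key": 0.01,  # some keys may be bare strings
}


def totals_by_unit(metrics):
    """Sum all tuple-keyed metrics that share the same unit (last key element)."""
    totals = defaultdict(float)
    for key, value in metrics.items():
        if not isinstance(key, tuple):
            continue  # bare-string keys have no unit in this assumed layout
        totals[key[-1]] += value
    return dict(totals)


totals = totals_by_unit(sample_metrics)
```

Metrics with different units (seconds and bytes here) are kept apart, since adding them would be meaningless.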
Worker.gather_dep metrics
Format
All metrics with the same unit are additive.
Individual labels
Worker.gather_dep() method
Spill output (will disappear after #4424)
Delta to end-to-end runtime as seen from the worker state machine
Future additions
Failed transfers
Time wasted on non-successful transfers.
Worker.get_data() metrics
Format
All metrics with the same unit are additive.
Individual labels
Unspill
Send over the network
Summary from an offline meeting with @fjetter, @hendrikmakait and @ntabris:
XREFs
Worker.gather_dep
#7217
In #7586, we started collecting very granular metrics on how workers are spending their time.
Demo: https://gist.github.com/crusaderky/a97f870c51260e63a1c14c20b762f666
As of that PR, we collect metrics in Worker.digests_total about:
Worker.execute, broken down by task prefix and activity, with special treatment for failed and cancelled tasks
Worker.gather_dep, broken down by activity, with special treatment for failed and cancelled transfers
Worker.get_data, broken down by activity
WorkerMemoryMonitor._spill, broken down by activity
This issue is a meta-tracker of all potential follow-ups, as well as a place to discuss high-level design and cost/benefit ratios holistically.
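Because these totals are cumulative, one way to see where time went during a given window is to diff two snapshots. The sketch below works on plain dicts standing in for copies of such a mapping. The key layouts and numbers are hypothetical, and it assumes the counters only ever grow between snapshots.

```python
def snapshot_delta(before, after):
    """Return the per-key increase between two snapshots of cumulative counters.

    Keys that did not change are omitted; keys missing from `before` are
    treated as starting at 0.
    """
    delta = {}
    for key, value in after.items():
        increase = value - before.get(key, 0.0)
        if increase != 0:
            delta[key] = increase
    return delta


# Hypothetical snapshots; key layouts are assumptions for illustration only.
before = {
    ("execute", "inc", "thread", "seconds"): 1.0,
    ("get-data", "network", "seconds"): 0.5,
}
after = {
    ("execute", "inc", "thread", "seconds"): 3.5,
    ("get-data", "network", "seconds"): 0.5,
    ("gather-dep", "network", "seconds"): 0.25,
}

window = snapshot_delta(before, after)
```

Here `window` only contains the keys that advanced between the two snapshots, which keeps the result small when most activities were idle.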
The follow-ups can be broken down into two high-level threads:
Improve quality and usability of collected data
What we do with the data
Finishing touches