-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metrics): add run latency to executor metrics #16190
Conversation
3a5bb81
to
47d6def
Compare
ea4da8f
to
139dc88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great work, @AlirieGray 👍 The improvements to consistency will really help future peers following along.
I only had a couple of very minor nits and you can merge it.
ScheduledFor time.Time `json:"scheduledFor"` // ScheduledFor is the Now time used in the task's query | ||
RunAt time.Time `json:"runAt"` // RunAt is the time the task is scheduled to be run, which is ScheduledFor + Offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 Thank you for adding clarifying documentation so the next person will have an easier time understanding the fields
schLogger.Info( | ||
"error in scheduler run", | ||
zap.String("taskID", platform.ID(taskID).String()), | ||
zap.Time("scheduledAt", scheduledAt), | ||
zap.Time("scheduledFor", scheduledFor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great to see the consistency with scheduledFor
and runAt
🎉
task/backend/scheduler.go
Outdated
// time.Time for when the next run is due (includes offset) | ||
RunAt time.Time | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary, as you'll be removing the RunCreation
type with your cleanup PR
@@ -22,10 +22,10 @@ type TaskControlService interface { | |||
NextDueRun(ctx context.Context, taskID influxdb.ID) (int64, error) | |||
|
|||
// CreateRun creates a run with a schedule for time. | |||
// This differes from CreateNextRun in that it should not to use some scheduling system to determin when the run | |||
// This differs from CreateNextRun in that it should not to use some scheduling system to determine when the run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
Namespace: namespace, | ||
Subsystem: subsystem, | ||
Name: "schedule_interval", | ||
Help: "The interval between the time the run was scheduled for and the time the task's next run is due at, by task type", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name and wording of this metric is a little confusing. I would consider calling the field runLatency
and the metric run_latency_seconds
. Prometheus guidelines suggest adding the units (plural) to the end of the metric name.
The Help
could the be something like:
Records the latency between the time a task was due to run and the time the task started execution, by task type
9ee7758
to
2d01100
Compare
2d01100
to
7711940
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
7711940
to
2ec452b
Compare
This PR adds a histogram metric
scheduleInterval
to the Executor metrics that records the time lag between when a task is scheduled to be run and when it actually runs. This can be used to determine the "per minute system concurrency" as described in the queryd SLO RFC.This PR also adds a property
RunAt
to the Run object, which is used to record the time the Run is scheduled to be executed, including the task's Offset. This is added to the Run object because it is possible that the Run object will be changed by the user in the time between the Run being scheduled and being run, which would make our metrics for that Run inaccurate.The value
ScheduledAt
is changed toScheduledFor
to maintain consistency.