Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): add run latency to executor metrics #16190

Merged
merged 1 commit into from
Dec 11, 2019

Conversation

AlirieGray
Copy link
Contributor

@AlirieGray AlirieGray commented Dec 10, 2019

This PR adds a histogram metric scheduleInterval to the Executor metrics that records the time lag between when a task is scheduled to be run and when it actually runs. This can be used to determine the "per minute system concurrency" as described in the queryd SLO RFC.

This PR also adds a property RunAt to the Run object, which is used to record the time the Run is scheduled to be executed, including the task's Offset. This is added to the Run object because it is possible that the Run object will be changed by the user in the time between the Run being scheduled and being run, which would make our metrics for that Run inaccurate.

The value ScheduledAt is changed to ScheduledFor to maintain consistency.

  • Rebased/mergeable
  • Tests pass

@AlirieGray AlirieGray force-pushed the tasks/query-interval-metric branch 5 times, most recently from 3a5bb81 to 47d6def Compare December 10, 2019 21:59
@AlirieGray AlirieGray force-pushed the tasks/query-interval-metric branch 2 times, most recently from ea4da8f to 139dc88 Compare December 10, 2019 23:06
Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great work, @AlirieGray 👍 The improvements to consistency will really help future peers following along.

I only had a couple of very minor nits and you can merge it.

Comment on lines +87 to +88
ScheduledFor time.Time `json:"scheduledFor"` // ScheduledFor is the Now time used in the task's query
RunAt time.Time `json:"runAt"` // RunAt is the time the task is scheduled to be run, which is ScheduledFor + Offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 Thank you for adding clarifying documentation so the next person will have an easier time understanding the fields

schLogger.Info(
"error in scheduler run",
zap.String("taskID", platform.ID(taskID).String()),
zap.Time("scheduledAt", scheduledAt),
zap.Time("scheduledFor", scheduledFor),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great to see the consistency with scheduledFor and runAt 🎉

Comment on lines 44 to 46
// time.Time for when the next run is due (includes offset)
RunAt time.Time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary, as you'll be removing the RunCreation type with your cleanup PR

@@ -22,10 +22,10 @@ type TaskControlService interface {
NextDueRun(ctx context.Context, taskID influxdb.ID) (int64, error)

// CreateRun creates a run with a schedule for time.
// This differes from CreateNextRun in that it should not to use some scheduling system to determin when the run
// This differs from CreateNextRun in that it should not to use some scheduling system to determine when the run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Namespace: namespace,
Subsystem: subsystem,
Name: "schedule_interval",
Help: "The interval between the time the run was scheduled for and the time the task's next run is due at, by task type",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and wording of this metric is a little confusing. I would consider calling the field runLatency and the metric run_latency_seconds. Prometheus guidelines suggest adding the units (plural) to the end of the metric name.

The Help could the be something like:

Records the latency between the time a task was due to run and the time the task started execution, by task type

@AlirieGray AlirieGray force-pushed the tasks/query-interval-metric branch 2 times, most recently from 9ee7758 to 2d01100 Compare December 11, 2019 21:06
@AlirieGray AlirieGray force-pushed the tasks/query-interval-metric branch from 2d01100 to 7711940 Compare December 11, 2019 21:10
@AlirieGray AlirieGray changed the title feat(metrics): add task schedule interval to executor metrics feat(metrics): add run latency to executor metrics Dec 11, 2019
Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

@AlirieGray AlirieGray force-pushed the tasks/query-interval-metric branch from 7711940 to 2ec452b Compare December 11, 2019 21:35
@AlirieGray AlirieGray merged commit b5ccad3 into master Dec 11, 2019
@AlirieGray AlirieGray deleted the tasks/query-interval-metric branch December 11, 2019 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants