feat(metrics): add run latency to executor metrics #16190

AlirieGray · 2019-12-10T21:33:47Z

This PR adds a histogram metric scheduleInterval to the Executor metrics that records the time lag between when a task is scheduled to be run and when it actually runs. This can be used to determine the "per minute system concurrency" as described in the queryd SLO RFC.

This PR also adds a property RunAt to the Run object, which is used to record the time the Run is scheduled to be executed, including the task's Offset. This is added to the Run object because it is possible that the Run object will be changed by the user in the time between the Run being scheduled and being run, which would make our metrics for that Run inaccurate.

The value ScheduledAt is changed to ScheduledFor to maintain consistency.

Rebased/mergeable
Tests pass

stuartcarnie

Really great work, @AlirieGray 👍 The improvements to consistency will really help future peers following along.

I only had a couple of very minor nits and you can merge it.

stuartcarnie · 2019-12-10T22:26:09Z

task.go

+	ScheduledFor time.Time `json:"scheduledFor"`          // ScheduledFor is the Now time used in the task's query
+	RunAt        time.Time `json:"runAt"`                 // RunAt is the time the task is scheduled to be run, which is ScheduledFor + Offset


💯 Thank you for adding clarifying documentation so the next person will have an easier time understanding the fields

stuartcarnie · 2019-12-11T20:30:21Z

cmd/influxd/launcher/launcher.go

 					schLogger.Info(
 						"error in scheduler run",
 						zap.String("taskID", platform.ID(taskID).String()),
-						zap.Time("scheduledAt", scheduledAt),
+						zap.Time("scheduledFor", scheduledFor),


Really great to see the consistency with scheduledFor and runAt 🎉

stuartcarnie · 2019-12-11T20:35:10Z

task/backend/scheduler.go

+	// time.Time for when the next run is due (includes offset)
+	RunAt time.Time
+


This is not necessary, as you'll be removing the RunCreation type with your cleanup PR

stuartcarnie · 2019-12-11T20:36:21Z

task/backend/task.go

@@ -22,10 +22,10 @@ type TaskControlService interface {
 	NextDueRun(ctx context.Context, taskID influxdb.ID) (int64, error)

 	// CreateRun creates a run with a schedule for time.
-	// This differes from CreateNextRun in that it should not to use some scheduling system to determin when the run
+	// This differs from CreateNextRun in that it should not to use some scheduling system to determine when the run


stuartcarnie · 2019-12-11T20:45:46Z

task/backend/executor/executor_metrics.go

+			Namespace: namespace,
+			Subsystem: subsystem,
+			Name:      "schedule_interval",
+			Help:      "The interval between the time the run was scheduled for and the time the task's next run is due at, by task type",


The name and wording of this metric are a little confusing. I would consider calling the field runLatency and the metric run_latency_seconds. Prometheus guidelines suggest adding the units (plural) to the end of the metric name.

The Help could then be something like:

Records the latency between the time a task was due to run and the time the task started execution, by task type

stuartcarnie

Awesome work!

AlirieGray force-pushed the tasks/query-interval-metric branch 5 times, most recently from 3a5bb81 to 47d6def, on December 10, 2019 21:59

AlirieGray requested a review from stuartcarnie December 10, 2019 22:00

AlirieGray force-pushed the tasks/query-interval-metric branch 2 times, most recently from ea4da8f to 139dc88, on December 10, 2019 23:06

stuartcarnie requested changes Dec 11, 2019


AlirieGray force-pushed the tasks/query-interval-metric branch 2 times, most recently from 9ee7758 to 2d01100, on December 11, 2019 21:06

AlirieGray requested a review from stuartcarnie December 11, 2019 21:10

AlirieGray force-pushed the tasks/query-interval-metric branch from 2d01100 to 7711940 on December 11, 2019 21:10

AlirieGray changed the title ~~feat(metrics): add task schedule interval to executor metrics~~ feat(metrics): add run latency to executor metrics Dec 11, 2019

stuartcarnie approved these changes Dec 11, 2019


feat(metrics): add task schedule interval to executor metrics

2ec452b

AlirieGray force-pushed the tasks/query-interval-metric branch from 7711940 to 2ec452b on December 11, 2019 21:35

AlirieGray merged commit b5ccad3 into master Dec 11, 2019

AlirieGray deleted the tasks/query-interval-metric branch December 11, 2019 22:50

alexpaxton pushed a commit that referenced this pull request Jan 9, 2020

feat(metrics): add run latency to executor metrics (#16190)

b725bc8

