Add metrics to measure the time a task waiting in history queue #6205

Shaddoll · 2024-07-31T21:13:10Z

What changed?
Add metrics to measure how long a task waits in the history queue, that is, the time from when the task is written to the database to when it is pushed to the matching service.

Why?
Improve observability.

How did you test it?

Potential risks

Release notes

Documentation Changes

codecov · 2024-07-31T21:30:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.89%. Comparing base (38c295d) to head (b14e231).

Additional details and impacted files

Files	Coverage Δ
common/util.go	`78.85% <100.00%> (+0.38%)`	⬆️
service/history/task/task.go	`79.12% <ø> (ø)`
...vice/history/task/transfer_active_task_executor.go	`66.99% <100.00%> (+0.54%)`	⬆️
service/matching/handler/context.go	`46.15% <100.00%> (ø)`
service/matching/tasklist/task_list_manager.go	`63.65% <100.00%> (-0.42%)`	⬇️

... and 3 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

common/util_test.go

taylanisikdemir · 2024-08-01T02:18:05Z

service/history/task/transfer_active_task_executor.go

-	return t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig)
+	err = t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig)
+	if err == nil {
+		scope := common.NewPerTaskListScope(domainName, task.TaskList, types.TaskListKindNormal, t.metricsClient, metrics.TransferActiveTaskActivityScope)


Is it cheap to create these per-tasklist scopes on the fly and then discard them? This is done per shard, so there will be 16k x num_tasklists of them across hosts.

taylanisikdemir · 2024-08-01T02:20:29Z

service/history/task/transfer_active_task_executor.go

Are you planning to do the same for the timer task executor?

Shaddoll · 2024-08-01

Only the activity retry timer pushes tasks to matching, and retries mostly happen after the activity started event, so having this metric there might be misleading.

taylanisikdemir · 2024-08-01

I didn't consider the retry case. But, for example, a user timer that fires with X seconds of delay, and hence creates the corresponding decision task with X seconds of delay, should be captured by our new "overhead breakdown metrics".
We might not need tasklist-level granularity all the way, though, so the existing timer task latencies might be fine. However, if we are introducing tasklist-level granularity for decision/timer task delays in history queues, we can also consider the same for timer tasks.

Shaddoll · 2024-08-01

Decision and activity tasks come from the transfer queue.
Only the activity retry timer in the timer queue pushes activity tasks to matching.
I'm not sure which timer tasks you're referring to.

Shaddoll requested review from neil-xie, davidporter-id-au, Groxx, shijiesheng, agautam478, jakobht, 3vilhamster, sankari165, dkrotx, taylanisikdemir and demirkayaender as code owners July 31, 2024 21:13

Shaddoll force-pushed the history-metrics branch 2 times, most recently from 0370a7a to d2ead6c Compare July 31, 2024 22:59

taylanisikdemir approved these changes Aug 1, 2024

Add metrics to measure the time a task waiting in history queue

b14e231

Shaddoll force-pushed the history-metrics branch from d2ead6c to b14e231 Compare August 1, 2024 02:56

Shaddoll merged commit 95bc620 into uber:master Aug 1, 2024
20 checks passed

Shaddoll deleted the history-metrics branch August 1, 2024 16:05
