Add emitWorkflowVersionMetrics for pinot #6190

bowenxia · 2024-07-25T21:56:05Z

What changed?
Add emitWorkflowVersionMetrics for Pinot. Because Pinot doesn't support nesting one aggregation inside another the way ES does, I had to split the query into two:

1. Find the aggregation of workflowTypes (the top 10 workflowTypes by count).
2. Find the aggregation of CadenceChangeVersion under a specific workflowType (the top 10 CadenceChangeVersions by count within that workflowType).

Why?
To make the ES analyzer a generic visibility analyzer.

How did you test it?
Unit tests.

Potential risks
At worst, query time might be about 10x, because the second query runs once per workflow type.
That should not matter much for this metric.

Release notes

Documentation Changes

shijiesheng · 2024-07-25T22:17:11Z

service/worker/esanalyzer/workflow.go

+	return fmt.Sprintf(`
+SELECT WorkflowType, COUNT(*) AS count
+FROM %s
+WHERE DomainID = '%s'


nit: use %q instead of '%s', per https://pkg.go.dev/fmt

For strings, %q produces a double-quoted string safely escaped with Go syntax, but in Pinot, WHERE DomainID = "" doesn't work. The value has to be single-quoted.

codecov · 2024-07-25T22:34:02Z

Codecov Report

Attention: Patch coverage is 96.87500% with 4 lines in your changes missing coverage. Please review.

Project coverage is 73.12%. Comparing base (95ba44c) to head (4eed31a).
Report is 7 commits behind head on master.

Additional details and impacted files

| Files | Coverage Δ |
|---|---|
| ...rker/esanalyzer/domainWorkflowTypeCountWorkflow.go | 83.03% <100.00%> (ø) |
| service/worker/esanalyzer/workflow.go | 89.91% <96.82%> (+8.35%) ⬆️ |

... and 14 files with indirect coverage changes

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

shijiesheng · 2024-07-25T22:40:50Z

service/worker/esanalyzer/workflow.go

+		domainWorkflowVersionCount.WorkflowTypes = append(domainWorkflowVersionCount.WorkflowTypes, WorkflowTypeCount{
+			EsAggregateCount: EsAggregateCount{
+				AggregateKey:   workflowType,
+				AggregateCount: int64(workflowCount),


workflowCount comes from the first call, so it can differ from the sum of the counts returned by the subsequent per-workflow-type calls. You could use that sum instead, so the result is at least self-consistent.

Here's one sample result from ES:

{ "key": "UpfrontChargeWorkflow::start", "doc_count": 182, "versions": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "waitForPSPCallback-1", "doc_count": 149 } ] } },

The workflow type count (182) differs from the sum of its CadenceChangeVersion counts (149). I was wondering whether this is by design.

How about grouping by both WorkflowType and CadenceChangeVersion, so we get the count per version and per type? I tried it and it works:

SELECT JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') AS CadenceChangeVersion, COUNT(*) AS count, workflowtype
FROM rta.rta.cadence_visibility_production
WHERE IsDeleted = false
  AND CloseStatus = -1
  AND StartTime > 0
  AND JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') IS NOT NULL
GROUP BY JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY'), workflowtype
ORDER BY count DESC

That query counts every workflow type that has a CadenceChangeVersion, which differs from the ES result. The ES query first finds the top 10 workflow types by count and then, within each of those 10 workflow types, finds the top 10 CadenceChangeVersions by count.

Discussed offline: grouping by version and type filters out records without a CadenceChangeVersion. We need to verify whether that count has to be emitted; if not, we can go with this approach.
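
For comparison, a sketch of how rows from the combined GROUP BY query could be regrouped client-side into per-type totals and per-type version lists. The row, versionCount, and typeSummary types and the summarize function are hypothetical names, not the types used in this PR. Because every row carries a version, records without a CadenceChangeVersion never reach this code, which is the caveat discussed above.

```go
package main

import (
	"fmt"
	"sort"
)

// row is one (workflow type, version, count) result of the combined query.
type row struct {
	WorkflowType string
	Version      string
	Count        int64
}

type versionCount struct {
	Version string
	Count   int64
}

type typeSummary struct {
	WorkflowType string
	Total        int64 // sum of all version counts seen for this type
	Versions     []versionCount
}

// summarize keeps the topN workflow types by total and, for each of them,
// the topN versions by count. Ties are broken by name for stable output.
func summarize(rows []row, topN int) []typeSummary {
	byType := map[string]*typeSummary{}
	for _, r := range rows {
		s, ok := byType[r.WorkflowType]
		if !ok {
			s = &typeSummary{WorkflowType: r.WorkflowType}
			byType[r.WorkflowType] = s
		}
		s.Total += r.Count
		s.Versions = append(s.Versions, versionCount{Version: r.Version, Count: r.Count})
	}
	out := make([]typeSummary, 0, len(byType))
	for _, s := range byType {
		vs := s.Versions
		sort.Slice(vs, func(i, j int) bool {
			if vs[i].Count != vs[j].Count {
				return vs[i].Count > vs[j].Count
			}
			return vs[i].Version < vs[j].Version
		})
		if len(vs) > topN {
			vs = vs[:topN]
		}
		s.Versions = vs
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].WorkflowType < out[j].WorkflowType
	})
	if len(out) > topN {
		out = out[:topN]
	}
	return out
}

func main() {
	rows := []row{{"A", "v1", 3}, {"A", "v2", 2}, {"B", "v1", 10}}
	for _, s := range summarize(rows, 10) {
		fmt.Println(s.WorkflowType, s.Total, s.Versions)
	}
}
```

Here each type's total is the sum of its version counts, so the two levels stay self-consistent, at the cost of not counting workflows that have no version.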

shijiesheng · 2024-07-25T22:41:29Z

service/worker/esanalyzer/workflow.go

+		return err
+	}
+	var domainWorkflowVersionCount DomainWorkflowVersionCount
+	for _, row := range response {


10x latency might be an issue for metrics emission. Could you parallelize it?

If we run this in parallel with multiple threads, is there a risk that the workflow doesn't yet have all the data when the metrics are emitted?

Latency doesn't matter for this metric, since we run it every 5 or 10 minutes. But we can eliminate the extra calls if we aggregate by both version and type.

Discussed this with Ender offline as well. We are going to keep this approach.

add emitWorkflowVersionMetrics for pinot

4c04d9a

bowenxia requested review from Shaddoll, neil-xie, davidporter-id-au, Groxx, shijiesheng, agautam478, jakobht, 3vilhamster, sankari165, dkrotx, taylanisikdemir and demirkayaender as code owners July 25, 2024 21:56

fmt

7ea714f

shijiesheng reviewed Jul 25, 2024

bowenxia and others added 3 commits July 25, 2024 16:12

Update service/worker/esanalyzer/workflow.go

0f269be

Co-authored-by: Shijie Sheng <shengs@uber.com>

handle aggr errors

92e5791

wrap an error

4eed31a

shijiesheng approved these changes Jul 29, 2024

bowenxia merged commit 9a7a8a4 into master Jul 30, 2024
21 checks passed

bowenxia deleted the xbowen_refactor_ESanalyzer_02 branch July 30, 2024 03:28
