Rewrite/improve basic load test #4399

longquanzheng · 2021-08-20T22:16:57Z

What changed?
Rewrite basic load test

Add more documentation
Use activity retry to launch
Use deterministic workflowID for stressWorkflow
Improve verification
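
To illustrate why a deterministic workflow ID pairs well with activity retry: a retried launch attempt regenerates the same IDs, so a workflow that was already started collides with its existing ID instead of being launched twice. The following is a minimal sketch under stated assumptions. The ID scheme (`stressWorkflowID`, the `-stress-` separator) is hypothetical, and the in-memory `started` set only stands in for the Cadence server; neither is taken from this PR.

```go
package main

import "fmt"

// stressWorkflowID derives a child workflow ID from the launcher's ID and the
// child's index. Hypothetical scheme, for illustration only.
func stressWorkflowID(launcherID string, index int) string {
	return fmt.Sprintf("%s-stress-%d", launcherID, index)
}

// launch simulates one attempt of the launch activity. The started map stands
// in for the server's record of workflow IDs. failAt aborts the attempt before
// launching that index; pass -1 for an attempt that does not fail.
func launch(started map[string]bool, launcherID string, count, failAt int) error {
	for i := 0; i < count; i++ {
		if i == failAt {
			return fmt.Errorf("simulated failure at index %d", i)
		}
		// Re-using an existing ID does not create a second entry, so a
		// retried attempt cannot produce duplicates.
		started[stressWorkflowID(launcherID, i)] = true
	}
	return nil
}

func main() {
	started := map[string]bool{}
	if err := launch(started, "launcher", 5, 3); err != nil {
		fmt.Println("first attempt:", err)
	}
	// The retry regenerates the same IDs for indexes 0..2 and adds 3..4.
	if err := launch(started, "launcher", 5, -1); err != nil {
		fmt.Println("retry:", err)
	}
	fmt.Println("distinct workflows started:", len(started))
}
```

With randomly generated IDs, the same retry would have launched indexes 0 to 2 a second time under new IDs.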

Why?

How did you test it?
Tested locally

(qlong-bench-imp) $cadence --do cadence-bench  wf ob -w 45edc243-9eb5-4774-b42f-196849759644
Progress:
  1, 2021-08-20T15:14:15-07:00, WorkflowExecutionStarted
  2, 2021-08-20T15:14:15-07:00, DecisionTaskScheduled
  3, 2021-08-20T15:14:15-07:00, DecisionTaskStarted
  4, 2021-08-20T15:14:15-07:00, DecisionTaskCompleted
  5, 2021-08-20T15:14:15-07:00, ActivityTaskScheduled
  6, 2021-08-20T15:14:15-07:00, ActivityTaskStarted
  7, 2021-08-20T15:14:25-07:00, ActivityTaskCompleted
  8, 2021-08-20T15:14:25-07:00, DecisionTaskScheduled
  9, 2021-08-20T15:14:25-07:00, DecisionTaskStarted
  10, 2021-08-20T15:14:25-07:00, DecisionTaskCompleted
  11, 2021-08-20T15:14:25-07:00, TimerStarted
  12, 2021-08-20T15:14:45-07:00, TimerFired
  13, 2021-08-20T15:14:45-07:00, DecisionTaskScheduled
  14, 2021-08-20T15:14:45-07:00, DecisionTaskStarted
  15, 2021-08-20T15:14:45-07:00, DecisionTaskCompleted
  16, 2021-08-20T15:14:45-07:00, ActivityTaskScheduled
  17, 2021-08-20T15:14:45-07:00, ActivityTaskStarted
  18, 2021-08-20T15:14:45-07:00, ActivityTaskCompleted
  19, 2021-08-20T15:14:45-07:00, DecisionTaskScheduled
  20, 2021-08-20T15:14:45-07:00, DecisionTaskStarted
  21, 2021-08-20T15:14:45-07:00, DecisionTaskCompleted
  22, 2021-08-20T15:14:45-07:00, WorkflowExecutionCompleted

Result:
  Run Time: 1 seconds
  Status: COMPLETED
  Output: "TEST PASSED: true; Details report: timeoutCount: 0, failedCount: 0, openCount:0, launchCount: 100, maxThreshold:1"

Potential risks

Release notes

Documentation Changes

coveralls · 2021-08-20T22:49:21Z

Pull Request Test Coverage Report for Build 8941da23-91f4-4b6b-a9fd-6019e635d004

0 of 0 changed or added relevant lines in 0 files are covered.
37 unchanged lines in 8 files lost coverage.
Overall coverage decreased (-0.02%) to 56.401%

Files with Coverage Reduction	New Missed Lines	%
common/task/weightedRoundRobinTaskScheduler.go	1	89.64%
common/types/mapper/thrift/shared.go	2	64.98%
service/history/execution/mutable_state_builder.go	2	69.86%
service/history/queue/timer_queue_processor.go	2	58.77%
service/history/queue/timer_gate.go	3	95.83%
service/history/queue/timer_queue_processor_base.go	5	78.56%
service/history/execution/context.go	6	68.54%
common/persistence/sql/sqlExecutionStore.go	16	60.0%

Totals
Change from base Build 78be6221-be34-442c-8a27-540400fd920d:	-0.02%
Covered Lines:	79085
Relevant Lines:	140219

💛 - Coveralls

bench/load/basic/launchWorkflow.go

yycptt

Overall looks good to me.

yycptt · 2021-08-27T20:54:46Z

bench/load/basic/launchWorkflow.go

+	if passed {
+		return finalResult, nil
+	}
+	return "", fmt.Errorf(finalResult)


nit: may want to change the finalResult to start with TEST FAILED here.

okay sounds good.
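
One way to apply this nit, sketched under assumptions: the helper name `formatResult` and the failure wording are hypothetical, and only the `TEST PASSED: true; Details report: ...` shape comes from the sample output in the PR description. The sketch also builds the error with `errors.New` rather than `fmt.Errorf(finalResult)`, because `fmt.Errorf` treats its first argument as a format string and would misinterpret any `%` in the report.

```go
package main

import (
	"errors"
	"fmt"
)

// formatResult returns the report as the workflow result on success and as an
// error on failure. Hypothetical helper, not the PR's actual code.
func formatResult(passed bool, details string) (string, error) {
	if passed {
		return "TEST PASSED: true; Details report: " + details, nil
	}
	// errors.New keeps the text verbatim, even if it contains '%'.
	return "", errors.New("TEST FAILED; Details report: " + details)
}

func main() {
	out, err := formatResult(true, "failedCount: 0")
	fmt.Println(out, err)

	out, err = formatResult(false, "failedCount: 3")
	fmt.Println(out, err)
}
```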

yycptt · 2021-08-27T21:03:02Z

bench/load/basic/launchWorkflow.go

 		return "", err
 	}
-	return result.String(), nil
+	passed := (result.TimeoutCount + result.OpenCount + result.FailedCount) <= int(maxTolerantFailure)


For open workflows, shall we fail the test whenever the open count is non-zero (same as the old behavior)? An open workflow typically means it got stuck due to a lost transfer/timer task, which is a very important issue.
Another possibility is that timer processing latency has become so high that Cadence can't even time out the workflows within the specified wait duration (that's why I used 5min as the wait time buffer instead of 10s). In that case, failing the bench also sounds reasonable to me.

I see. That makes sense. So the failure threshold should only be applied to timeout+failed.
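
The criterion agreed on here can be sketched as follows. This is an illustration, not the merged code: the `report` type is a stand-in that mirrors the counters in the sample output, and `passed` is a hypothetical helper.

```go
package main

import "fmt"

// report mirrors the counters shown in the details report. Stand-in type,
// not the PR's actual struct.
type report struct {
	TimeoutCount int
	FailedCount  int
	OpenCount    int
}

// passed fails the test on any open workflow and applies the tolerance only
// to timed-out and failed workflows.
func passed(r report, maxTolerantFailure int) bool {
	if r.OpenCount > 0 {
		// An open workflow suggests a stuck workflow or very high timer
		// latency, so it is never tolerated.
		return false
	}
	return r.TimeoutCount+r.FailedCount <= maxTolerantFailure
}

func main() {
	fmt.Println(passed(report{TimeoutCount: 1}, 1))                 // within tolerance
	fmt.Println(passed(report{TimeoutCount: 1, FailedCount: 1}, 1)) // over tolerance
	fmt.Println(passed(report{OpenCount: 1}, 10))                   // open workflow
}
```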

yycptt · 2021-08-27T21:06:17Z

bench/load/basic/launchWorkflow.go


+	if len(openWfs.Executions) > 0 {
+		opens = len(openWfs.Executions)


So this means at least `opens` workflows are still open, and there may be more than that?

Yes. This is due to a limitation of basic visibility: there is no count API, so we have to use a pageSize as the limit. The page size here is threshold+1, so it should be as accurate as using the count API with advanced visibility.
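
A minimal sketch of that capped count, assuming a list call that returns at most `pageSize` open executions. `listOpenFunc`, `countOpenUpTo`, and `fakeVisibility` are hypothetical names used for illustration and are not Cadence APIs.

```go
package main

import "fmt"

// listOpenFunc stands in for a basic-visibility list call that returns at
// most pageSize open workflow IDs. Hypothetical signature.
type listOpenFunc func(pageSize int) []string

// countOpenUpTo lists with a page size of threshold+1. The returned count is
// a lower bound: when capped is true the page was full, so the real number of
// open workflows may be higher. Either way, count > threshold gives the same
// answer a true count API would.
func countOpenUpTo(list listOpenFunc, threshold int) (count int, capped bool) {
	pageSize := threshold + 1
	count = len(list(pageSize))
	return count, count == pageSize
}

// fakeVisibility returns a list function over a fixed set of open workflows.
func fakeVisibility(open []string) listOpenFunc {
	return func(pageSize int) []string {
		if pageSize < len(open) {
			return open[:pageSize]
		}
		return open
	}
}

func main() {
	list := fakeVisibility([]string{"a", "b", "c", "d", "e"})
	count, capped := countOpenUpTo(list, 1)
	fmt.Println("open (lower bound):", count, "capped:", capped)
	fmt.Println("over threshold:", count > 1)
}
```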

longquanzheng requested review from yux0 and yycptt August 20, 2021 22:16

longquanzheng changed the title ~~Rewrite basic load test~~ Rewrite/improve basic load test Aug 20, 2021

meiliang86 requested a review from a team August 23, 2021 05:02

yycptt reviewed Aug 24, 2021

longquanzheng force-pushed the qlong-bench-imp branch from 2aea8f0 to 54af362 Compare August 26, 2021 21:19

github-actions bot force-pushed the qlong-bench-imp branch from d50a5bf to 6eef524 Compare August 26, 2021 22:07

longquanzheng force-pushed the qlong-bench-imp branch from 6eef524 to ec0cbde Compare August 26, 2021 23:10

yycptt approved these changes Aug 27, 2021

github-actions bot force-pushed the qlong-bench-imp branch from ec0cbde to 0cdfbb7 Compare August 27, 2021 21:10

longquanzheng force-pushed the qlong-bench-imp branch from 0cdfbb7 to 0c23974 Compare August 27, 2021 21:27

longquanzheng added the automerge label Aug 27, 2021

longquanzheng mentioned this pull request Aug 27, 2021

Auto publish bench and canary images with master tag #4426

Closed

github-actions bot force-pushed the qlong-bench-imp branch 2 times, most recently from b6fa7b3 to 036106c Compare August 27, 2021 22:33

longquanzheng removed the automerge label Aug 27, 2021

longquanzheng added 13 commits August 27, 2021 15:43

done rewrite basic load test

46adc67

report

80cfa69

fix all

2d32dfd

update readme

f2c9c01

add basicValidation

ccb98f7

Done

37be349

config

1c22938

add max retry option

a5c9cd2

addres comments

2b54aa2

refactor

dea0415

fix typo

0681e88

lint warn

da8f41e

fix bug

9beb891

longquanzheng added 8 commits August 27, 2021 15:43

add doc

f2c2f48

address comments

dd11899

add wait time buffer config

f9dfbdb

improve docs

d5cbaf3

use master image tag

4c4d40c

update readme

28413f3

fix canary tag

dafed7c

fix bug

fbdc047

longquanzheng force-pushed the qlong-bench-imp branch from 036106c to fbdc047 Compare August 27, 2021 22:43

longquanzheng merged commit 0b98055 into master Aug 27, 2021

longquanzheng deleted the qlong-bench-imp branch August 27, 2021 23:02
