Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor](StmtExecutor)(step-1) Extract profile logic from StmtExecutor and Coordinator #19219

Merged
merged 2 commits into from
May 6, 2023

Conversation

morningman
Copy link
Contributor

@morningman morningman commented Apr 28, 2023

Proposed changes

Subtask of #19195

Problem summary

This PR mainly changes:

Add new classes for Profile

Previously, we use RuntimeProfile class directly, and because there are multiple level in profile, so you can see there may be several RuntimeProfile instances be to maintain.

I created several new classes for profile:

class Profile:
	The root profile of a execution task(query or load)
	
class SummaryProfile:
	The profile that contains summary info of a execution task,
	such as start time, end time, query id. etc.
	
class ExecutionProfile:
	The profile for a single Coordinator. Each Coordinator will
	have a ExecutionProfile.

The profile structure is as following:

Profile:
	SummaryProfile:
	ExecutionProfile 1:
		Fragment 0:
			Instance 0:
			Instance 1:
			...
		Fragment 1:
		...
	ExecutionProfile 2:
		...

You can see, each Profile has a SummaryProfile and one or more ExecutionProfile.
For most kinds of job, such as query/insert, there is only one ExecutionProfile. But for broker load job, will may be more than one ExecutionProfile, corresponding to each sub task of the load job.

How to use

  1. For query/insert, etc:

    • Each StmtExcutor will have a Profile instance.
    • Each Coordinator will have a ExecutionProfile instance.
    • StmtExcutor is responsible for the SummaryProfile, it will update the SummaryProfile during the execution.
    • Coordinator is responsible for the ExecutionProfile, it will first add ExecutionProfile to the child of Profile, and update the ExecutionProfile periodically during the execution.
  2. For Load/Export, etc:

    • Each job will hava a Profile instance.
    • For each Coordinator of this job, add its ExecutionProfile to the children of job's Profile.

Behavior Change

The columns of show load profile/show query profile and QueryProfile Web UI has changed to:

| Profile ID | Task Type | Start Time | End Time | Total | Task State | User | Default Db| Sql Statement | Is Cached | Total Instances Num | Instances Num Per BE | Parallel Fragment Exec Instance Num | Trace ID |

The Query Id and Job Id is removed and using Profile ID instead.
For load job, the profile id is job id, for query/insert, is query id.

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/load Issues or PRs related to all kinds of load area/nereids area/planner Issues or PRs related to the query planner labels Apr 28, 2023
@morningman
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

hello-stephen commented Apr 28, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.99 seconds
stream load tsv: 423 seconds loaded 74807831229 Bytes, about 168 MB/s
stream load json: 22 seconds loaded 2358488459 Bytes, about 102 MB/s
stream load orc: 60 seconds loaded 1101869774 Bytes, about 17 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 6.6 seconds inserted 5,000,000 Rows, about 757K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230505050717_clickbench_pr_138448.html

@morningman
Copy link
Contributor Author

run buildall

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit 42bac33 into apache:master May 6, 2023
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023
…tor and Coordinator (apache#19219)

Previously, we use RuntimeProfile class directly, and because there are multiple level in profile, so you can see there may be several RuntimeProfile instances be to maintain.

I created several new classes for profile:

class Profile:
	The root profile of a execution task(query or load)
	
class SummaryProfile:
	The profile that contains summary info of a execution task,
	such as start time, end time, query id. etc.
	
class ExecutionProfile:
	The profile for a single Coordinator. Each Coordinator will
	have a ExecutionProfile.
The profile structure is as following:

Profile:
	SummaryProfile:
	ExecutionProfile 1:
		Fragment 0:
			Instance 0:
			Instance 1:
			...
		Fragment 1:
		...
	ExecutionProfile 2:
		...
You can see, each Profile has a SummaryProfile and one or more ExecutionProfile.
For most kinds of job, such as query/insert, there is only one ExecutionProfile. But for broker load job, will may be more than one ExecutionProfile, corresponding to each sub task of the load job.

How to use
For query/insert, etc:

Each StmtExcutor will have a Profile instance.
Each Coordinator will have a ExecutionProfile instance.
StmtExcutor is responsible for the SummaryProfile, it will update the SummaryProfile during the execution.
Coordinator is responsible for the ExecutionProfile, it will first add ExecutionProfile to the child of Profile, and update the ExecutionProfile periodically during the execution.
For Load/Export, etc:

Each job will hava a Profile instance.
For each Coordinator of this job, add its ExecutionProfile to the children of job's Profile.
Behavior Change
The columns of show load profile/show query profile and QueryProfile Web UI has changed to:

| Profile ID | Task Type | Start Time | End Time | Total | Task State | User | Default Db| Sql Statement | Is Cached | Total Instances Num | Instances Num Per BE | Parallel Fragment Exec Instance Num | Trace ID |
The Query Id and Job Id is removed and using Profile ID instead.
For load job, the profile id is job id, for query/insert, is query id.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/load Issues or PRs related to all kinds of load area/nereids area/planner Issues or PRs related to the query planner kind/behavior-changed kind/test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants