Pipeline: support multi-level feedback queue #7393

Merged
merged 20 commits into pingcap:master from support_mlfq
May 10, 2023

Conversation

SeaRise
Contributor

@SeaRise SeaRise commented Apr 26, 2023

What problem does this PR solve?

Issue Number: ref #6518

Problem Summary:

What is changed and how it works?

  • Use TaskProfileInfo to record cpu_execute_time, cpu_pending_time, io_execute_time, io_pending_time and await_time of Task.
  • Support a multi-level feedback queue based on TaskProfileInfo.
///    +------------+     +------------+       +------------+           +------------+
///    | UnitQueue 1|     | UnitQueue 2|       | UnitQueue 3|    ...    | UnitQueue 8|
///    +------------+     +------------+       +------------+           +------------+
///          ^                   ^                   ^                        ^
///          |                   |                   |                        |
/// +--------+--------+  +-------+--------+  +-------+--------+       +-------+--------+
/// | Task 1          |  | Task 6         |  | Task 11        |       | Task 16        |
/// +-----------------+  +----------------+  +----------------+       +----------------+
///          ^                   ^                   ^                        ^
///          |                   |                   |                        |
/// +--------v--------+  +-------v--------+  +-------v--------+       +-------v--------+
/// | Task 2          |  | Task 7         |  | Task 12        |       | Task 17        |
/// +-----------------+  +----------------+  +----------------+       +----------------+
///          ^                   ^                   ^                        ^
///          |                   |                   |                        |
/// +--------v--------+  +-------v--------+  +-------v--------+       +-------v--------+
/// | Task 3          |  | Task 8         |  | Task 13        |       | Task 18        |
/// +-----------------+  +----------------+  +----------------+       +----------------+
///          ^                   ^                   ^                        ^
///          |                   |                   |                        |
/// +--------v--------+  +-------v--------+  +-------v--------+       +-------v--------+
/// | Task 4          |  | Task 9         |  | Task 14        |       | Task 19        |
/// +-----------------+  +----------------+  +----------------+       +----------------+
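
As a rough illustration of the structure in the diagram above, here is a minimal sketch, not the code in this PR. It assumes eight FIFO levels, that a task's level is derived from its accumulated execution time, and that each level's threshold doubles starting from the 200ms minimum time slice mentioned in the review discussion. `SketchTask`, `SketchMLFQ`, `levelOf`, the doubling rule, and the "first non-empty level" take policy are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical task: only the accumulated execution time matters for this sketch.
struct SketchTask
{
    int id = 0;
    uint64_t executed_ns = 0;
};

// Eight FIFO levels, mirroring "UnitQueue 1 ... UnitQueue 8" in the diagram.
class SketchMLFQ
{
public:
    static constexpr size_t LEVELS = 8;
    // 200ms minimum time slice (value from the review discussion).
    static constexpr uint64_t BASE_SLICE_NS = 200'000'000;

    // Assumption: level i accepts tasks whose accumulated time is below BASE_SLICE_NS << i.
    static size_t levelOf(uint64_t executed_ns)
    {
        for (size_t i = 0; i + 1 < LEVELS; ++i)
            if (executed_ns < (BASE_SLICE_NS << i))
                return i;
        return LEVELS - 1;
    }

    // Tasks that have consumed more time land in a lower-priority level.
    void submit(SketchTask task)
    {
        levels[levelOf(task.executed_ns)].push_back(task);
    }

    // Simplest possible policy: take from the first non-empty level.
    std::optional<SketchTask> take()
    {
        for (auto & level : levels)
        {
            if (!level.empty())
            {
                SketchTask task = level.front();
                level.pop_front();
                return task;
            }
        }
        return std::nullopt;
    }

private:
    std::array<std::deque<SketchTask>, LEVELS> levels;
};
```

With this policy a freshly submitted task is always taken before a task that has already run for a long time, which is the "feedback" part of the queue.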

Check List

Tests

  • tsan passed
    • gtest_filter=*Event*
    • gtest_filter=*TaskScheduler*
    • gtest_filter=*Executor*
    • gtest_filter=*ComputeServerRunner*
    • gtest_filter=*TestMLFQTaskQueue*
  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 26, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • windtalker
  • xzhangxian1008

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 26, 2023
@SeaRise
Contributor Author

SeaRise commented Apr 26, 2023

/run-all-tests

@SeaRise SeaRise changed the title WIP: Pipeline: support mlfq Pipeline: support mlfq Apr 26, 2023
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2023
@SeaRise SeaRise mentioned this pull request Apr 26, 2023
25 tasks
@SeaRise SeaRise changed the title Pipeline: support mlfq Pipeline: support multi level feedback queue Apr 27, 2023
@SeaRise SeaRise changed the title Pipeline: support multi level feedback queue Pipeline: support multi-level feedback queue Apr 27, 2023
@xzhangxian1008
Contributor

/assign

@SeaRise
Contributor Author

SeaRise commented Apr 27, 2023

/rebuild

@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2023
@xzhangxian1008
Contributor

Could you briefly describe what factors determine the priority of a task?

@ti-chi-bot ti-chi-bot bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2023
@SeaRise
Contributor Author

SeaRise commented May 4, 2023

Could you briefly describe what factors determine the priority of a task?

ok, added in https://github.com/pingcap/tiflash/pull/7393/files#diff-53742cbbf90fcb7d07521ec0386e00c92507570de3e07c2a20422c4a91b1b84cR61-R68

@SeaRise
Contributor Author

SeaRise commented May 5, 2023

/run-all-tests

@SeaRise
Contributor Author

SeaRise commented May 5, 2023

/run-all-tests

@SeaRise SeaRise requested a review from windtalker May 8, 2023 03:52
task_queue.push_back(std::move(task));
}

double UnitQueue::accuTimeAfterDivisor()
Contributor

maybe rename it to normalizedTime?

Contributor Author

ok, renamed.
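
To make the idea behind the `normalizedTime` name concrete, here is a minimal sketch under stated assumptions. It is not the PR's implementation. It assumes each level tracks the time consumed by tasks taken from it, divides that by a per-level factor, and that the scheduler picks the non-empty level with the smallest normalized value. `SketchUnitQueue`, its fields, and `pickLevel` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-level bookkeeping; field names are illustrative only.
struct SketchUnitQueue
{
    uint64_t accu_consume_time_ns = 0; // time consumed by tasks taken from this level
    double factor = 1.0; // assumed weight: a larger factor favours the level
    bool empty = true;

    // Accumulated time divided by the level's factor.
    double normalizedTime() const
    {
        assert(factor > 0);
        return static_cast<double>(accu_consume_time_ns) / factor;
    }
};

// Returns the index of the non-empty level with the smallest normalized time,
// or -1 if every level is empty.
inline int pickLevel(const std::vector<SketchUnitQueue> & levels)
{
    int best = -1;
    for (size_t i = 0; i < levels.size(); ++i)
    {
        if (levels[i].empty)
            continue;
        if (best < 0 || levels[i].normalizedTime() < levels[best].normalizedTime())
            best = static_cast<int>(i);
    }
    return best;
}
```

Dividing by a per-level factor lets high-priority levels consume proportionally more time before a lower level gets its turn, so lower levels are not starved outright.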

dbms/src/Operators/UnorderedSourceOp.h
// The executing task should yield if it takes more than `YIELD_MAX_TIME_SPENT_NS`.
if (status != Impl::TargetStatus || execute_time_ns >= YIELD_MAX_TIME_SPENT_NS)
if (status != Impl::TargetStatus || total_time_spent >= YIELD_MAX_TIME_SPENT_NS)
Contributor

Is YIELD_MAX_TIME_SPENT_NS the same for queues at different levels?

Contributor Author

I think it can be the same, because the minimum time slice of the queue is 200ms, which is greater than the 100ms here.

Contributor

OK
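
The yield condition discussed in this thread can be sketched as follows. This is a simplified illustration rather than the PR's code: elapsed time is injected by the caller instead of being measured, and `SketchStatus`, `StepResult`, `RunResult`, `runUntilYield`, and `MIN_QUEUE_TIME_SLICE_NS` are hypothetical names. Only `YIELD_MAX_TIME_SPENT_NS` and the 100ms and 200ms values come from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum class SketchStatus
{
    RUNNING,
    WAITING,
    FINISHED
};

// Values from the review discussion: 100ms yield bound, 200ms minimum queue time slice.
constexpr uint64_t YIELD_MAX_TIME_SPENT_NS = 100'000'000;
constexpr uint64_t MIN_QUEUE_TIME_SLICE_NS = 200'000'000; // hypothetical name
static_assert(
    YIELD_MAX_TIME_SPENT_NS < MIN_QUEUE_TIME_SLICE_NS,
    "a single yield bound is enough while it stays below the smallest time slice");

struct StepResult
{
    SketchStatus status;
    uint64_t elapsed_ns; // time spent in this step, supplied by the caller
};

struct RunResult
{
    SketchStatus status;
    uint64_t total_time_spent_ns;
    int steps;
};

// Keep executing until the task leaves the RUNNING status or has used up its budget.
inline RunResult runUntilYield(const std::function<StepResult()> & step)
{
    RunResult result{SketchStatus::RUNNING, 0, 0};
    while (true)
    {
        StepResult r = step();
        result.status = r.status;
        result.total_time_spent_ns += r.elapsed_ns;
        ++result.steps;
        if (result.status != SketchStatus::RUNNING || result.total_time_spent_ns >= YIELD_MAX_TIME_SPENT_NS)
            break;
    }
    return result;
}
```

Because the yield bound is smaller than the smallest time slice, a task is re-queued at least once before it can cross a level boundary, which is why the same bound can serve every level.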

@SeaRise SeaRise requested a review from windtalker May 9, 2023 05:40
// The executing task should yield if it takes more than `YIELD_MAX_TIME_SPENT_NS`.
if (status != Impl::TargetStatus || execute_time_ns >= YIELD_MAX_TIME_SPENT_NS)
if (status != Impl::TargetStatus || total_time_spent >= YIELD_MAX_TIME_SPENT_NS)
Contributor

OK


static QueueType newTaskQueue()
{
return std::make_unique<FIFOTaskQueue>();
return std::make_unique<CPUMultiLevelFeedbackQueue>();
Contributor

Why does the CPU queue use MultiLevelFeedbackQueue while the IO queue uses FIFOTaskQueue? Also, maybe we should add a configuration variable to decide which queue to use?

Contributor Author

I think IO-related operations should use a different type of queue, for example one that performs spill before restore.

Contributor Author

@SeaRise SeaRise May 10, 2023

Also, maybe we should add a configuration variable to decide which queue to use?

ok

Contributor Author

config added.
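
One way such a configuration switch could look is sketched below. This is an assumption-laden illustration, not the config that was actually added: the setting values `"fifo"` and `"mlfq"`, the `SketchQueueType` enum, `parseQueueType`, and the `Sketch*` queue classes are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real queue classes.
struct SketchTaskQueue
{
    virtual ~SketchTaskQueue() = default;
    virtual std::string name() const = 0;
};

struct SketchFIFOTaskQueue : SketchTaskQueue
{
    std::string name() const override { return "fifo"; }
};

struct SketchMLFQTaskQueue : SketchTaskQueue
{
    std::string name() const override { return "mlfq"; }
};

enum class SketchQueueType
{
    FIFO,
    MLFQ
};

// Parse a (hypothetical) setting value; reject anything unknown instead of guessing.
inline SketchQueueType parseQueueType(const std::string & value)
{
    if (value == "fifo")
        return SketchQueueType::FIFO;
    if (value == "mlfq")
        return SketchQueueType::MLFQ;
    throw std::invalid_argument("unknown task queue type: " + value);
}

// Factory that replaces a hard-coded std::make_unique<...>() call.
inline std::unique_ptr<SketchTaskQueue> newTaskQueue(SketchQueueType type)
{
    std::unique_ptr<SketchTaskQueue> queue;
    switch (type)
    {
    case SketchQueueType::FIFO:
        queue = std::make_unique<SketchFIFOTaskQueue>();
        break;
    case SketchQueueType::MLFQ:
        queue = std::make_unique<SketchMLFQTaskQueue>();
        break;
    }
    assert(queue != nullptr); // every enumerator is handled above
    return queue;
}
```

Keeping the parse step separate from the factory means a bad setting fails at startup rather than when the first task is submitted.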

@SeaRise SeaRise requested a review from windtalker May 10, 2023 03:35
Contributor

@windtalker windtalker left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 10, 2023
UnitType io_pending_time = 0; \
UnitType await_time = 0;

class LocalTaskProfileInfo
Contributor

What does Local mean? Do we have RemoteTaskProfileInfo?

Contributor Author

I think that in the future, all TaskProfileInfo will be aggregated to calculate the amount of resources used by the query, but for now I can remove the Local prefix.

Contributor Author

renamed to TaskProfileInfo.

class LocalTaskProfileInfo
{
public:
PROFILE_MEMBER(UInt64)
Contributor

It seems that PROFILE_MEMBER is used in only one place; is the macro necessary?

Contributor Author

ok, removed.


class LocalTaskProfileInfo
{
public:
Contributor

Maybe make these variables private and expose them through accessor methods?

Contributor Author

ok, done
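
The shape this review converges on (no macro, private members, accessors) might look roughly like the sketch below. It is illustrative only: the five counters are the ones named in the PR description, while `SketchTaskProfileInfo` and every method name are hypothetical and may differ from the merged class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a profile-info class with private counters and accessors.
class SketchTaskProfileInfo
{
public:
    void addCPUExecuteTime(uint64_t ns) { cpu_execute_time += ns; }
    void addCPUPendingTime(uint64_t ns) { cpu_pending_time += ns; }
    void addIOExecuteTime(uint64_t ns) { io_execute_time += ns; }
    void addIOPendingTime(uint64_t ns) { io_pending_time += ns; }
    void addAwaitTime(uint64_t ns) { await_time += ns; }

    uint64_t getCPUExecuteTime() const { return cpu_execute_time; }
    uint64_t getCPUPendingTime() const { return cpu_pending_time; }
    uint64_t getIOExecuteTime() const { return io_execute_time; }
    uint64_t getIOPendingTime() const { return io_pending_time; }
    uint64_t getAwaitTime() const { return await_time; }

    // Sum of all recorded time, e.g. for a per-task summary log line.
    uint64_t totalTime() const
    {
        return cpu_execute_time + cpu_pending_time + io_execute_time + io_pending_time + await_time;
    }

private:
    // Plain members written out by hand instead of a single-use PROFILE_MEMBER macro.
    uint64_t cpu_execute_time = 0;
    uint64_t cpu_pending_time = 0;
    uint64_t io_execute_time = 0;
    uint64_t io_pending_time = 0;
    uint64_t await_time = 0;
};
```

A scheduler would call the `add*` methods as a task moves between the CPU pool, the IO pool, and the wait state, and read `getCPUExecuteTime()` when deciding which queue level the task belongs to.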

Contributor

@xzhangxian1008 xzhangxian1008 left a comment

Others LGTM

Comment on lines +15 to +17
#include <Flash/Pipeline/Schedule/TaskQueues/MultiLevelFeedbackQueue.h>
#include <assert.h>
#include <common/likely.h>
Contributor

Suggested change
#include <Flash/Pipeline/Schedule/TaskQueues/MultiLevelFeedbackQueue.h>
#include <assert.h>
#include <common/likely.h>
#include <Flash/Pipeline/Schedule/TaskQueues/MultiLevelFeedbackQueue.h>
#include <common/likely.h>
#include <assert.h>

Contributor Author

But clang-format likes this :)

Comment on lines 17 to 20
#include <Common/Logger.h>
#include <Common/MemoryTracker.h>
#include <Flash/Pipeline/Schedule/Tasks/TaskProfileInfo.h>
#include <memory.h>
Contributor

Suggested change
#include <Common/Logger.h>
#include <Common/MemoryTracker.h>
#include <Flash/Pipeline/Schedule/Tasks/TaskProfileInfo.h>
#include <memory.h>
#include <Common/Logger.h>
#include <Common/MemoryTracker.h>
#include <Flash/Pipeline/Schedule/Tasks/TaskProfileInfo.h>
#include <memory.h>

Contributor Author

But clang-format likes this :)

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 10, 2023
SeaRise and others added 3 commits May 10, 2023 17:03
…Queue.cpp

Co-authored-by: xzhangxian1008 <xzhangxian@foxmail.com>
Co-authored-by: xzhangxian1008 <xzhangxian@foxmail.com>
@SeaRise
Contributor Author

SeaRise commented May 10, 2023

/merge

@ti-chi-bot
Contributor

ti-chi-bot bot commented May 10, 2023

@SeaRise: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Contributor

ti-chi-bot bot commented May 10, 2023

This pull request has been accepted and is ready to merge.

Commit hash: f593ab2

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label May 10, 2023
@SeaRise
Contributor Author

SeaRise commented May 10, 2023

/run-unit-test

@ti-chi-bot
Contributor

ti-chi-bot bot commented May 10, 2023

@SeaRise: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

This triggers some heavy tests that do not always run when a PR is updated.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 241a19a into pingcap:master May 10, 2023
@SeaRise SeaRise deleted the support_mlfq branch May 10, 2023 11:43
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
3 participants