Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build table snapshot on write node for disaggregated read from S3 #6919

Merged
merged 15 commits into from
Mar 6, 2023

Conversation

JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Feb 28, 2023

What problem does this PR solve?

Issue Number: ref #6827

Problem Summary:

What is changed and how it works?

Depends on:

Changes:

  • Add DisaggregatedTaskId for the task id, it is based on MPPId and executor_id
  • Add DAGContext::is_disaggregated_task DAGContext::disaggregated_id
  • Add DisaggregatedReadSnapshot to hold the segment snapshot with DisaggregatedTaskId
  • Add DisaggregatedSnapshotManager to hold the snapshots of logical table in the write node
  • FlashService::EstablishDisaggregatedTask
    • run into DAGStorageInterpreter::buildLocalStreams
    • build segment snapshots from StorageDeltaMerge
    • register the snapshots into DisaggregatedSnapshotManager in the write node
  • the compute node will build the read task and try to read most data from S3 according to the EstablishDisaggregatedTaskResponse
  • FlashService::FetchDisaggregatedPages
    • compute nodes will use this PRC to fetch the delta-layer data from write nodes
  • Remove the dangling snapshot from DisaggregatedSnapshotManager if the read node doesn't try to finish/cancel the request

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Feb 28, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • breezewish
  • guo-shaoge

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 28, 2023
@JaySon-Huang JaySon-Huang force-pushed the disagg_read_snap branch 4 times, most recently from 3072251 to 61471a5 Compare March 2, 2023 08:19
@JaySon-Huang JaySon-Huang changed the title [WIP] Serialize table snapshot for disaggregated read Build table snapshot on write node for disaggregated read from S3 Mar 2, 2023
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2023
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed do-not-merge/needs-linked-issue labels Mar 2, 2023
@JaySon-Huang JaySon-Huang removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2023
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

Copy link
Member

@breezewish breezewish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some minor suggestions

/* const */ bool is_mpp_task = false;
/* const */ bool is_root_mpp_task = false;
/* const */ bool is_batch_cop = false;
/* const */ bool is_disaggregated_task = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this flag for both: requests in rn and establish requests in wn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rn does not need to create a DAGContext with is_disaggregated_task==true as it only needs to send disaggregated task. This flag is only set when a wn receives a disaggregated task

dbms/src/Flash/FlashService.h Outdated Show resolved Hide resolved
db_context->setMockStorage(mock_storage);
db_context->setMockMPPServerInfo(mpp_test_info);

RUNTIME_CHECK(context->isDisaggregatedStorageMode());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off topic: When will we use context and when will we use db_context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context - actually it is a pointer to the global context
db_context - a temporary context that its lifetime lasts only until the end of this request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_context is a session-only context, some value of db_context->getSettings() is not the same as the global settings.

dbms/src/Flash/FlashService.cpp Outdated Show resolved Hide resolved
dbms/src/Storages/DeltaMerge/Remote/Serializer.cpp Outdated Show resolved Hide resolved
dbms/src/Core/Defines.h Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 3, 2023
@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2023
Comment on lines +313 to +315
// Note that `buildRemoteRequests` must be called after `buildLocalStreams` because
// `buildLocalStreams` will setup `region_retry_from_local_region` and we must
// retry those regions or there will be data lost.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detect by test case "tests/fullstack-test/fault-inject/exception_after_read_from_storage.test"
/cc @breezewish

@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2023
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 6, 2023
JaySon-Huang and others added 13 commits March 6, 2023 14:42
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

@JaySon-Huang: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: c886f7d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 6, 2023
@JaySon-Huang
Copy link
Contributor Author

/run-integration-test

1 similar comment
@SeaRise
Copy link
Contributor

SeaRise commented Mar 6, 2023

/run-integration-test

@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@JaySon-Huang
Copy link
Contributor Author

/run-all-tests

@JaySon-Huang
Copy link
Contributor Author

/run-integration-test

@ti-chi-bot ti-chi-bot merged commit 5604724 into pingcap:master Mar 6, 2023
@JaySon-Huang JaySon-Huang deleted the disagg_read_snap branch March 9, 2023 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants