load data: physical mode part1 #42817

Merged
merged 20 commits into master from physical-mode on Apr 7, 2023

D3Hunter (Contributor) commented Apr 4, 2023

What problem does this PR solve?

Issue Number: ref #40499

Problem Summary:

What is changed and how it works?

  • integrate Lightning's physical mode into load data

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot (Member) commented Apr 4, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • GMHDBJD
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 4, 2023
@D3Hunter D3Hunter requested a review from lance6716 April 4, 2023 10:26
@ti-chi-bot ti-chi-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 4, 2023
@D3Hunter D3Hunter requested a review from GMHDBJD April 4, 2023 10:26
@D3Hunter D3Hunter changed the title from load data: physical mode to load data: physical mode part1 Apr 4, 2023

checksum verify.KVChecksum
encoder kvEncoder
kvStore tidbkv.Storage
Contributor

Can we use codec instead of kvStore?

D3Hunter (Contributor Author), Apr 6, 2023

Lightning also uses it to allocate auto IDs in common.AllocGlobalAutoID, but its use here can be removed.

hawkingrei (Member)

/test all

hawkingrei (Member)

/retest

tableMeta := &mydump.MDTableMeta{
DB: e.DBName,
Name: e.Table.Meta().Name.O,
DataFiles: e.toMyDumpFiles(),
Contributor

Add a TODO for IsRowOrdered?

Contributor Author

D3Hunter (Contributor Author) commented Apr 6, 2023

/retest

GMHDBJD (Contributor) left a comment

LGTM

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 6, 2023
lance6716 (Contributor) left a comment


func (b *deliverKVBatch) add(kvs *kv.Pairs) {
for _, pair := range kvs.Pairs {
if pair.Key[tablecodec.TableSplitKeyLen+1] == 'r' {
Contributor

Maybe we can use tablecodec.IsRecordKey

Contributor Author
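The suggestion above is about replacing the raw byte check `pair.Key[tablecodec.TableSplitKeyLen+1] == 'r'` with a named helper. A minimal, self-contained sketch of what such a check inspects, assuming TiDB's usual key layout `t{tableID}_r{handle}` for rows and `t{tableID}_i{indexID}...` for indexes. `tableSplitKeyLen`, `pair`, `encodeKey`, `isRecordKey`, and `splitKVs` are hypothetical stand-ins, not the real `tablecodec` or Lightning API:

```go
package main

import "encoding/binary"

// tableSplitKeyLen is a hypothetical mirror of tablecodec.TableSplitKeyLen:
// the 't' prefix plus an 8-byte encoded table ID.
const tableSplitKeyLen = 1 + 8

// pair is a hypothetical stand-in for a Lightning KV pair.
type pair struct {
	Key, Val []byte
}

// encodeKey builds 't' + tableID + '_' + sep + suffix. The table ID is written
// big-endian with the sign bit flipped so byte order matches numeric order
// (a sketch of a memcomparable int encoding).
func encodeKey(tableID int64, sep byte, suffix []byte) []byte {
	key := make([]byte, 0, tableSplitKeyLen+2+len(suffix))
	key = append(key, 't')
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], uint64(tableID)^(1<<63))
	key = append(key, id[:]...)
	key = append(key, '_', sep)
	return append(key, suffix...)
}

// isRecordKey reports whether key is a row (record) key rather than an index key.
// Unlike indexing the slice directly, it checks the length first.
func isRecordKey(key []byte) bool {
	return len(key) > tableSplitKeyLen+1 &&
		key[0] == 't' &&
		key[tableSplitKeyLen] == '_' &&
		key[tableSplitKeyLen+1] == 'r'
}

// splitKVs separates row KVs from index KVs, in the spirit of deliverKVBatch.add.
func splitKVs(pairs []pair) (data, index []pair) {
	for _, p := range pairs {
		if isRecordKey(p.Key) {
			data = append(data, p)
		} else {
			index = append(index, p)
		}
	}
	return data, index
}
```

A named predicate with a length guard documents intent and avoids an out-of-range panic on unexpectedly short keys.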

deliverCompleteCh := make(chan deliverResult)
go func() {
defer close(deliverCompleteCh)
err := p.deliverLoop(ctx)
lance6716 (Contributor), Apr 6, 2023

Not important, but I prefer the error group pattern. When the job is finished, close the channel to notify the consumer. When any loop meets an error, the error group automatically cancels the derived context, and the error can be retrieved from eg.Wait().

Contributor Author

This is simplified from the Lightning code; I will add a comment and do it later.

D3Hunter (Contributor Author) commented Apr 6, 2023

/retest

D3Hunter (Contributor Author) commented Apr 7, 2023

/retest

D3Hunter (Contributor Author) commented Apr 7, 2023

Tests related to multi-valued indexes will be added later.

lance6716 (Contributor) left a comment

Rest LGTM.

s.tk.MustExec("use " + db)
}

func (s *mockGCSSuite) TestPhysicalMode() {
Contributor

In the future, we can move some tests to a common package and let different backends share them.

Contributor Author

Physical-mode tests can only run in realtikvtest (check_dev_2); we can move the logical-mode tests here.

tidbCfg := tidb.GetGlobalConfig()
// todo: add job id too
sortPathSuffix := "import-" + strconv.Itoa(int(tidbCfg.Port))
sortPath := filepath.Join(tidbCfg.TempDir, sortPathSuffix)
Contributor

I found both TempDir and TempStoragePath, and neither has a comment, so I'm not sure how to use them. @tangenta can you help us?

Also, will Lightning's built-in disk quota conflict with TiDB's?

Contributor Author

Copied from here:

sortPath := filepath.Join(tidbCfg.TempDir, sortPathSuffix)

On whether Lightning's built-in disk quota conflicts with TiDB's: yes, we should precheck whether there is an ongoing add-index job.

Contributor

Not a big problem; we can check it later. @tangenta PTAL at your convenience.

ConnCompressType: config.CompressionNone,
WorkerConcurrency: config.DefaultRangeConcurrency * 2,
KVWriteBatchSize: config.KVWriteBatchSize,
CheckpointEnabled: true,
Contributor

We don't need to turn on checkpoint now?

Contributor Author

The local backend reports an error when the sort-dir already exists and checkpoint is disabled.

I will add a TODO and fix it later.

DupeDetectEnabled: false,
DuplicateDetectOpt: local.DupDetectOpt{ReportErrOnDup: false},
StoreWriteBWLimit: 0,
ShouldCheckWriteStall: false,
Contributor

This should be true, because we don't periodically switch the cluster to import mode, so it may be slow to ingest. I ran into an ONCALL today 😭

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 7, 2023
D3Hunter (Contributor Author) commented Apr 7, 2023

/merge

ti-chi-bot (Member)

This pull request has been accepted and is ready to merge.

Commit hash: 66d0c78

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 7, 2023
D3Hunter (Contributor Author) commented Apr 7, 2023

/retest

@ti-chi-bot ti-chi-bot merged commit 2af85d1 into master Apr 7, 2023
@ti-chi-bot ti-chi-bot deleted the physical-mode branch April 7, 2023 07:13
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.