*: optimize the layout of global sort files and other #48275

Merged
64 commits merged into pingcap:master
Nov 24, 2023

@wjhuang2016 wjhuang2016 commented Nov 3, 2023

What problem does this PR solve?

Issue Number: ref #48779

Problem Summary:

What is changed and how it works?

The old layout is keyLen: uint64, key: char[keyLen], valueLen: uint64, value: char[valueLen]. This PR changes it to keyLen: uint64, valueLen: uint64, key: char[keyLen], value: char[valueLen]. Compared with the old layout, the reader can fetch the key and value in a single IO and split the data afterwards, and each IO no longer requires the previous data buffer to be retained in memory.
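To make the new layout concrete, here is a minimal sketch of writing and reading one record as keyLen, valueLen, key, value. It assumes big-endian uint64 length fields, as in the code quoted later in this review; the names encodeKV and decodeKV are hypothetical and are not the functions changed by this PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// lengthBytes is the size of each length field (one big-endian uint64).
const lengthBytes = 8

// encodeKV (hypothetical helper) appends one record in the new layout:
// keyLen | valueLen | key | value.
func encodeKV(dst, key, val []byte) []byte {
	dst = binary.BigEndian.AppendUint64(dst, uint64(len(key)))
	dst = binary.BigEndian.AppendUint64(dst, uint64(len(val)))
	dst = append(dst, key...)
	return append(dst, val...)
}

// decodeKV (hypothetical helper) reads one record from the front of buf and
// returns the key, the value, and the bytes that follow the record.
func decodeKV(buf []byte) (key, val, rest []byte, err error) {
	if len(buf) < 2*lengthBytes {
		return nil, nil, nil, errors.New("short header")
	}
	// Both lengths are known after reading a fixed 16-byte header.
	keyLen := binary.BigEndian.Uint64(buf[:lengthBytes])
	valLen := binary.BigEndian.Uint64(buf[lengthBytes : 2*lengthBytes])
	body := buf[2*lengthBytes:]
	// Written this way to avoid overflow in keyLen+valLen on corrupt input.
	if keyLen > uint64(len(body)) || valLen > uint64(len(body))-keyLen {
		return nil, nil, nil, errors.New("short body")
	}
	// Key and value come from one contiguous span and are split afterwards.
	key = body[:keyLen]
	val = body[keyLen : keyLen+valLen]
	return key, val, body[keyLen+valLen:], nil
}

func main() {
	buf := encodeKV(nil, []byte("k1"), []byte("value1"))
	key, val, rest, err := decodeKV(buf)
	fmt.Println(string(key), string(val), len(rest), err)
}
```

Because both lengths sit in the fixed-size header, a reader knows the total size of key plus value before touching either, which is what allows the two to be fetched together.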

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

benchmark of merge iter reading

$ go test ./br/pkg/lightning/backend/external -v --tags=intest -test.run TestCompareReader --testing-storage-uri "s3://xxx"
this PR:
    bench_test.go:414: sequential read speed for 1258291200 bytes: 52.64 MB/s
    bench_test.go:421: concurrent read speed for 1258291200 bytes: 575.51 MB/s
    bench_test.go:428: merge iter read speed for 1258291200 bytes: 150.50 MB/s

master:
    bench_test.go:413: sequential read speed for 1258291200 bytes: 51.65 MB/s
    bench_test.go:420: concurrent read speed for 1258291200 bytes: 516.10 MB/s
    bench_test.go:427: merge iter read speed for 1258291200 bytes: 112.53 MB/s

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: wjhuang2016 <huangwenjun1997@gmail.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/invalid-title do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-tests-checked size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 3, 2023

codecov bot commented Nov 3, 2023

Codecov Report

Merging #48275 (56337f1) into master (3543275) will decrease coverage by 16.9222%.
Report is 13 commits behind head on master.
The diff coverage is 85.3658%.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #48275         +/-   ##
=================================================
- Coverage   71.0083%   54.0861%   -16.9222%     
=================================================
  Files          1367       1582        +215     
  Lines        404865     596180     +191315     
=================================================
+ Hits         287488     322451      +34963     
- Misses        97372     251125     +153753     
- Partials      20005      22604       +2599     
Flag          Coverage Δ
integration   35.0588% <68.2926%> (?)
unit          71.3596% <85.3658%> (+0.3513%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      53.9663% <ø> (ø)
parser        ∅ <ø> (∅)
br            55.3775% <85.3658%> (+2.3006%) ⬆️

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 3, 2023
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 6, 2023
@ti-chi-bot ti-chi-bot bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 7, 2023
Signed-off-by: lance6716 <lance6716@gmail.com>
@lance6716 lance6716 changed the title from "*: optimize the layout of global sort files" to "*: optimize the layout of global sort files and other" Nov 22, 2023
}
// If the reader has fewer than n bytes remaining in current buffer,
// `auxBuf` is used as a container instead.
if n > 1024*1024*1024 {

Why is it 1024*1024*1024?

I think it's defensive coding. The author of this line is @wjhuang2016, so let's ask him.

Member Author

Yes, if something is wrong we can return an error instead of panicking. I just chose a large number.

I think it is better to do this outside of readNBytes() and log the file name & other info.

Could you change this to 1 * size.GB instead and add a comment for it?
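One way the suggestion in this thread could look is sketched below: the sanity check lives in a small caller-side helper, uses a named and commented limit, and reports the file name in the error. The names maxRecordLen and checkRecordLen are hypothetical, and the limit is written as 1 << 30 locally rather than through the size.GB constant mentioned above, so that the sketch stays self-contained.

```go
package main

import "fmt"

// maxRecordLen is a hypothetical sanity limit of 1 GiB. No valid record is
// expected to be this large, so a bigger length most likely means the file is
// corrupted or was read at the wrong offset. Returning an error here avoids a
// panic or a huge allocation later on.
const maxRecordLen = 1 << 30

// checkRecordLen (hypothetical helper) validates a decoded length before the
// caller asks the reader for that many bytes. The file name is included so
// that the failure can be traced back to a specific file.
func checkRecordLen(fileName string, n uint64) error {
	if n > maxRecordLen {
		return fmt.Errorf("file %q: record length %d exceeds limit %d, file may be corrupted",
			fileName, n, maxRecordLen)
	}
	return nil
}

func main() {
	fmt.Println(checkRecordLen("example/0", 16))
	fmt.Println(checkRecordLen("example/0", 2<<30))
}
```

Keeping the check outside the low-level read function means the reader stays generic, while the caller, which knows which file it is reading, can attach that context to the error or log it.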

Comment on lines +342 to +344
binary.BigEndian.AppendUint64(dataBuf[:lengthBytes], uint64(len(idxVal)))
keyAdapter.Encode(dataBuf[2*lengthBytes:2*lengthBytes:2*lengthBytes+encodedKeyLen], idxKey, rowID)
copy(dataBuf[2*lengthBytes+encodedKeyLen:], idxVal)

The onefile writer should change too.

I have changed the order to keyLen, valueLen, key, value in onefile_writer.go line 108~111.
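The three lines quoted at the top of this thread write into a preallocated buffer by appending to sub-slices of it. The sketch below illustrates that technique under some assumptions: appendKey is a hypothetical stand-in for keyAdapter.Encode, the key length is assumed to be written at offset 0 in the same style (that line is not quoted here), and writeRecord is not a function from this PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lengthBytes is the size of each length field (one big-endian uint64).
const lengthBytes = 8

// appendKey is a hypothetical stand-in for keyAdapter.Encode: it appends the
// key to dst and returns the extended slice.
func appendKey(dst, key []byte) []byte { return append(dst, key...) }

// writeRecord (hypothetical helper) fills a preallocated buffer with
// keyLen | valueLen | key | value.
func writeRecord(key, val []byte) []byte {
	buf := make([]byte, 2*lengthBytes+len(key)+len(val))

	// Appending to buf[:0] writes bytes [0, 8) of buf in place, because the
	// sub-slice has spare capacity and no reallocation happens.
	binary.BigEndian.AppendUint64(buf[:0], uint64(len(key)))
	// Likewise, appending to buf[:lengthBytes] writes bytes [8, 16).
	binary.BigEndian.AppendUint64(buf[:lengthBytes], uint64(len(val)))

	// Full slice expression: length 0, capacity capped at the end of the key
	// region. If the encoder ever appended more than len(key) bytes, it would
	// reallocate instead of silently overwriting the value region.
	keyStart := 2 * lengthBytes
	appendKey(buf[keyStart:keyStart:keyStart+len(key)], key)

	copy(buf[keyStart+len(key):], val)
	return buf
}

func main() {
	rec := writeRecord([]byte("ab"), []byte("xyz"))
	fmt.Println(len(rec), rec[:2*lengthBytes], string(rec[2*lengthBytes:]))
}
```

The return values of the append-style calls are deliberately ignored, as in the quoted lines: the writes land in buf itself as long as each sub-slice has enough capacity.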

@lance6716
Contributor

ptal @ywqzzy @tangenta

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 24, 2023
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 24, 2023

ti-chi-bot bot commented Nov 24, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-24 08:02:57.195123157 +0000 UTC m=+564205.860349352: ☑️ agreed by tangenta.
  • 2023-11-24 08:03:45.364566666 +0000 UTC m=+564254.029792862: ☑️ agreed by lance6716.

ti-chi-bot bot commented Nov 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lance6716, tangenta, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Nov 24, 2023
@lance6716
Contributor

/retest

@ti-chi-bot ti-chi-bot bot merged commit 1263445 into pingcap:master Nov 24, 2023
20 of 28 checks passed
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
6 participants