statstics: reuse fmsketch #47070

Merged
merged 12 commits into from
Sep 19, 2023

Conversation

hawkingrei
Member

@hawkingrei hawkingrei commented Sep 18, 2023

What problem does this PR solve?

Issue Number: close #47071

Problem Summary:

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

before:

image

after:

image
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-tests-checked do-not-merge/needs-linked-issue labels Sep 18, 2023
@codecov

codecov bot commented Sep 18, 2023

Codecov Report

Merging #47070 (179343e) into master (4450ae4) will decrease coverage by 0.3519%.
Report is 2 commits behind head on master.
The diff coverage is 100.0000%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #47070        +/-   ##
================================================
- Coverage   72.9633%   72.6115%   -0.3519%     
================================================
  Files          1337       1358        +21     
  Lines        399007     405580      +6573     
================================================
+ Hits         291129     294498      +3369     
- Misses        89075      92380      +3305     
+ Partials      18803      18702       -101     
Flag Coverage Δ
integration 30.6066% <75.9259%> (?)
unit 72.9540% <100.0000%> (-0.0094%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9913% <ø> (ø)
parser 84.9620% <ø> (-0.0108%) ⬇️
br 48.5758% <ø> (-4.3639%) ⬇️

Member

@time-and-fate time-and-fate left a comment

I think the logic in this PR is very fragile.
In this PR:

  1. We put back the allFms in the pool in MergePartitionStats2GlobalStats
  2. We put back the GlobalStats.Fms in the pool in handleGlobalStats and updateGlobalStats, which is returned from MergePartitionStats2GlobalStats
  3. We put back the tableAllPartitionStats.Column.FMSketch in the pool in handleGlobalStats, which is also created in MergePartitionStats2GlobalStats.

These assumptions are very fragile: if any two of the three places above share a struct, there will be bugs. And this is hard to notice, because we put the structs back into the pool in completely different places.
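
To make the concern concrete, here is a minimal sketch of the aliasing hazard, not the PR's code. The names `toySketch`, `freeList`, and `aliasingDemo` are hypothetical, and a small deterministic LIFO free list stands in for `sync.Pool` so that the reuse is reproducible. One place puts a sketch back while another place still holds the same pointer; the next caller then receives that object and the stale holder silently observes its data.

```go
package main

import "fmt"

// toySketch is a hypothetical stand-in for a pooled FM sketch.
type toySketch struct {
	hashes map[uint64]struct{}
}

// freeList is a deterministic LIFO stand-in for sync.Pool (an assumption made
// for reproducibility; sync.Pool does not guarantee which object Get returns).
type freeList struct {
	items []*toySketch
}

func (f *freeList) get() *toySketch {
	if n := len(f.items); n > 0 {
		s := f.items[n-1]
		f.items = f.items[:n-1]
		return s
	}
	return &toySketch{hashes: make(map[uint64]struct{})}
}

// put resets the sketch and makes it available to the next caller of get.
func (f *freeList) put(s *toySketch) {
	for h := range s.hashes {
		delete(s.hashes, h)
	}
	f.items = append(f.items, s)
}

// aliasingDemo reports whether a stale holder sees data written by the new owner.
func aliasingDemo() bool {
	pool := &freeList{}

	a := pool.get()
	a.hashes[1] = struct{}{}
	staleRef := a // a second place keeps the same pointer

	pool.put(a) // the first place releases it

	b := pool.get() // an unrelated caller receives the very same object
	b.hashes[42] = struct{}{}

	_, seen := staleRef.hashes[42]
	return seen
}

func main() {
	fmt.Println("stale holder sees the new owner's data:", aliasingDemo())
}
```

If the stale holder also called `put` later, the same object would sit in the free list twice and could be handed to two callers at once, which is the kind of bug that is hard to trace when the release sites are far apart.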

statistics/fmsketch.go (outdated, resolved)
@time-and-fate
Member

And we are handling the logic inconsistently.
When we call MergePartitionStats2GlobalStats from updateGlobalStats, we don't reuse the FM sketches in allPartitionStats, but when we call MergePartitionStats2GlobalStats from handleGlobalStats, we do reuse them.
Even in handleGlobalStats, we only reuse the FM sketches from Column, not those from Index.

@time-and-fate
Member

My suggestion is to do all the Put() in MergePartitionStats2GlobalStats. And it looks like we can even remove Fms from the struct GlobalStats.

@hawkingrei
Member Author

I think the logic in this PR is very fragile. In this PR:

1. We put back the `allFms` in the pool in `MergePartitionStats2GlobalStats`

2. We put back the `GlobalStats.Fms` in the pool in `handleGlobalStats` and `updateGlobalStats`, which is returned from `MergePartitionStats2GlobalStats`

3. We put back the `tableAllPartitionStats.Column.FMSketch` in the pool in `handleGlobalStats`, which is also created in `MergePartitionStats2GlobalStats`.

These assumptions are very fragile: if any two of the three places above share a struct, there will be bugs. And this is hard to notice, because we put the structs back into the pool in completely different places.

In fact, we optimize the two places where an FM sketch is created:

1. Creating an FM sketch via NewFmSketch
2. Creating an FM sketch from the proto

All of these implementations take the FMSketch from the pool and release it at the end of its life.

The root cause is tableAllPartitionStats, where the lifetime of the FMSketch extends beyond MergePartitionStats2GlobalStats.
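
The lifecycle described here (take the sketch from the pool on creation, hand it back at the end of its life) can be sketched roughly as follows. This is an illustrative sketch only: `toySketch`, `sketchPool`, `newToySketch`, and `release` are hypothetical names, not the identifiers used in the PR, and the sketch only tracks distinct hashes.

```go
package main

import (
	"fmt"
	"sync"
)

// toySketch is a hypothetical stand-in for an FM sketch.
type toySketch struct {
	hashes map[uint64]struct{}
}

// sketchPool hands out reusable sketches; New runs only when the pool is empty.
var sketchPool = sync.Pool{
	New: func() interface{} {
		return &toySketch{hashes: make(map[uint64]struct{})}
	},
}

// newToySketch is the only constructor, so every sketch originates from the pool.
func newToySketch() *toySketch {
	return sketchPool.Get().(*toySketch)
}

func (s *toySketch) insert(h uint64) { s.hashes[h] = struct{}{} }

func (s *toySketch) ndv() int { return len(s.hashes) }

// release resets the sketch and returns it to the pool.
// The caller must not touch s after this call.
func (s *toySketch) release() {
	for h := range s.hashes {
		delete(s.hashes, h)
	}
	sketchPool.Put(s)
}

// lifecycle returns the distinct count seen by one sketch and the size of a
// sketch obtained after the first one was released.
func lifecycle() (firstNDV, freshNDV int) {
	s := newToySketch()
	s.insert(1)
	s.insert(2)
	s.insert(1)
	firstNDV = s.ndv()
	s.release()

	fresh := newToySketch() // may or may not be the same object; it is empty either way
	freshNDV = fresh.ndv()
	fresh.release()
	return firstNDV, freshNDV
}

func main() {
	first, fresh := lifecycle()
	fmt.Println("first NDV:", first, "fresh NDV:", fresh)
}
```

Resetting inside `release` rather than inside the constructor means a pooled object never carries old hashes, whichever path hands it out next.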

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 19, 2023
@hawkingrei
Member Author

My suggestion is to do all the Put() in MergePartitionStats2GlobalStats. And it looks like we can even remove Fms from the struct GlobalStats.

I have refactored this code. Now we have an external cache in mergeGlobalStatus. If it is not from the external cache, we release it in mergeGlobalStatus.
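
A rough sketch of that ownership rule, under stated assumptions: `mergeSketches`, its `external` map, and the `load` callback are hypothetical and only illustrate the idea that the merge releases what it loaded itself and leaves externally cached entries to their owner. A `released` flag stands in for the actual return to the pool.

```go
package main

import "fmt"

// toySketch is a hypothetical stand-in for an FM sketch.
type toySketch struct {
	hashes   map[uint64]struct{}
	released bool // stands in for "was put back into the pool"
}

func (s *toySketch) release() { s.released = true }

// mergeSketches unions the per-partition sketches into a global one.
// Sketches found in external stay owned by the caller; sketches obtained
// through load are owned by this function and released before it returns.
func mergeSketches(partitions []int, external map[int]*toySketch, load func(int) *toySketch) *toySketch {
	global := &toySketch{hashes: make(map[uint64]struct{})}
	for _, p := range partitions {
		s, fromExternal := external[p]
		if !fromExternal {
			s = load(p)
		}
		for h := range s.hashes {
			global.hashes[h] = struct{}{}
		}
		if !fromExternal {
			s.release() // single release site for everything loaded here
		}
	}
	return global
}

// runMerge merges one cached and one loaded partition and reports the outcome.
func runMerge() (ndv int, cachedReleased, loadedReleased bool) {
	cached := &toySketch{hashes: map[uint64]struct{}{1: {}, 2: {}}}
	loaded := &toySketch{hashes: map[uint64]struct{}{2: {}, 3: {}}}
	external := map[int]*toySketch{0: cached}
	load := func(partition int) *toySketch { return loaded }

	global := mergeSketches([]int{0, 1}, external, load)
	return len(global.hashes), cached.released, loaded.released
}

func main() {
	ndv, cachedReleased, loadedReleased := runMerge()
	fmt.Println(ndv, cachedReleased, loadedReleased)
}
```

With this split, the decision to release sits next to the decision to load, so a reader can check both in one function.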

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Comment on lines 102 to 103
// initialized the globalStats
globalStats = new(GlobalStats)
Member

Looks like we don't need this. We initialize globalStats below.

Member Author

Done

// ReleaseAndPutToPool releases data structures of Table and put itself back to pool.
func (t *Table) ReleaseAndPutToPool() {
for _, col := range t.Columns {
col.FMSketch.DestroyAndPutToPool()
Member

I think it's safer to set FMSketch to nil after putting to the pool.
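
As an illustration of why clearing the field helps, here is a minimal sketch with hypothetical types (`toySketch`, `column`, `table`) and a counter standing in for the pool. Once the field is nil and the release method tolerates a nil receiver, a second release of the same table becomes a no-op instead of a double put.

```go
package main

import "fmt"

// toySketch is a hypothetical stand-in for a pooled FM sketch.
type toySketch struct {
	hashes map[uint64]struct{}
}

// putCount counts how many sketches were handed back; it stands in for pool.Put.
var putCount int

// destroyAndPut resets the sketch and hands it back. A nil receiver is a no-op.
func (s *toySketch) destroyAndPut() {
	if s == nil {
		return
	}
	for h := range s.hashes {
		delete(s.hashes, h)
	}
	putCount++
}

type column struct {
	fm *toySketch
}

type table struct {
	columns map[int64]*column
}

// release hands every column sketch back and drops the reference,
// so the table can no longer reach an object it does not own.
func (t *table) release() {
	for _, col := range t.columns {
		col.fm.destroyAndPut()
		col.fm = nil
	}
}

// releaseTwice releases the same table twice and returns the number of puts.
func releaseTwice() int {
	putCount = 0
	t := &table{columns: map[int64]*column{
		1: {fm: &toySketch{hashes: map[uint64]struct{}{7: {}}}},
		2: {fm: &toySketch{hashes: map[uint64]struct{}{}}},
	}}
	t.release()
	t.release() // harmless: every fm field is already nil
	return putCount
}

func main() {
	fmt.Println("puts after releasing twice:", releaseTwice())
}
```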

Member

It's OK with me if you improve them in another PR or in another way at your convenience.

Member Author

Done, use map.clear to clean it.

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot

ti-chi-bot bot commented Sep 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qw4990, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [qw4990,time-and-fate]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 19, 2023
@ti-chi-bot

ti-chi-bot bot commented Sep 19, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-09-19 02:05:06.020853202 +0000 UTC m=+569471.988441252: ☑️ agreed by qw4990.
  • 2023-09-19 13:37:16.094453453 +0000 UTC m=+611002.062041506: ☑️ agreed by time-and-fate.

@tiprow

tiprow bot commented Sep 19, 2023

@hawkingrei: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test 179343e link true /test tiprow_fast_test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hawkingrei
Member Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit bb49dc1 into pingcap:master Sep 19, 2023
11 of 16 checks passed
@hawkingrei
Member Author

/cherrypick release-6.5

@ti-chi-bot
Member

@hawkingrei: new pull request created to branch release-6.5: #49573.

In response to this:

/cherrypick release-6.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Dec 19, 2023
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this pull request Dec 19, 2023
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reuse FMsketch to Avoid High Memory Allocation
4 participants