support sub compaction to speed up large compaction #70

Merged
6 commits merged into pingcap:master on Dec 20, 2018

Conversation

bobotu commented Dec 17, 2018

Use sub compaction to avoid write stalls caused by a large L1 -> L2 compaction blocking the L0 compaction, and to speed up the L0 -> L1 compaction.
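For context, a minimal sketch of what "sub compaction" means here: one large compaction is split into disjoint key ranges that are compacted concurrently, then the outputs are collected. The `keyRange`, `sstable`, and `runSubCompaction` names below are hypothetical illustrations, not the actual identifiers in levels.go.

```go
package compaction

import "sync"

// Hypothetical stand-ins for the real types in levels.go.
type keyRange struct{ start, end []byte }
type sstable struct{}

// runSubCompaction compacts only the keys in [r.start, r.end) and
// returns the tables it produced (placeholder body for the sketch).
func runSubCompaction(r keyRange) ([]*sstable, error) {
	return nil, nil
}

// runCompaction splits one large compaction into parallel sub compactions,
// one goroutine per key range, and gathers the resulting tables. Because
// the ranges are disjoint, the sub compactions never produce overlapping keys.
func runCompaction(ranges []keyRange) ([]*sstable, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		newTbls  []*sstable
		firstErr error
	)
	for _, r := range ranges {
		wg.Add(1)
		go func(r keyRange) {
			defer wg.Done()
			tbls, err := runSubCompaction(r)
			mu.Lock()
			defer mu.Unlock()
			if err != nil && firstErr == nil {
				firstErr = err
			}
			newTbls = append(newTbls, tbls...)
		}(r)
	}
	wg.Wait()
	return newTbls, firstErr
}
```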

coocood (Member) commented Dec 17, 2018

Can we use only the bottom level bounds to split the compaction?
And since the bottom level is usually 10x the size of the top level, we can ignore the data size in the top level; then we don't need to implement approximate size in range.

bobotu (Author) commented Dec 18, 2018

@coocood This heuristic algorithm is adopted from RocksDB.

  1. Select boundaries based on the natural boundaries of the input levels/files:
    • the first and last key of each L0 file
    • the first and last key of each non-L0, non-last input level
    • the first key of each SST file in the last level
  2. Sort the boundaries and remove duplicates.
  3. Use ApproximateSize to estimate the data size between each pair of adjacent boundaries.
  4. Merge boundaries to eliminate empty and smaller-than-average ranges:
    • compute the average size across all ranges
    • starting from the beginning, greedily merge adjacent ranges until their total size exceeds the average

We should take the size of every input file into account, so we cannot ignore the bounds of L0 files. Because each SST file has nearly the same size, the boundaries added in step 1 roughly follow the data distribution of the whole input.

Suppose the L0 inputs are [a, d] [c, d] [e, j] and the L1 inputs are [a, f] [f, i] [i, k]. The result of step 2 is [a, c] [c, d] [d, e] [e, f] [f, i] [i, j] [j, k]. As you can see, the large range [a, f] is split into many smaller ranges.

Then we estimate the size of each small range, compute the target size of each sub compaction, and merge the small ranges into larger ones.

This algorithm is not exact, because we cannot split the whole input evenly without iterating over it, so we use some heuristic rules here, and they have worked out well so far (see the sketch below).
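To make steps 1–4 concrete, here is a minimal Go sketch of the boundary heuristic, assuming hypothetical `table`, `smallest`/`biggest`, and `approximateSizeInRange` helpers (the real identifiers in levels.go differ):

```go
package compaction

import (
	"bytes"
	"sort"
)

// table is a hypothetical SST descriptor; approximateSizeInRange stands in
// for an index-based size estimate over [start, end).
type table struct{ smallest, biggest []byte }

func approximateSizeInRange(start, end []byte) int64 { return 0 /* estimate from index */ }

// buildSubCompactionRanges assumes both input slices are non-empty and
// sorted; it returns the key ranges to hand to each sub compaction.
func buildSubCompactionRanges(topTables, botTables []*table) [][2][]byte {
	// Step 1: collect natural boundaries from the input files.
	var bounds [][]byte
	for _, t := range topTables { // e.g. L0: first and last key of each file
		bounds = append(bounds, t.smallest, t.biggest)
	}
	for _, t := range botTables { // last level: first key of each file
		bounds = append(bounds, t.smallest)
	}
	bounds = append(bounds, botTables[len(botTables)-1].biggest)

	// Step 2: sort and deduplicate.
	sort.Slice(bounds, func(i, j int) bool { return bytes.Compare(bounds[i], bounds[j]) < 0 })
	uniq := bounds[:1]
	for _, b := range bounds[1:] {
		if !bytes.Equal(b, uniq[len(uniq)-1]) {
			uniq = append(uniq, b)
		}
	}

	// Step 3: estimate the data size between each pair of adjacent boundaries.
	sizes := make([]int64, len(uniq)-1)
	var total int64
	for i := range sizes {
		sizes[i] = approximateSizeInRange(uniq[i], uniq[i+1])
		total += sizes[i]
	}
	if len(sizes) == 0 {
		return nil
	}

	// Step 4: greedily merge adjacent ranges (this also swallows empty
	// ranges) until each merged range exceeds the average size.
	avg := total / int64(len(sizes))
	var ranges [][2][]byte
	start, acc := uniq[0], int64(0)
	for i, sz := range sizes {
		acc += sz
		if acc > avg || i == len(sizes)-1 {
			ranges = append(ranges, [2][]byte{start, uniq[i+1]})
			start, acc = uniq[i+1], 0
		}
	}
	return ranges
}
```

On the example above, steps 1–2 produce the sorted, deduplicated boundaries a, c, d, e, f, i, j, k, i.e. exactly the seven small ranges shown, which step 4 then coalesces.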

BTW, RocksDB disables sub compaction for non-L0 levels, but I enabled it for L1 when the compaction touches more than 10 SSTs; otherwise a large L1 -> L2 compaction may block the L0 -> L1 compaction.
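A hedged sketch of that policy, continuing the hypothetical package above (illustrative names, not the actual levels.go code):

```go
// useSubCompaction reports whether a compaction out of thisLevel should be
// split into sub compactions: always for L0 -> L1, and for L1 -> L2 only
// when the compaction touches more than 10 SSTs.
func useSubCompaction(thisLevel, numInputTables int) bool {
	if thisLevel == 0 {
		return true
	}
	return thisLevel == 1 && numInputTables > 10
}
```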

ngaut (Member) commented Dec 19, 2018

Ping @coocood

coocood (Member) commented Dec 19, 2018

@bobotu
We can keep the size estimation and remove the L0 bounds.

coocood (Member) commented Dec 20, 2018

LGTM

coocood merged commit 48654df into pingcap:master on Dec 20, 2018
bobotu deleted the compaction branch on August 10, 2020