fix: avoid flush too many small sst file #1003

Merged

@MichaelLeeHZ MichaelLeeHZ commented Jun 19, 2023

Rationale

Currently, when memory usage reaches the limit set by either space_write_buffer_size or db_write_buffer_size, we try to flush the table that consumes the most memory. However, if that table is already being flushed, its memory is not released until the flush finishes, so the preprocess_flush function (which freezes small memtables) is triggered repeatedly. This can create many small SST files, potentially causing query issues.
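
To make the failure mode concrete, here is a minimal sketch of the two selection policies. It assumes a hypothetical `TableMemory` record that splits a table's usage into mutable and immutable bytes; the type, field, and function names are illustrative and are not taken from the CeresDB code. Selecting by total usage keeps returning a table whose memory is held by an in-flight flush, whereas selecting by mutable usage (the approach named in this PR's earlier title) moves on to a table that can actually be frozen.

```rust
/// Hypothetical per-table memory accounting; not the real CeresDB types.
#[derive(Debug, Clone, PartialEq)]
struct TableMemory {
    name: &'static str,
    /// Bytes in memtables that still accept writes.
    mutable_bytes: usize,
    /// Bytes in frozen memtables that are waiting for, or undergoing, a flush.
    immutable_bytes: usize,
}

impl TableMemory {
    fn total_bytes(&self) -> usize {
        self.mutable_bytes + self.immutable_bytes
    }
}

/// Select by total usage. A table that is mid-flush stays the largest until
/// the flush completes, so it is selected again and again.
fn pick_by_total(tables: &[TableMemory]) -> Option<&TableMemory> {
    tables.iter().max_by_key(|t| t.total_bytes())
}

/// Select by mutable usage only, skipping tables with nothing left to freeze.
fn pick_by_mutable(tables: &[TableMemory]) -> Option<&TableMemory> {
    tables
        .iter()
        .filter(|t| t.mutable_bytes > 0)
        .max_by_key(|t| t.mutable_bytes)
}

/// Example data: `t1` is being flushed, `t2` holds the writable data.
fn sample_tables() -> Vec<TableMemory> {
    vec![
        TableMemory {
            name: "t1",
            mutable_bytes: 1024,
            immutable_bytes: 64 * 1024 * 1024,
        },
        TableMemory {
            name: "t2",
            mutable_bytes: 32 * 1024 * 1024,
            immutable_bytes: 0,
        },
    ]
}
```

With `sample_tables()`, `pick_by_total` returns `t1`, where only 1 KiB could be frozen into a tiny SST, while `pick_by_mutable` returns `t2`.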

Detailed Changes

  • Move preprocess_flush into flush_job
  • Split switch_memtables_or_suggest_duration into two methods, and make switch_memtables return the maximum sequence number.
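
The second change can be pictured with a rough sketch like the one below. It assumes a hypothetical `TableVersion` that holds mutable and immutable memtables, each tagged with the last sequence number written to it. The struct layout and the `Option<u64>` return type are assumptions made for illustration; the actual signature in `analytic_engine/src/table/version.rs` may differ.

```rust
/// Hypothetical memtable that only tracks what this sketch needs.
#[derive(Debug)]
struct MemTable {
    /// Sequence number of the last write applied to this memtable.
    last_sequence: u64,
}

/// Hypothetical table version holding writable and frozen memtables.
#[derive(Debug, Default)]
struct TableVersion {
    mutable: Vec<MemTable>,
    immutable: Vec<MemTable>,
}

impl TableVersion {
    /// Freeze every mutable memtable and return the maximum sequence number
    /// among them, or `None` if there was nothing to freeze.
    fn switch_memtables(&mut self) -> Option<u64> {
        let max_sequence = self.mutable.iter().map(|m| m.last_sequence).max();
        // `append` moves all elements and leaves `self.mutable` empty.
        self.immutable.append(&mut self.mutable);
        max_sequence
    }
}
```

Returning the sequence number from the switch itself lets the caller know which writes the frozen memtables cover without a second pass over the version.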

Test Plan

@MichaelLeeHZ MichaelLeeHZ changed the title Chore: select max mutable memory usage table to flush fix: select max mutable memory usage table to flush Jun 19, 2023
@MichaelLeeHZ MichaelLeeHZ marked this pull request as ready for review June 20, 2023 08:39
@MichaelLeeHZ MichaelLeeHZ changed the title fix: select max mutable memory usage table to flush fix: avoid flush too many sst file Jun 20, 2023
@MichaelLeeHZ MichaelLeeHZ changed the title fix: avoid flush too many sst file fix: avoid flush too many small sst file Jun 20, 2023
@ShiKaiWi ShiKaiWi merged commit cd2b688 into apache:main Jun 27, 2023
dust1 pushed a commit to dust1/ceresdb that referenced this pull request Aug 9, 2023
@ShiKaiWi ShiKaiWi mentioned this pull request Dec 21, 2023
tanruixiang pushed a commit that referenced this pull request Dec 21, 2023
## Rationale
#1003 tries to avoid frequent flush requests, which may generate a large
number of small SSTs, but it also removed the write stall from the normal
write path.

## Detailed Changes
Introduce `min_flush_interval` to avoid frequent flush requests and
restore the write stall mechanism.

## Test Plan
Add unit tests for the frequent flush check.
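
As an illustration of the frequent-flush check described in that commit message, the sketch below gates flush requests on a minimum interval. The `FlushGate` type and its `try_acquire` method are hypothetical; the commit message names only the `min_flush_interval` option, so how the real check is wired into the write path is an assumption here.

```rust
use std::time::{Duration, Instant};

/// Hypothetical gate that rejects flush requests arriving too soon after
/// the previous accepted one.
struct FlushGate {
    min_flush_interval: Duration,
    last_flush: Option<Instant>,
}

impl FlushGate {
    fn new(min_flush_interval: Duration) -> Self {
        Self {
            min_flush_interval,
            last_flush: None,
        }
    }

    /// Returns `true` and records `now` if a flush may start; returns
    /// `false` if the previous accepted flush was less than
    /// `min_flush_interval` ago.
    fn try_acquire(&mut self, now: Instant) -> bool {
        if let Some(last) = self.last_flush {
            if now.saturating_duration_since(last) < self.min_flush_interval {
                return false;
            }
        }
        self.last_flush = Some(now);
        true
    }
}
```

Passing `now` as an argument instead of calling `Instant::now()` inside the method keeps the check deterministic, which suits the kind of unit test the test plan mentions.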