db: compactions: calculate eventual output level #5333
base: master
Conversation
We no longer calculate `maxOutputFileSize` and `maxOverlapBytes` inside `pickedTableCompaction`, as they are not used there. We calculate them in `newCompaction` instead. We no longer store a `maxReadCompactionBytes` field; we calculate it in the one place we need it.
We improve the compaction code to check if the levels below the compaction bounds are empty, in which case we adjust the output file size and other sstable writer options to correspond to the eventual level (after move compactions).
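As a rough illustration of the idea only (this is not Pebble's implementation; `lsmShape`, `levelFileCount`, and `eventualOutputLevel` are hypothetical names, and the sketch simplifies the check to whole-level emptiness rather than emptiness within the compaction bounds), a minimal Go sketch:

```go
package main

import "fmt"

const numLevels = 7

// lsmShape stands in for the real version metadata; in Pebble the per-level
// file information would come from the current Version.
type lsmShape struct {
	levelFileCount [numLevels]int
}

// eventualOutputLevel returns the lowest level the compaction output could
// end up in via move compactions, under the simplifying assumption that a
// level is a valid move target only if it is completely empty.
func eventualOutputLevel(v lsmShape, outputLevel int) int {
	eventual := outputLevel
	for level := outputLevel + 1; level < numLevels; level++ {
		if v.levelFileCount[level] != 0 {
			break
		}
		eventual = level
	}
	return eventual
}

func main() {
	// The output nominally lands in L2, but L3 and L4 are empty, so the
	// sstables would eventually be moved down to L4.
	var v lsmShape
	v.levelFileCount[2] = 3
	v.levelFileCount[5] = 10
	fmt.Println(eventualOutputLevel(v, 2)) // 4
}
```

The eventual level computed this way is what the output file size and other sstable writer options would be tuned for, per the description above.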
@jbowens reviewed 5 of 5 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @sumeerbhola)
@sumeerbhola reviewed 5 of 5 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)
compaction.go
line 276 at r2 (raw file):
// Because of move compactions, we know that any sstables produced by this
// compaction will be later moved to eventualOutputLevel. So we use
(drive-by) I suppose something new could be compacted into this level, before the move compaction. Since the move compaction may be delayed if the level score is low. I wonder whether it would be better to just place it in the final output level instead of relying on such move compactions happening at the right time. And by making these files bigger while not putting them in the lowest level possible, we do have the small risk that the size of a compaction increases, since now, say, an L2=>L3 compaction could encounter a much larger file in L3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
compaction.go
line 276 at r2 (raw file):
I suppose something new could be compacted into this level, before the move compaction.
How can that happen? I believe that the other compaction would necessarily have to overlap with this compaction on the outputLevel, which is not allowed.
Since the move compaction may be delayed if the level score is low.
Good point.
I wonder whether it would be better to just place it in the final output level instead of relying on such move compactions happening at the right time.
I can explore that. It would reduce some manifest churn as well. I think in that case we can update the outputLevel when we create the compaction and make sure that any overlap checks include any intermediary levels. Can you think of anything else that might be a problem?
I will say that as I worked on this and tried to come up with a test case without defining a specific LSM, I am starting to doubt the utility of this whole thing, especially since it's difficult to do it for flushes.
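To make the overlap-check suggestion above concrete, here is a hedged sketch (purely illustrative; `keyRange`, `compactingRange`, and `conflictsWithInFlight` are hypothetical and not Pebble APIs) of a check that treats every level between the nominal and final output level as part of the compaction:

```go
package main

import "fmt"

// keyRange is a hypothetical half-open key interval [start, end).
type keyRange struct {
	start, end string
}

func (a keyRange) overlaps(b keyRange) bool {
	return a.start < b.end && b.start < a.end
}

// compactingRange records the key bounds of an in-flight compaction that
// involves a given level.
type compactingRange struct {
	level  int
	bounds keyRange
}

// conflictsWithInFlight reports whether placing the output directly in
// finalLevel would conflict with an in-flight compaction on any level from
// nominalLevel through finalLevel, not just on nominalLevel itself.
func conflictsWithInFlight(inFlight []compactingRange, nominalLevel, finalLevel int, bounds keyRange) bool {
	for _, c := range inFlight {
		if c.level >= nominalLevel && c.level <= finalLevel && c.bounds.overlaps(bounds) {
			return true
		}
	}
	return false
}

func main() {
	inFlight := []compactingRange{{level: 4, bounds: keyRange{"c", "f"}}}
	// Output nominally destined for L2 but placed directly in L5: the
	// in-flight compaction over [c, f) in L4 now matters.
	fmt.Println(conflictsWithInFlight(inFlight, 2, 5, keyRange{"a", "d"})) // true
}
```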
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)
compaction.go
line 276 at r2 (raw file):
I can explore that. It would reduce some manifest churn as well
I haven't really given this more than passing thought, wrt whether there are write amp downsides to it.
It will require some care wrt coordination with ingests.
Previously, sumeerbhola wrote…
Yeah, it looks like it's not an easy change. I will probably merge this as-is, but after the branch cut.
db: minor compaction cleanup

We no longer calculate `maxOutputFileSize` and `maxOverlapBytes` inside
`pickedTableCompaction`, as they are not used there. We calculate them in
`newCompaction` instead. We no longer store a `maxReadCompactionBytes`
field; we calculate it in the one place we need it.
db: compactions: calculate eventual output level
We improve the compaction code to check if the levels below the
compaction bounds are empty, in which case we adjust the output file
size and other sstable writer options to correspond to the eventual
level (after move compactions).