Improve in-memory virtual map mode to run garbage collection for merged copies #17331

@artemananiev

Description

This is a follow-up to #15448. It covers a corner case that was not addressed in that feature.

Assume a virtual map has many copies (fast copy versions), and one of them is not released for some reason:

  • Since that copy is not released, it cannot be flushed or merged
  • The next copy after it can still be merged, and so can the one after that
  • Copies keep merging until, at some version, a copy grows so large that its size exceeds the flush threshold and it can no longer be merged
  • The next copy after that can be merged again
  • And so on

So in the end the list of copies in the virtual pipeline will look like this (newest to oldest), where X is the number of versions merged into one copy before it exceeds the flush threshold:

  • mutable copy
  • immutable copy version N, contains changes from versions N - X + 1 to N
  • immutable copy version N - X, contains changes from versions N - 2X + 1 to N - X
  • ...
  • immutable copy version M, which is never released
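
To make the shape of this list concrete, here is a toy simulation. It rests on simplifying assumptions that are not in the issue: every version adds the same amount to a copy's estimated size, merged sizes simply add up, and a copy that exceeds the threshold stays in the pipeline because an older copy is still unreleased. All class, method, and parameter names are hypothetical and do not come from the virtual map code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a virtual pipeline blocked by one unreleased copy.
final class StuckPipelineSketch {

    /** An immutable copy holding merged changes of versions firstVersion..version. */
    static final class Copy {
        final long version;
        final long firstVersion;
        final long size;

        Copy(long version, long firstVersion, long size) {
            this.version = version;
            this.firstVersion = firstVersion;
            this.size = size;
        }

        @Override
        public String toString() {
            return "version " + version + " (changes " + firstVersion + ".." + version
                    + ", size " + size + ")";
        }
    }

    /**
     * Returns the immutable copies newer than the unreleased copy, newest first.
     * Assumption: each version contributes sizePerVersion to the estimated size.
     */
    static List<Copy> simulate(long unreleasedVersion, long latestVersion,
                               long sizePerVersion, long flushThreshold) {
        List<Copy> oldestFirst = new ArrayList<>();
        long firstVersion = unreleasedVersion + 1;
        long size = 0;
        for (long v = unreleasedVersion + 1; v <= latestVersion; v++) {
            // Merging the previous copy into version v carries its mutations over.
            size += sizePerVersion;
            // Over the threshold: the copy can no longer be merged and stays put.
            if (size > flushThreshold || v == latestVersion) {
                oldestFirst.add(new Copy(v, firstVersion, size));
                firstVersion = v + 1;
                size = 0;
            }
        }
        List<Copy> newestFirst = new ArrayList<>();
        for (int i = oldestFirst.size() - 1; i >= 0; i--) {
            newestFirst.add(oldestFirst.get(i));
        }
        return newestFirst;
    }

    public static void main(String[] args) {
        for (Copy copy : simulate(0, 9, 10, 25)) {
            System.out.println(copy);
        }
    }
}
```

With `simulate(0, 9, 10, 25)` the unreleased copy is version M = 0, and every third version crosses the threshold, so X = 3 and the stuck copies sit at versions 9, 6, and 3.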

The problem is that all these intermediate copies may contain many obsolete mutations. The current in-memory mode implementation never runs garbage collection for these copies, but it should.
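
As an illustration of what "obsolete mutations" means here, the sketch below assumes a merged copy holds several mutations of the same key from different versions, and that only the newest one per key is still needed. This is a simplified model with hypothetical names, not the actual cache implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model: a merged copy is a flat list of (key, version, value) mutations.
final class ObsoleteMutationSketch {

    static final class Mutation {
        final String key;
        final long version;
        final String value;

        Mutation(String key, long version, String value) {
            this.key = key;
            this.version = version;
            this.value = value;
        }
    }

    /** Keeps only the newest mutation per key; older ones are treated as obsolete. */
    static List<Mutation> collectGarbage(List<Mutation> mutations) {
        Map<String, Mutation> newestByKey = new LinkedHashMap<>();
        for (Mutation m : mutations) {
            Mutation seen = newestByKey.get(m.key);
            if (seen == null || m.version > seen.version) {
                newestByKey.put(m.key, m);
            }
        }
        return new ArrayList<>(newestByKey.values());
    }

    public static void main(String[] args) {
        List<Mutation> merged = new ArrayList<>();
        merged.add(new Mutation("a", 1, "a1"));
        merged.add(new Mutation("b", 1, "b1"));
        merged.add(new Mutation("a", 2, "a2"));
        merged.add(new Mutation("a", 3, "a3"));
        System.out.println("before: " + merged.size()
                + ", after: " + collectGarbage(merged).size());
    }
}
```

In this model a copy that merged many versions of a frequently updated key shrinks considerably after collection, which is why its size is worth re-checking afterwards.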

This ticket proposes the following improvement to in-memory mode for virtual maps:

  • Each copy is first checked to see whether its size exceeds the flush threshold
  • If it does, garbage collection is run for that copy; otherwise GC is skipped
  • Then the size of the copy is checked again. If it still exceeds the threshold, the copy is flushed
  • Otherwise it is merged into the next version
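
The steps above can be sketched as a small decision function. This is only an outline of the proposed control flow: the names are hypothetical, the size is a plain number, and garbage collection is modeled as a function from the size before GC to the size after it.

```java
import java.util.function.LongUnaryOperator;

// Outline of the proposed check, GC, re-check sequence for one copy.
final class CopyLifecycleSketch {

    enum Action { FLUSH, MERGE }

    static final class Decision {
        final Action action;
        final boolean gcRan;
        final long sizeAfter;

        Decision(Action action, boolean gcRan, long sizeAfter) {
            this.action = action;
            this.gcRan = gcRan;
            this.sizeAfter = sizeAfter;
        }
    }

    /**
     * Decides what to do with a copy.
     * garbageCollect is a stand-in: it maps the size before GC to the size after GC.
     */
    static Decision decide(long estimatedSize, long flushThreshold,
                           LongUnaryOperator garbageCollect) {
        long size = estimatedSize;
        boolean gcRan = false;
        // Step 1 and 2: run GC only if the copy exceeds the flush threshold.
        if (size > flushThreshold) {
            size = garbageCollect.applyAsLong(size);
            gcRan = true;
        }
        // Step 3 and 4: re-check the size; flush if still too large, else merge.
        Action action = size > flushThreshold ? Action.FLUSH : Action.MERGE;
        return new Decision(action, gcRan, size);
    }

    public static void main(String[] args) {
        Decision d = decide(100, 50, size -> size / 4);
        System.out.println(d.action + ", gcRan=" + d.gcRan + ", sizeAfter=" + d.sizeAfter);
    }
}
```

Skipping GC for copies under the threshold keeps the common path cheap, since those copies would be merged anyway.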
