OAK-11154: Read partial segments from SegmentWriter #1746

Open: wants to merge 1 commit into trunk


Nicolapps (Contributor)

This pull request modifies the SegmentWriter interface in oak-segment-tar so that it is possible to read the state of a segment that is still being written, as described in OAK-11154.

Closes OAK-11154

Why?

oak-segment-tar writes new segments using an implementation of SegmentWriter.

Since segments are immutable, a segment that hasn't been flushed yet isn't visible outside of the SegmentWriter instance that is writing it. However, in some cases, code using SegmentWriter might want to access this partial segment data.

Currently, the only way to do this is to call flush, which forces the segment to be flushed immediately, and then read the complete segment from the underlying segment store. This hurts performance for two reasons: it causes more flushes than necessary, and it risks creating many segments whose size is much smaller than MAX_SEGMENT_SIZE.

To avoid this, I suggest adding a readPartialSegmentState method to SegmentWriter, which takes the segment ID of an unflushed segment and returns its current state if possible.

Backwards-compatibility

This change is backwards-compatible with existing users of SegmentWriter, because they don't call the new method. The new method also comes with a default implementation that throws an UnsupportedOperationException, so existing implementations of the interface don't need to change.
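A minimal sketch of how a default method keeps existing implementations working. `MinimalSegmentWriter` is a hypothetical, trimmed-down stand-in for SegmentWriter; the `SegmentId` record, the `byte[]` return type and the signature of `readPartialSegmentState` are assumptions for illustration, not the PR's actual API.

```java
// Hypothetical stand-in for oak-segment-tar's SegmentId.
record SegmentId(long msb, long lsb) {}

// Hypothetical, trimmed-down stand-in for the SegmentWriter interface.
interface MinimalSegmentWriter {
    void flush();

    // Assumed signature: returns a copy of the bytes written so far to the
    // unflushed segment identified by sid. Implementations that predate this
    // method inherit this default and simply report that it is unsupported.
    default byte[] readPartialSegmentState(SegmentId sid) {
        throw new UnsupportedOperationException("readPartialSegmentState is not supported");
    }
}

// An implementation written before the new method existed: it compiles
// unchanged because it does not have to override readPartialSegmentState.
class LegacySegmentWriter implements MinimalSegmentWriter {
    @Override
    public void flush() {
        // No-op in this sketch.
    }
}
```

Callers that want to use the new method on an arbitrary writer can catch UnsupportedOperationException and fall back to the flush-then-read approach.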

Concurrency

Previously, SegmentBufferWriter was marked as not thread-safe. This made sense because only a single writer thread was expected to use it at a time: concurrent calls wouldn't have been meaningful, since the order in which prepare and the writeXYZ methods are called matters.

The main change in SegmentBufferWriter is that its readPartialSegmentState method can now be called concurrently with the other methods of the class. To support this, its public methods are now synchronized. This shouldn't cause a drop in performance: most calls come from the writer thread, so they don't contend with each other, and readPartialSegmentState is expected to be called rarely compared to the other methods.
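A simplified sketch of the locking scheme described above, assuming a single monitor (`synchronized` on every public method) guards the buffer and the current segment ID. `BufferWriterSketch`, its byte-level `writeByte` method, the tiny `MAX_SEGMENT_SIZE` and the "return null for an ID that is not the current segment" convention are all hypothetical; the real SegmentBufferWriter writes records, not single bytes, and persists flushed segments to the segment store.

```java
import java.util.Arrays;

// Hypothetical stand-in for oak-segment-tar's SegmentId.
record SegmentId(long msb, long lsb) {}

// Hypothetical, heavily simplified stand-in for SegmentBufferWriter.
class BufferWriterSketch {
    private static final int MAX_SEGMENT_SIZE = 1024; // tiny value for the sketch

    private long counter = 0;
    private SegmentId currentId = new SegmentId(0L, counter++);
    private byte[] buffer = new byte[MAX_SEGMENT_SIZE];
    private int position = 0;
    private int flushCount = 0;

    // Normally called only from the writer thread.
    public synchronized void writeByte(byte b) {
        if (position == buffer.length) {
            flush(); // reentrant: this thread already holds the monitor
        }
        buffer[position++] = b;
    }

    public synchronized void flush() {
        if (position == 0) {
            return; // avoid creating empty segments
        }
        // A real writer would persist buffer[0..position) to the store here.
        flushCount++;
        buffer = new byte[MAX_SEGMENT_SIZE];
        position = 0;
        currentId = new SegmentId(0L, counter++);
    }

    public synchronized SegmentId currentSegmentId() {
        return currentId;
    }

    public synchronized int flushCount() {
        return flushCount;
    }

    // May be called from any thread. Taking the same monitor means the caller
    // never sees a buffer and segment ID that belong to different segments.
    public synchronized byte[] readPartialSegmentState(SegmentId sid) {
        if (!currentId.equals(sid)) {
            return null; // already flushed, or never written by this writer
        }
        return Arrays.copyOf(buffer, position); // defensive copy of the partial data
    }
}
```

Reading the partial state does not flush, so it does not produce undersized segments; on the writer thread the monitor is almost always uncontended, which is the reason the added synchronization is expected to be cheap.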

To confirm that adding synchronized causes no noticeable drop in performance, I ran the write benchmarks without and with the change and observed no difference:

Without synchronized

# ConcurrentWriteReadTest          C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1       1       5      11      61    258    2535      24
# ConcurrentWriteTest              C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      29      31      36      58    622    1373      44
# BasicWriteTest                   C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      14      15      16      19    320    3448      17

With synchronized

# ConcurrentWriteReadTest          C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1       1       4      12      58    553    2531      24
# ConcurrentWriteTest              C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      29      31      35      65    461    1319      46
# BasicWriteTest                   C     min     10%     50%     90%     max     N       mean
Oak-Segment-Tar                    1      14      15      16      19     96    3444      17

Testing

The PR adds a new test, readPartialSegmentState, which covers the implementation of the method in SegmentBufferWriter.

Nicolapps marked this pull request as ready for review on September 27, 2024 09:44.