
Use recycled byte arrays for peer recovery #50107


Closed
wants to merge 12 commits

Conversation

Tim-Brooks
Contributor

Currently, every peer recovery file chunk request must allocate the
bytes for the file chunk when the request is received. This PR
implements the concept of a releasable bytes reference which certain
stream types can optimize to be shared or pooled. This commit implements
this mechanism for the file chunk requests. The pooled chunks are
managed at the transport level and are released when the response
is submitted. Additionally, the releasable references are ref counted
so async implementations can retain them.
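The shape of such a releasable, ref-counted reference can be sketched as below. This is a minimal illustration with hypothetical names (`PooledBytesRef`, `returnToPool`), not the PR's actual classes: the receiver holds the initial reference, async consumers call retain(), and the pooled array is returned only when the last holder closes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a ref-counted reference over a pooled byte array.
final class PooledBytesRef implements AutoCloseable {
    private final byte[] bytes;
    private final Runnable returnToPool; // runs once, when fully released
    private final AtomicInteger refCount = new AtomicInteger(1);

    PooledBytesRef(byte[] bytes, Runnable returnToPool) {
        this.bytes = bytes;
        this.returnToPool = returnToPool;
    }

    // Async implementations call retain() before handing the reference off.
    PooledBytesRef retain() {
        if (refCount.getAndIncrement() <= 0) {
            refCount.getAndDecrement();
            throw new IllegalStateException("already fully released");
        }
        return this;
    }

    byte[] bytes() {
        assert refCount.get() > 0 : "accessing released bytes";
        return bytes;
    }

    @Override
    public void close() {
        int remaining = refCount.decrementAndGet();
        assert remaining >= 0 : "double close";
        if (remaining == 0) {
            returnToPool.run(); // last holder gone; array may be recycled
        }
    }
}
```

With this shape, the transport layer can close its reference when the response is submitted, and any retained references keep the bytes alive until their holders close too.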

@Tim-Brooks added labels on Dec 11, 2019: WIP, :Distributed Coordination/Network (Http and internode communication implementations), :Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source), v8.0.0, v7.6.0
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)


@DaveCTurner left a comment


Not a proper review; I was just curious about this change, but I left one small comment.

@Tim-Brooks Tim-Brooks requested a review from ywelsch December 12, 2019 17:06

@ywelsch left a comment


I've left some initial comments. I think that the overall approach could work, but also think that getting this right without leaking / incorrectly accessing stuff is going to be tricky. We should add as many safeguards as possible.

}

@Override
public void close() {

I wonder if we need to protect against double-closing, as it is super dangerous here (one caller double-closes and releases the underlying byte array while another still thinks it's safe to operate on it). A first step would be to have an assertion here checking the refCount().
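The hazard described here can be made concrete with a toy pool (all names hypothetical): a double close returns the same array to the pool twice, so two later borrowers end up sharing one buffer. A refCount() assertion in close() would trip this in tests before it corrupts data.

```java
import java.util.ArrayDeque;

// Toy pool with no double-release guard, illustrating the danger above.
final class ToyBytePool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    byte[] borrow() {
        byte[] b = free.poll();
        return b != null ? b : new byte[8];
    }

    void release(byte[] b) {
        free.push(b); // a refCount/closed assertion belongs on the caller's close()
    }

    // Demonstrates the failure: after a double release, two borrowers alias one array.
    static boolean doubleReleaseAliases() {
        ToyBytePool pool = new ToyBytePool();
        byte[] a = pool.borrow();
        pool.release(a);
        pool.release(a); // the double close: same array queued twice
        return pool.borrow() == pool.borrow(); // true: one array, two "owners"
    }
}
```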

synchronized (this) {
FileChunk chunk;
while ((chunk = pendingChunks.poll()) != null) {
chunk.close();

Where are we closing the FileChunkWriter?
Also, how are we ensuring that nothing is added to the FileChunkWriter after it has been closed?
Otherwise we are at risk of creating leaks here.
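One way to address this concern can be sketched as below: a closed flag so that chunks offered after close() are released immediately and rejected, instead of leaking in the queue. The FileChunkWriter/FileChunk names follow the diff; the bodies here are simplified stand-ins, not the PR's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for the real chunk; close() would release pooled bytes.
final class FileChunk implements AutoCloseable {
    private boolean closed;
    boolean isClosed() { return closed; }
    @Override public void close() { closed = true; }
}

final class FileChunkWriter implements AutoCloseable {
    private final Deque<FileChunk> pendingChunks = new ArrayDeque<>();
    private boolean closed;

    synchronized void writeChunk(FileChunk chunk) {
        if (closed) {
            chunk.close(); // release rather than leak the pooled bytes
            throw new IllegalStateException("chunk writer is closed");
        }
        pendingChunks.add(chunk);
    }

    @Override
    public synchronized void close() {
        closed = true;
        FileChunk chunk;
        while ((chunk = pendingChunks.poll()) != null) {
            chunk.close();
        }
    }
}
```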

throws IOException {
assert Transports.assertNotTransportThread("multi_file_writer");
final FileChunkWriter writer = fileChunkWriters.computeIfAbsent(fileMetaData.name(), name -> new FileChunkWriter());
-        writer.writeChunk(new FileChunk(fileMetaData, content, position, lastChunk));
+        writer.writeChunk(new FileChunk(fileMetaData, content.retain(), position, lastChunk));

Why are you calling retain here?
Wouldn't it be safer to do that at the point where it's successfully added to the pendingChunks queue?
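The suggestion can be sketched as follows, with hypothetical names: the extra reference is taken at the point where the chunk is actually owned by the pending queue, so nothing on the way there can strand a retained reference.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal ref-counted stand-in for the chunk's content.
final class RefCountedContent {
    final AtomicInteger refs = new AtomicInteger(1);
    RefCountedContent retain() {
        refs.incrementAndGet();
        return this;
    }
}

final class PendingChunkQueue {
    private final ArrayDeque<RefCountedContent> pendingChunks = new ArrayDeque<>();

    synchronized void enqueue(RefCountedContent content) {
        pendingChunks.add(content);
        content.retain(); // retain once the queue owns the chunk, not at the call site
    }

    synchronized int size() {
        return pendingChunks.size();
    }
}
```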

@@ -69,6 +75,7 @@ public void sendResponse(Exception exception) throws IOException {
try {
outboundHandler.sendErrorResponse(version, channel, requestId, action, exception);
} finally {
Releasables.close(toRelease);

I would prefer to put this into the release method, which protects against double-releasing (see breaker)
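A release method guarded against double-releasing, along the lines the reviewer mentions for the breaker, might look like this minimal sketch (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// The release action runs at most once, no matter how many times close() is called.
final class OneShotReleasable implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Runnable onRelease;

    OneShotReleasable(Runnable onRelease) {
        this.onRelease = onRelease;
    }

    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            onRelease.run();
        }
    }
}
```

Callers can then close from both the success and error paths without risking a double release.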


Also note that the closing logic currently relies on InboundHandler.messageReceived not throwing any exception. Given that the initial lifecycle (until retain() is called) is ultimately controlled through that method, I wonder if we need extra safekeeping.

private static class ReleasableArraysStreamInput extends FilterStreamInput {

private final BigArrays bigArrays;
private final List<Releasable> managedResources;

I wonder if, instead of managing the list here, we should just offer a callback to register manageable items. This avoids anyone mistakenly messing with the list, and we can also limit the lifecycle during which items can be registered.
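The callback alternative might look like the following sketch (hypothetical names): deserialization code gets only a Consumer, the registration window is shut once the message is built, and the owner alone releases everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Registration happens only through the callback, and only while the window is open.
final class ResourceRegistrar {
    private final List<AutoCloseable> resources = new ArrayList<>();
    private boolean open = true;

    Consumer<AutoCloseable> callback() {
        return r -> {
            if (!open) {
                throw new IllegalStateException("registration window closed");
            }
            resources.add(r);
        };
    }

    void closeWindow() {
        open = false;
    }

    void releaseAll() {
        for (AutoCloseable r : resources) {
            try {
                r.close();
            } catch (Exception e) {
                // real code would collect/suppress these
            }
        }
        resources.clear();
    }
}
```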


public abstract class InboundMessage extends NetworkMessage implements Closeable {

private final StreamInput streamInput;
private final List<Releasable> managedResources = new ArrayList<>(4);

this is unused



@@ -98,6 +120,7 @@ InboundMessage deserialize(BytesReference reference) throws IOException {
return message;
} finally {
if (success == false) {
Releasables.close(managedResources);

Shouldn't the list always be empty here?

@polyfractal added the v7.7.0 label and removed the v7.6.0 label on Jan 15, 2020
@Tim-Brooks
Contributor Author

This work will probably come back at some point. But this PR is pretty out of date. Closing.

Labels: WIP, :Distributed Coordination/Network (Http and internode communication implementations), :Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source), v7.8.0, v8.0.0-alpha1