Use recycled byte arrays for peer recovery #50107

Tim-Brooks · 2019-12-11T23:25:23Z

Currently, every peer recovery file chunk request must allocate the
bytes for the file chunk when the request is received. This PR
implements the concept of a releasable bytes reference, which certain
stream types can optimize to be shared or pooled. This commit implements
this mechanism for the file chunk requests. The pooled chunks are
managed at the transport level and will be released when the response
is submitted. Additionally, the releasable references are ref counted
so async implementations can retain them.
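
To illustrate the ref-counting idea described above, here is a minimal sketch. It is not the `ReleasableBytesReference` from this PR: the class name `RefCountedBytes`, the `onRelease` callback, and the exception behavior are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a pooled byte array with reference counting.
// Names and behavior are illustrative assumptions, not the PR's implementation.
final class RefCountedBytes implements AutoCloseable {
    private final byte[] bytes;
    private final Runnable onRelease; // assumed hook that returns the array to its pool
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedBytes(byte[] bytes, Runnable onRelease) {
        this.bytes = bytes;
        this.onRelease = onRelease;
    }

    // Take an extra reference so the bytes can outlive the caller that received them.
    RefCountedBytes retain() {
        int current;
        do {
            current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("already released");
            }
        } while (refCount.compareAndSet(current, current + 1) == false);
        return this;
    }

    byte[] bytes() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("already released");
        }
        return bytes;
    }

    int refCount() {
        return refCount.get();
    }

    // Drop one reference; the holder that drops the last one recycles the array.
    @Override
    public void close() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            onRelease.run();
        } else if (remaining < 0) {
            throw new IllegalStateException("closed more times than retained");
        }
    }
}
```

In this sketch the transport layer would hold the initial reference and close it once the response is sent, while an async consumer calls `retain()` and later `close()` on its own schedule.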

elasticmachine · 2019-12-11T23:25:26Z

Pinging @elastic/es-distributed (:Distributed/Network)

DaveCTurner

Not a proper review, I was just curious about this change, but I left one small comment.

server/src/main/java/org/elasticsearch/transport/InboundMessage.java

ywelsch

I've left some initial comments. I think that the overall approach could work, but also think that getting this right without leaking / incorrectly accessing stuff is going to be tricky. We should add as many safeguards as possible.

ywelsch · 2019-12-13T15:52:00Z

server/src/main/java/org/elasticsearch/common/bytes/ReleasableBytesReference.java

+    }
+
+    @Override
+    public void close() {


I wonder if we need to protect against double-closing, as it is super dangerous here (one caller double-closes and releases the underlying byte array while another still thinks it's safe to operate on it). A first step would be to have an assertion here checking the refCount().

ywelsch · 2019-12-13T16:12:21Z

server/src/main/java/org/elasticsearch/indices/recovery/MultiFileWriter.java

+            synchronized (this) {
+                FileChunk chunk;
+                while ((chunk = pendingChunks.poll()) != null) {
+                    chunk.close();


Where are we closing the FileChunkWriter?
Also how are we ensuring that nothing is being added to the FileChunkWriter anymore after it has been closed?
Otherwise we are at risk of creating leaks here?
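
One way to address both questions is sketched below: a queue that records that it has been closed and refuses later additions, so ownership of a rejected chunk stays with the caller. `ClosableChunkQueue` and its `releaser` callback are hypothetical names for illustration, not the `FileChunkWriter` under review.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical queue of pending chunks; illustrative only.
final class ClosableChunkQueue<T> implements AutoCloseable {
    private final Deque<T> pending = new ArrayDeque<>();
    private final Consumer<T> releaser; // assumed hook that releases one chunk
    private boolean closed;

    ClosableChunkQueue(Consumer<T> releaser) {
        this.releaser = releaser;
    }

    // Returns false once the queue is closed, so the caller keeps ownership
    // of the chunk and remains responsible for releasing it.
    synchronized boolean offer(T chunk) {
        if (closed) {
            return false;
        }
        pending.add(chunk);
        return true;
    }

    synchronized T poll() {
        return pending.poll();
    }

    // Marks the queue closed, then releases everything still queued.
    @Override
    public synchronized void close() {
        closed = true;
        T chunk;
        while ((chunk = pending.poll()) != null) {
            releaser.accept(chunk);
        }
    }
}
```

Because `offer` and `close` share one lock, a chunk is either released by `close` or rejected by `offer`; it cannot be added after the drain and then forgotten.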

ywelsch · 2019-12-13T16:17:39Z

server/src/main/java/org/elasticsearch/indices/recovery/MultiFileWriter.java

        throws IOException {
        assert Transports.assertNotTransportThread("multi_file_writer");
        final FileChunkWriter writer = fileChunkWriters.computeIfAbsent(fileMetaData.name(), name -> new FileChunkWriter());
-        writer.writeChunk(new FileChunk(fileMetaData, content, position, lastChunk));
+        writer.writeChunk(new FileChunk(fileMetaData, content.retain(), position, lastChunk));


Why are you calling retain here?
Wouldn't it be safer to do that at the point where it's successfully added to the pendingChunks queue?
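
A sketch of tying the extra reference to the enqueue step follows. All names here (`RetainOnEnqueue`, `Counted`, `enqueue`) are hypothetical. The sketch retains immediately before the offer and undoes the retain if the offer fails or throws. This ordering is an assumption chosen so that a consumer polling the chunk concurrently can never see it without its extra reference.

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

final class RetainOnEnqueue {
    // Hypothetical minimal ref-counted resource, standing in for the chunk content.
    static final class Counted {
        final AtomicInteger refs = new AtomicInteger(1);

        void retain() {
            refs.incrementAndGet();
        }

        void release() {
            refs.decrementAndGet();
        }
    }

    // Keeps the extra reference only if the content actually ends up in the queue.
    static boolean enqueue(Queue<Counted> pending, Counted content) {
        content.retain();
        boolean added = false;
        try {
            added = pending.offer(content);
            return added;
        } finally {
            if (added == false) {
                // The queue rejected the content (or offer threw): undo the retain.
                content.release();
            }
        }
    }
}
```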

ywelsch · 2019-12-13T16:29:15Z

server/src/main/java/org/elasticsearch/transport/TcpTransportChannel.java

@@ -69,6 +75,7 @@ public void sendResponse(Exception exception) throws IOException {
        try {
            outboundHandler.sendErrorResponse(version, channel, requestId, action, exception);
        } finally {
+            Releasables.close(toRelease);


I would prefer to put this into the release method, which protects against double-releasing (see breaker)

Also note that the closing logic currently relies on InboundHandler.messageReceived not throwing any exception. Given that the initial lifecycle (until retain() is called) is ultimately controlled through that method, I wonder if we need extra safekeeping.
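
The kind of guard referred to here can be sketched as a wrapper that runs its release action at most once, however many code paths call it. `ReleaseOnce` is a hypothetical name, and this is not the channel's actual release method.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper that makes a release action idempotent.
final class ReleaseOnce implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean();
    private final Runnable action; // assumed release action, e.g. freeing pooled bytes

    ReleaseOnce(Runnable action) {
        this.action = action;
    }

    // Only the first caller wins the compare-and-set and runs the action.
    @Override
    public void close() {
        if (released.compareAndSet(false, true)) {
            action.run();
        }
    }
}
```

With such a wrapper, both response paths and any failure handler could call `close()` without risking a second release of the same resource.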

ywelsch · 2019-12-13T16:38:29Z

server/src/main/java/org/elasticsearch/transport/InboundMessage.java

+    private static class ReleasableArraysStreamInput extends FilterStreamInput {
+
+        private final BigArrays bigArrays;
+        private final List<Releasable> managedResources;


I wonder if instead of managing the list here, we should just offer a callback to register manageable items. This avoids anyone mistakenly messing with the list, and we can also limit the lifecycle during which items can be registered.
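
A rough sketch of the callback idea follows, with hypothetical names (`ResourceRegistry`, `registrar`, `seal`). Readers see only a `Consumer` for registering resources, the list itself stays private, and registration is rejected once the window has been sealed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical registry; resources are modeled as Runnable release actions.
final class ResourceRegistry implements AutoCloseable {
    private final List<Runnable> resources = new ArrayList<>();
    private boolean open = true;

    private synchronized void register(Runnable resource) {
        if (open == false) {
            throw new IllegalStateException("registration window has ended");
        }
        resources.add(resource);
    }

    // The only handle exposed to code that deserializes a message.
    Consumer<Runnable> registrar() {
        return this::register;
    }

    // Ends the registration window and hands ownership of the resources to the caller.
    synchronized List<Runnable> seal() {
        open = false;
        List<Runnable> owned = new ArrayList<>(resources);
        resources.clear();
        return owned;
    }

    // Failure path: end the window and release anything not handed over.
    @Override
    public synchronized void close() {
        open = false;
        for (Runnable resource : resources) {
            resource.run();
        }
        resources.clear();
    }
}
```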

ywelsch · 2019-12-13T17:04:13Z

server/src/main/java/org/elasticsearch/transport/InboundMessage.java


 public abstract class InboundMessage extends NetworkMessage implements Closeable {

    private final StreamInput streamInput;
+    private final List<Releasable> managedResources = new ArrayList<>(4);


this is unused

ywelsch · 2019-12-13T17:07:53Z

server/src/main/java/org/elasticsearch/transport/InboundMessage.java


 public abstract class InboundMessage extends NetworkMessage implements Closeable {

    private final StreamInput streamInput;
+    private final List<Releasable> managedResources = new ArrayList<>(4);


this is unused

ywelsch · 2019-12-13T17:13:50Z

server/src/main/java/org/elasticsearch/transport/InboundMessage.java

@@ -98,6 +120,7 @@ InboundMessage deserialize(BytesReference reference) throws IOException {
                return message;
            } finally {
                if (success == false) {
+                    Releasables.close(managedResources);


Shouldn't the list always be empty here?

Tim-Brooks · 2020-04-22T18:35:43Z

This work will probably come back at some point. But this PR is pretty out of date. Closing.

Tim-Brooks added 7 commits December 4, 2019 14:10

WIP

31c6d64

More changes

aa05e0f

Merge remote-tracking branch 'upstream/master' into transport_retain_…

0500404

…bytes

Changes

4de1377

Changes

899ea11

WIP

196ae29

Merge remote-tracking branch 'upstream/master' into transport_retain_…

d030a88

…bytes

Tim-Brooks added WIP, :Distributed Coordination/Network, :Distributed Indexing/Recovery, v8.0.0, v7.6.0 labels Dec 11, 2019

Tim-Brooks added 3 commits December 11, 2019 16:28

Changes

36e0445

Merge remote-tracking branch 'upstream/master' into transport_retain_…

42a167c

…bytes

Merge remote-tracking branch 'upstream/master' into transport_retain_…

485b40e

…bytes

DaveCTurner reviewed Dec 12, 2019

server/src/main/java/org/elasticsearch/transport/InboundMessage.java (outdated)

Increase safety

df3cc51

Tim-Brooks requested a review from ywelsch December 12, 2019 17:06

ywelsch reviewed Dec 13, 2019

polyfractal added v7.7.0 and removed v7.6.0 labels Jan 15, 2020

Changes

5697c48

bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020

Tim-Brooks mentioned this pull request Mar 25, 2020

Move transport decoding and aggregation to server #48263

Merged

Tim-Brooks closed this Apr 22, 2020

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
