Limit request size #17133
Conversation
For the review I am specifically interested in three things:
@dakrone: Do you have time to review this PR?
// on some occasions, we will get 0 (e.g. when the message is chunked and we get the first chunk)
if (reservation > 0) {
    logger.debug("Increase amount of in-flight requests by [{}] bytes on channel id [{}]", reservation, channelId);
    assert reservationsPerChannel.containsKey(channelId) == false : "Should not contain multiple reservations per channel";
I feel this is problematic: if HTTP pipelining is enabled, handleUpstream() will be called multiple times while handleDownstream() may not get called yet because the response isn't ready. Then the assertion will fail, or the entry in reservationsPerChannel is overwritten if JVM assertions are disabled.
See the class org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler: the processing of HTTP requests is out of order and only the responses are sent in order, so it is possible that multiple requests on the same channel are in flight.
That's a good catch. I have changed the implementation now to also handle pipelined requests.
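To illustrate one way this could work, here is a minimal sketch (hypothetical names, not the actual code of this PR): reserved bytes are accumulated per channel so that several pipelined requests can be in flight at the same time, and the channel's entry is only removed once everything has been released.

// Hypothetical sketch: accumulate reservations per channel instead of asserting a
// single reservation, so pipelined requests on the same channel are supported.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ChannelReservations {
    private final Map<Integer, Long> reservedBytesPerChannel = new ConcurrentHashMap<>();

    // called on the upstream path: add the request size to the channel's running total
    void reserve(int channelId, long bytes) {
        reservedBytesPerChannel.merge(channelId, bytes, Long::sum);
    }

    // called on the downstream path once a response has been sent: subtract the size of
    // the corresponding request; the entry disappears when nothing is outstanding anymore
    void release(int channelId, long bytes) {
        reservedBytesPerChannel.compute(channelId, (id, reserved) -> {
            long remaining = (reserved == null ? 0L : reserved) - bytes;
            return remaining > 0 ? remaining : null; // returning null removes the mapping
        });
    }
}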
Maybe name the in-flight-request-related settings ".....requests....." or even "......inflight_requests....".
private final Limiter limiter;
private final Map<Integer, Reservation> reservationsPerChannel = new ConcurrentHashMap<>();
Is AttributeKey (http://netty.io/4.0/api/io/netty/channel/ChannelHandler.html) cheaper than ConcurrentHashMap?
The bookkeeping in my patch Dieken@f2d487e is cheaper, but it sits in higher-level code.
Maybe introduce a trick similar to OrderedUpstreamMessageEvent and OrderedDownstreamMessageEvent, so that handleDownstream() can obtain the reservation from its argument "e".
Regarding AttributeKey: I would rather rely on JDK classes as long as others don't provide a significant improvement. You're right that using LongAdder is cheaper, but on a higher level you already get request / response correlation basically for free.
The idea to piggyback on the message event is nice, but I'd also avoid tying reservations together with the events.
@Dieken: Thanks for your comments. I am currently working on another issue but I will have a closer look ASAP.
Regarding the HTTP status code 413: Here is the relevant part from RFC 2616, section 10
So we can add an
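As an illustration only (not the exact code of this PR; Netty 3 HTTP codec types are assumed to be available at this point in the pipeline), rejecting an oversized request with a 413 could look roughly like this:

// Illustrative sketch only: reject an oversized HTTP request with 413 and close the
// connection afterwards. Surrounding handler code and names are assumptions.
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelFutureListener;
import org.jboss.netty.handler.codec.http.DefaultHttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponseStatus;
import org.jboss.netty.handler.codec.http.HttpVersion;

final class RequestTooLargeResponder {
    static void rejectTooLarge(Channel channel) {
        HttpResponse response =
            new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.REQUEST_ENTITY_TOO_LARGE);
        // write the 413 response and close the connection once the write completes
        channel.write(response).addListener(ChannelFutureListener.CLOSE);
    }
}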
@Override
public void reset() throws IOException {
    addBytesChecked(-bytesWritten);
Does reset() actually "free" the bytes that have been written so far?
No, this does not free the bytes but I still think it is necessary to reduce the number of reserved bytes. Consider this scenario:
LimitingStreamOutput stream = new LimitingStreamOutput(new BytesStreamOutput());
stream.writeByte((byte) 0);
stream.writeByte((byte) 0);
stream.writeByte((byte) 0); // at this point we've written (and reserved) 3 bytes
stream.reset(); // to my understanding we should now reset the number of reserved bytes to 0
stream.writeByte((byte) 0); // reserve 1 byte
stream.close(); // return all reserved bytes
Wdyt?
Looking at the implementation:
@Override
public void reset() {
// shrink list of pages
if (bytes.size() > BigArrays.PAGE_SIZE_IN_BYTES) {
bytes = bigarrays.resize(bytes, BigArrays.PAGE_SIZE_IN_BYTES);
}
// go back to start
count = 0;
}
It doesn't necessarily "free" bytes; it just resizes the backing array back to 16384 bytes if more than 16384 bytes have been written already, so I'm not sure what to do here.
As discussed, I think both approaches are ok. If we're not reducing the amount in #reset() we're a bit more conservative (as we keep the count although there may not be more memory allocated), but when #close() is called we'll always free all written bytes, even if #reset() has been called in between. So I'll change the implementation and no longer reduce the count in #reset().
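A minimal sketch of the agreed semantics (hypothetical names, not the exact implementation in this PR): reset() leaves the reserved count untouched, while close() always returns everything that has been reserved up to that point.

// Hypothetical sketch of the agreed semantics: reset() no longer reduces the reserved
// count; close() always frees all bytes that have been reserved so far.
final class LimitingStreamOutputSketch {
    private final CircuitBreakerLike breaker; // stands in for the in-flight request breaker
    private long bytesReserved;

    LimitingStreamOutputSketch(CircuitBreakerLike breaker) {
        this.breaker = breaker;
    }

    void writeByte(byte b) {
        breaker.addBytes(1);   // reserve before writing
        bytesReserved += 1;
        // ... write the byte to the delegate stream ...
    }

    void reset() {
        // intentionally conservative: the delegate may shrink its buffer,
        // but we keep the reservation until close()
    }

    void close() {
        breaker.addBytes(-bytesReserved); // return everything reserved, even after reset()
        bytesReserved = 0;
    }

    interface CircuitBreakerLike {
        void addBytes(long bytes);
    }
}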
@danielmitterdorfer left some comments on this! One question I do have is how much impact this will have on the performance of HTTP and transport requests; do you have any idea how large the impact will be?
Regarding your question on the performance impact: I'll run some benchmarks and post the results.
I've microbenchmarked 3 scenarios (chosen so that we have the same number of operations in all cases) and also benchmarked the current version as a baseline. A rough summary (the exact numbers depend on the scenario):

Details can be found in the gist: https://gist.github.com/danielmitterdorfer/871cd62d7e16e45ae79d

It is much harder to microbenchmark the overhead of request size limiting on the transport level, as the new code is not as isolated there as it is on the HTTP level. However, I've run macrobenchmarks before and after the change to check for performance-related changes. The baseline for the test was the last commit on master that I had merged (d17fd33). I ran two trials with the geonames data against a single-node Elasticsearch cluster with default settings. The benchmark first bulk-indexes data via HTTP, does a force-merge, runs 20 warmup iterations of all queries and then 10 timed iterations of these queries.
I also ran the same benchmark against a two-node cluster with default settings to exercise the transport code as well:

So in both cases the changes are hidden within the run-to-run variance of the benchmarks.
@bleskes Thanks for your review. I have now introduced constants for all Netty headers, added Javadocs and additional helper methods in
@@ -41,7 +41,8 @@

@Override
protected Object decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) throws Exception {
    if (buffer.readableBytes() < 6) {
int sizeHeaderLength = NettyHeader.MARKER_BYTES_SIZE + NettyHeader.MESSAGE_LENGTH_SIZE;
make this final?
Done.
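For reference, a hedged sketch of what such header-size constants could look like; the concrete values are assumptions derived from the literal 6 that the diff above replaces (a 2-byte marker plus a 4-byte message length):

// Illustrative sketch only: constants replacing the magic number 6 in the decoder above.
// The values are assumptions (2-byte marker + 4-byte message length = 6 header bytes).
final class NettyHeaderConstantsSketch {
    static final int MARKER_BYTES_SIZE = 2;     // the leading marker bytes of a transport message
    static final int MESSAGE_LENGTH_SIZE = 4;   // a 4-byte int holding the message length

    static final int SIZE_HEADER_LENGTH = MARKER_BYTES_SIZE + MESSAGE_LENGTH_SIZE; // == 6
}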
Sounds good @danielmitterdorfer
Hmm... I think setting it to 200mb sounds reasonable to me? These are nice dynamic settings, so it's easy to change at runtime and I'd rather err too low than too high.
@bleskes brought up that this covers all requests, so 200mb is probably too small (and I agree). Since the parent breaker limit is 70%, how about setting this to 60%?
We have decided to not limit the respective circuit breaker but to rely on the parent circuit breaker (which is currently set to 70%). I'll update code and docs accordingly and merge the PR afterwards.
With this commit we limit the size of all in-flight requests at the transport level. The size is guarded by a circuit breaker and is based on the content size of each request. By default we use 100% of the available heap, meaning that the parent circuit breaker will limit the maximum available size. This value can be changed by adjusting the setting network.breaker.inflight_requests.limit. Relates elastic#16011
With this commit we limit the size of all in-flight requests at the HTTP level. The size is guarded by the same circuit breaker that is also used at the transport level. Similarly, the size that is used is the HTTP content length. Relates elastic#16011
Force-pushed from be7fa7d to 117bc68
@@ -134,6 +141,14 @@ public void sendResponse(Throwable error) throws IOException {
    transportServiceAdapter.onResponseSent(requestId, action, error);
}

private void close() {
@danielmitterdorfer Why is this method called close? It might make more sense if it were in the finally blocks, but as-is I'm confused by it. Help?
In my original implementation it was indeed called in the finally block, but Boaz and I had a lengthy discussion about this one. It makes sense to free the reserved bytes at the beginning of the method, as we're done with request processing at this point in time. Freeing at the beginning guards against any problems that might occur further down the line (admittedly this could also be solved in a finally block).

The name close() just implies that you cannot reuse the channel after this method has been called, and the precondition check guards against this.
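To make the discussed behaviour concrete, here is a hedged sketch (illustrative names, not the actual NettyTransportChannel code): the reservation is released at the start of sendResponse(), and a flag ensures the channel cannot be used to send a second response.

// Illustrative sketch (not the actual code of this PR): release the reservation up
// front and guard against sending more than one response on the same channel.
import java.util.concurrent.atomic.AtomicBoolean;

final class ReservingChannelSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaseReservation; // frees the request's reserved bytes

    ReservingChannelSketch(Runnable releaseReservation) {
        this.releaseReservation = releaseReservation;
    }

    void sendResponse(Object response) {
        close(); // free the reserved bytes first; request processing is finished here
        // ... serialize and asynchronously write the response ...
    }

    private void close() {
        if (!closed.compareAndSet(false, true)) {
            throw new IllegalStateException("Channel is already closed");
        }
        releaseReservation.run();
    }
}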
I think it should be in the finally block, regardless of the name. We asynchronously write, and the bytes are still held until after the write. 😄

If you've already had that discussion and concluded that the beginning of the method is correct (and are unswayed by my thoughts here), can we at least change the name? The semantics are just counterintuitive and a name change would at least improve that. Sorry, I do not have any good suggestions. 😦
We asynchronously write, the bytes are still held until after the write.
I don't follow this one? This is about freeing the bytes of the request, not the bytes of the response? It's all just a heuristic anyway.

I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again. If we agree on that, I think that closing at the beginning and throwing an early exception if already closed is what we want?
I don't follow this one? this is about freeing the bytes of the request not the bytes of the response?
Yes, I'm sorry, I confused myself and brought something irrelevant into this discussion.
I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again.
That's why I don't like it: we are "closing" and then still proceeding to perform operations on the channel. I don't like the semantics. Can I suggest release or unreserve?
Well, we don't really do ops on the NettyTransportChannel but rather on its underlying fields, which in my mind is similar to any cleanup-on-close code. But I'm with you if you want to change this - I personally find release/unreserve even more confusing :)
we are "closing" and then still proceeding to perform operations on the channel.
That's a pure implementation detail. The caller cannot observe this. It just observes that after the response is sent once, it cannot perform any "write" operation (sending a response) again.
Can I suggest release or unreserve?
From the perspective of the channel it is still "closed", but if we really have to do this, then I'd lean more towards release.
That's a pure implementation detail.
That's exactly what we are discussing here though? The semantics of this implementation.
From the perspective of the channel it is still "closed", but if we really have to do this, then I'd more lean towards release.
This was never an issue though? It was added as part of this PR to release the tracked bytes (also added in this PR).
I still think that close() is fine as a name, but I am ok if you want to change it.
Thanks @danielmitterdorfer, I pushed 86f1bed.
With this PR we add the possibility to limit the size of all in-flight requests at the transport and HTTP level. The limit is guarded by a new in-flight request circuit breaker. It can be configured via network.breaker.inflight_requests.limit, which is set to 100% of the available JVM heap by default. In practice, this means that this circuit breaker will only break if the parent circuit breaker breaks (the parent circuit breaker is not new, but for completeness: it is configured via indices.breaker.total.limit with a default value of 70% of the total available JVM heap).

Closes #16011