
Limit request size #17133


Merged
2 commits merged into elastic:master from the bulk-size-limit branch on Apr 13, 2016

Conversation

@danielmitterdorfer (Member)

With this PR we add the ability to limit the size of all in-flight requests at the transport and HTTP levels. The limit is guarded by a new in-flight request circuit breaker. It can be configured via network.breaker.inflight_requests.limit, which is set to 100% of the available JVM heap by default. In practice, this means that this circuit breaker will only break if the parent circuit breaker breaks (the parent breaker is not new, but for completeness: it is configured via indices.breaker.total.limit with a default value of 70% of the total available JVM heap).

Closes #16011
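
For illustration, the two settings mentioned above could be set explicitly like this (a minimal sketch assuming the Settings builder API on master at the time; the values shown simply repeat the documented defaults):

```java
import org.elasticsearch.common.settings.Settings;

public class InflightBreakerSettingsExample {
    public static void main(String[] args) {
        // Sketch only: the keys are the ones described in this PR, the values are the defaults.
        Settings settings = Settings.builder()
                .put("network.breaker.inflight_requests.limit", "100%") // new in-flight request breaker
                .put("indices.breaker.total.limit", "70%")              // pre-existing parent breaker
                .build();
        System.out.println(settings.get("network.breaker.inflight_requests.limit"));
    }
}
```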

@danielmitterdorfer (Member Author)

For the review I am specifically interested in three things:

  1. Should configuration properties follow a standard naming scheme? Currently I have adhered to the naming scheme in the respective implementation classes where the properties are defined. Note that for the circuit-breaker-based properties we also have the properties transport.breaker.request.overhead and http.breaker.request.overhead.
  2. Are the default values ok? I have basically derived the values based on the fact that http.max_content_length is already 100 MB by default and it would not make sense to put a lower default limit on all in-flight requests.
  3. Should we adjust the default value indices.breaker.total.limit for the parent circuit breaker given that we have now introduced two new child breakers?

@danielmitterdorfer (Member Author)

@dakrone: Do you have time to review this PR?

@dakrone self-assigned this Mar 16, 2016

    // on some occasions, we will get 0 (e.g. when the message is chunked and we get the first chunk)
    if (reservation > 0) {
        logger.debug("Increase amount of in-flight requests by [{}] bytes on channel id [{}]", reservation, channelId);
        assert reservationsPerChannel.containsKey(channelId) == false : "Should not contain multiple reservations per channel";

I feel this is problematic: if HTTP pipelining is enabled, handleUpstream() will be called multiple times, but handleDownstream() may not have been called yet because the response isn't ready. The assertion will then fail or, if JVM assertions are disabled, the entry in reservationsPerChannel will be overwritten.

See org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler: the processing of HTTP requests is out of order, only the responses are sent in order, so it is possible that multiple requests on the same channel are in flight.
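
One possible shape for pipelining-tolerant bookkeeping (an illustrative sketch only, not the change that ended up in this PR) is to accumulate reserved bytes per channel instead of asserting a single reservation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: with HTTP pipelining several requests on the same channel can be
// in flight, so reservations are accumulated per channel rather than overwritten.
final class PerChannelReservations {
    private final ConcurrentMap<Integer, Long> bytesPerChannel = new ConcurrentHashMap<>();

    void reserve(int channelId, long bytes) {
        bytesPerChannel.merge(channelId, bytes, Long::sum);
    }

    void release(int channelId, long bytes) {
        bytesPerChannel.computeIfPresent(channelId, (id, current) -> {
            long remaining = current - bytes;
            // returning null removes the mapping once everything has been released
            return remaining <= 0 ? null : remaining;
        });
    }
}
```

The actual fix in this PR handles pipelined requests in its own way; the sketch only shows why a single reservation per channel is not enough.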

Member Author

That's a good catch. I have now changed the implementation to also handle pipelined requests.

@Dieken commented Mar 16, 2016

Maybe name the in-flight-requests-related settings something like ".....requests....." or even "......inflight_requests....".


    private final Limiter limiter;

    private final Map<Integer, Reservation> reservationsPerChannel = new ConcurrentHashMap<>();

Is AttributeKey (http://netty.io/4.0/api/io/netty/channel/ChannelHandler.html) cheaper than ConcurrentHashMap?

The booking in my patch Dieken@f2d487e is cheaper, but it sits in higher-level code.

Maybe introduce a trick similar to OrderedUpstreamMessageEvent and OrderedDownstreamMessageEvent so that handleDownstream() learns the reservation from its argument "e".

Member Author

Regarding AttributeKey: I would rather rely on JDK classes as long as the alternatives don't provide a significant improvement. You're right that using LongAdder is cheaper, but at a higher level you already get request/response correlation basically for free.

The idea to piggyback on the message event is nice, but I'd rather avoid tying reservations to the events.
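
For comparison, a global LongAdder-based booking (roughly the shape of the cheaper approach referred to above; hypothetical code, not the actual patch) trades per-request correlation for very cheap updates:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: coarse global accounting of in-flight bytes. Cheap to update,
// but it cannot correlate a release with the request that made the reservation.
final class GlobalInflightBytes {
    private final LongAdder inflightBytes = new LongAdder();

    void reserve(long bytes) { inflightBytes.add(bytes); }

    void release(long bytes) { inflightBytes.add(-bytes); }

    long current() { return inflightBytes.sum(); }
}
```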

@danielmitterdorfer (Member Author)

@Dieken: Thanks for your comments. I am currently working on another issue but I will have a closer look ASAP.

@danielmitterdorfer (Member Author)

Regarding the HTTP status code 413, here is the relevant part from RFC 2616, section 10:

10.4.14 413 Request Entity Too Large

The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.

If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.

So we can add a Retry-After header (see also RFC 2616, section 14) with a relative delay of, say, 10 seconds to indicate that this condition may be temporary. Note that the condition is only temporary in the case of multiple in-flight requests. If a single request hits the limit, it is still permanent (until somebody changes the setting, of course). It may even be possible to detect these two cases, but I am not sure it is worth the effort and the additional code complexity.
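
For illustration, with the Netty 3 HTTP codec in use at the time, building such a response could look roughly like this (a sketch only; the exact response wiring in Elasticsearch differs):

```java
import org.jboss.netty.handler.codec.http.DefaultHttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponseStatus;
import org.jboss.netty.handler.codec.http.HttpVersion;

final class TooLargeResponses {
    // Sketch: a 413 response with a relative Retry-After of 10 seconds.
    static HttpResponse entityTooLarge() {
        HttpResponse response = new DefaultHttpResponse(
                HttpVersion.HTTP_1_1, HttpResponseStatus.REQUEST_ENTITY_TOO_LARGE);
        response.headers().set("Retry-After", "10");
        return response;
    }
}
```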


    @Override
    public void reset() throws IOException {
        addBytesChecked(-bytesWritten);
Member

Does reset() actually "free" the bytes that have been written so far?

Member Author

No, this does not free the bytes but I still think it is necessary to reduce the number of reserved bytes. Consider this scenario:

    LimitingStreamOutput stream = new LimitingStreamOutput(new BytesStreamOutput());
    stream.writeByte((byte) 0);
    stream.writeByte((byte) 0);
    stream.writeByte((byte) 0); // at this point we've written (and reserved) 3 bytes
    stream.reset();             // to my understanding we should now reset the number of reserved bytes to 0
    stream.writeByte((byte) 0); // reserve 1 byte
    stream.close();             // return all reserved bytes

Wdyt?

Member

Looking at the implementation:

    @Override
    public void reset() {
        // shrink list of pages
        if (bytes.size() > BigArrays.PAGE_SIZE_IN_BYTES) {
            bytes = bigarrays.resize(bytes, BigArrays.PAGE_SIZE_IN_BYTES);
        }

        // go back to start
        count = 0;
    }

It doesn't necessarily "free" bytes; it just resizes the backing array back to a 16384 byte array if more than 16384 bytes have been written already, so I'm not sure what to do here.

Member Author

As discussed, I think both approaches are ok. If we do not reduce the amount in #reset() we're a bit more conservative (as we keep the count although there may not be more memory allocated), but when #close() is called we'll always free all written bytes, even if #reset() has been called in between. So I'll change the implementation and no longer reduce the count in #reset().
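
A tiny sketch of that accounting (hypothetical names, not the PR's actual stream class): #reset() leaves the reservation untouched, while #close() always releases everything that has ever been written on the stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.LongConsumer;

// Hypothetical sketch of the behaviour agreed on above: reset() is conservative and
// keeps the count, close() always frees everything that was reserved on this stream.
final class CountingLimitedOutput extends OutputStream {
    private final OutputStream delegate;
    private final LongConsumer reserveBytes; // e.g. a circuit-breaker check-and-add
    private final LongConsumer releaseBytes; // hands the reserved bytes back
    private long bytesWritten;

    CountingLimitedOutput(OutputStream delegate, LongConsumer reserveBytes, LongConsumer releaseBytes) {
        this.delegate = delegate;
        this.reserveBytes = reserveBytes;
        this.releaseBytes = releaseBytes;
    }

    @Override
    public void write(int b) throws IOException {
        reserveBytes.accept(1L);
        delegate.write(b);
        bytesWritten++;
    }

    public void reset() {
        // intentionally no releaseBytes call: the count is kept even though the
        // underlying buffer may have shrunk
    }

    @Override
    public void close() throws IOException {
        releaseBytes.accept(bytesWritten); // always free all bytes ever written
        bytesWritten = 0;
        delegate.close();
    }
}
```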

@dakrone (Member) commented Mar 17, 2016

@danielmitterdorfer I left some comments on this! One question I do have: how much impact will this have on the performance of HTTP and transport requests? Do you have any idea how large the impact will be?

@danielmitterdorfer (Member Author)

Regarding your question on the performance impact: I'll run some benchmarks and post the results.

@danielmitterdorfer (Member Author)

I've microbenchmarked three scenarios with RequestSizeLimitHandler:

  1. Handle two upstream and downstream events
  2. Handle an upstream event, a pipelined upstream event, a pipelined downstream event, and a downstream event
  3. Handle a pipelined upstream event, a pipelined downstream event, a pipelined upstream event, and a pipelined downstream event

(I did this so we have the same number of operations in all cases.)

I also benchmarked the current version of RequestSizeLimitHandler against the original one (without pipeline support).

A rough summary (exact numbers depend on the scenario):

| Metric | RequestSizeLimitHandler (without pipeline support) | RequestSizeLimitHandler (with pipeline support) |
|---|---|---|
| Allocation rate | 56 bytes / op | 79 - 104 bytes / op (scenario dependent) |
| Average time | 82 - 113 ns / op | 94 - 119 ns / op |

Details can be found in the gist: https://gist.github.com/danielmitterdorfer/871cd62d7e16e45ae79d
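
For reference, the general JMH shape used for such microbenchmarks looks roughly like this (a hypothetical stand-in measuring simple per-channel bookkeeping, not the code from the gist):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical stand-in: measures the cost of reserving and releasing bytes in a
// per-channel map, mirroring one "upstream" plus one "downstream" event.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ReservationBookkeepingBenchmark {
    private final ConcurrentHashMap<Integer, Long> reservationsPerChannel = new ConcurrentHashMap<>();

    @Benchmark
    public long reserveAndRelease() {
        reservationsPerChannel.merge(42, 100L, Long::sum); // "upstream" event
        return reservationsPerChannel.remove(42);          // "downstream" event
    }
}
```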

It is much harder to microbenchmark the overhead of request size limiting at the transport level because the new code is not as isolated as at the HTTP level. However, I've run macrobenchmarks before and after the change to check for performance-related changes.

The baseline for the test was the last commit on master that I've merged (d17fd33). I've run two trials with the geonames data against a single-node Elasticsearch cluster with default settings. The benchmark first bulk-indexes data via HTTP, does a force-merge, runs 20 warmup iterations of all queries and then 10 timed iterations of these queries.

| Metric | Baseline (without size limiting) | Size limiting enabled |
|---|---|---|
| Mean indexing throughput [docs/s] | 13930, 12271 | 12775, 13331 |
| 99th percentile latency, match_all query [ms] | 9.79, 9.93 | 10.00, 6.29 |
| 99th percentile latency, uncached aggregation query [ms] | 115.75, 134.14 | 139.19, 119.25 |

I also ran the same benchmark against a two-node cluster with default settings in order to exercise the transport code as well:

| Metric | Baseline (without size limiting) | Size limiting enabled |
|---|---|---|
| Mean indexing throughput [docs/s] | 11119, 11711 | 11725, 11715 |
| 99th percentile latency, match_all query [ms] | 9.93, 6.29 | 8.31, 8.21 |
| 99th percentile latency, uncached aggregation query [ms] | 134.14, 116.26 | 118.19, 115.67 |

So in both cases the changes are hidden within the run-to-run variance of the benchmarks.

@danielmitterdorfer (Member Author)

@bleskes Thanks for your review. I have now introduced constants for all Netty headers, added Javadocs, and added additional helper methods in Marker. It should now be quite obvious what is read with respect to headers. I felt it may also make sense to add additional methods in NettyHeaders that help with low-level header handling (like reading the marker bytes) but I did not do this in this PR. However, this might be worth doing in another small PR (later).

@@ -41,7 +41,8 @@

@Override
protected Object decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) throws Exception {
if (buffer.readableBytes() < 6) {
int sizeHeaderLength = NettyHeader.MARKER_BYTES_SIZE + NettyHeader.MESSAGE_LENGTH_SIZE;
Contributor

make this final?

Member Author

Done.

@bleskes (Contributor) commented Apr 11, 2016

sounds good @danielmitterdorfer

@dakrone (Member) commented Apr 11, 2016

We still have the open question of what a good default value for this breaker should be. I feel we have two options:

  1. Set it to some percentage of total node memory. The complication here is that different node types have different requirements: for example, coordinating-only nodes can have a lot of in-flight requests (that is their job), while data nodes may have different needs.
  2. Don't set a maximum and let the parent circuit breaker trigger when the sum of the request breaker plus whatever the other breakers (field data etc.) need rises above the accepted limit.

@dakrone @clintongormley do you have any opinions here?

Hmm... I think setting it to 200mb sounds reasonable to me? These are nice dynamic settings, so it's easy to change at runtime and I'd rather err too low than too high.

@dakrone (Member) commented Apr 11, 2016

@bleskes brought up that this covers all requests, so 200mb is probably too small (and I agree). Since the parent breaker limit is 70%, how about setting this to 60%?

@danielmitterdorfer (Member Author)

@dakrone: The suggestion of 60% is fine with me. If I hear no objections, I'll change it accordingly and update the docs. /cc @bleskes

@danielmitterdorfer (Member Author)

We have decided not to limit this circuit breaker individually but to rely on the parent circuit breaker (which is currently set to 70%). I'll update code and docs accordingly and merge the PR afterwards.

With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates elastic#16011

With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates elastic#16011
@danielmitterdorfer merged commit 117bc68 into elastic:master on Apr 13, 2016
@clintongormley added the :Distributed Coordination/Network and :Core/Infra/REST API labels and removed the :Bulk label on Apr 13, 2016
danielmitterdorfer added a commit that referenced this pull request Apr 13, 2016
In #17133 we introduce request size limit handling and need a custom
channel implementation. In order to ensure we delegate all methods,
it is better to have this channel implement an interface instead of
extending an abstract base class (so changes to the interface turn
into compile errors).

Relates #17133
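
A tiny illustration of the point in the commit message (hypothetical types, not the actual channel classes): implementing the interface directly forces the delegating channel to be updated whenever the interface grows.

```java
// Hypothetical types for illustration only.
interface Channel {
    void sendResponse(String body);
    void close();
}

final class DelegatingChannel implements Channel {
    private final Channel delegate;

    DelegatingChannel(Channel delegate) {
        this.delegate = delegate;
    }

    @Override
    public void sendResponse(String body) {
        // request-size bookkeeping could hook in here before delegating
        delegate.sendResponse(body);
    }

    @Override
    public void close() {
        delegate.close();
    }
    // If a method is added to Channel, this class fails to compile until it is delegated
    // as well; extending an abstract base class would inherit a default silently.
}
```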
@@ -134,6 +141,14 @@ public void sendResponse(Throwable error) throws IOException {
        transportServiceAdapter.onResponseSent(requestId, action, error);
    }

    private void close() {
Member

@danielmitterdorfer Why is this method called close? It might make more sense if it were in the finally blocks but as-is I'm confused by it. Help?

Member Author

In my original implementation it was indeed called in the finally block, but Boaz and I had a lengthy discussion about this one. It makes sense to free the reserved bytes at the beginning of the method as we're done with request processing at this point in time. Freeing at the beginning guards against any problems that might occur further down the line (admittedly, this could also be solved with a finally block).

The name close() just implies that you cannot reuse the channel after this method has been called, and the precondition check guards against this.
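
A minimal sketch of those semantics (hypothetical names, not the actual NettyTransportChannel code): sending a response first releases the reservation and marks the channel as closed, and the precondition check rejects any further use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: release the reservation at the start of response handling and
// refuse any second use of the channel afterwards.
final class ReservingChannel {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaseReservation; // e.g. gives reserved bytes back to the breaker

    ReservingChannel(Runnable releaseReservation) {
        this.releaseReservation = releaseReservation;
    }

    void sendResponse(Object response) {
        close(); // free the reserved bytes first; request processing is done at this point
        // ... write the response to the underlying transport channel ...
    }

    private void close() {
        if (closed.compareAndSet(false, true) == false) {
            throw new IllegalStateException("Channel is already closed");
        }
        releaseReservation.run();
    }
}
```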

@jasontedor (Member), May 25, 2016

I think it should be in the finally block, regardless of the name. We asynchronously write, the bytes are still held until after the write. 😄

If you've already had that discussion and concluded that at the beginning of the method is correct (and unswayed by my thoughts here), can we at least change the name? The semantics are just counterintuitive and a name change would at least improve that. Sorry, I do not have any good suggestions. 😦

Contributor

> We asynchronously write, the bytes are still held until after the write.

I don't follow this one? this is about freeing the bytes of the request not the bytes of the response? it's all just a heuristic anyway.

I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again. If we agree on that I think that closing at the beginning and throwing an early exception if already closed is what we want?

Member

> I don't follow this one? this is about freeing the bytes of the request not the bytes of the response?

Yes, I'm sorry, I confused myself and brought something irrelevant into this discussion.

> I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again.

That's why I don't like it, we are "closing" and then still proceeding to perform operations on the channel. I don't like the semantics. Can I suggest release or unreserve?

Contributor

Well, we don't really do ops on the NettyTransportChannel itself but rather on its underlying fields, which is similar in my mind to any clean-up-on-close code. But I'm with you if you want to change this; I personally find release/unreserve even more confusing :)

Member Author

we are "closing" and then still proceeding to perform operations on the channel.

That's a pure implementation detail. The caller cannot observe this. It just observes that after the response is sent once, it cannot perform any "write" operation (sending a response) again.

> Can I suggest release or unreserve?

From the perspective of the channel it is still "closed", but if we really have to do this, then I'd more lean towards release.

Member

> That's a pure implementation detail.

That's exactly what we are discussing here though? The semantics of this implementation.

> From the perspective of the channel it is still "closed", but if we really have to do this, then I'd more lean towards release.

This was never an issue though? It was added as part of this PR to release the tracked bytes (also added in this PR).

Member Author

I still think that close() is fine as name but I am ok if you want to change it.

Member

Thanks @danielmitterdorfer, I pushed 86f1bed.

@danielmitterdorfer deleted the bulk-size-limit branch on June 15, 2016 13:19
@clintongormley added the :Core/Infra/Circuit Breakers label on Jun 21, 2016
Labels: :Core/Infra/Circuit Breakers, :Core/Infra/REST API, :Distributed Coordination/Network, >enhancement, resiliency, v2.4.0, v5.0.0-alpha2