
Limit request size #17133


Merged
2 commits merged into elastic:master from the bulk-size-limit branch on Apr 13, 2016

Conversation

@danielmitterdorfer (Member)

With this PR we add the ability to limit the size of all in-flight requests at the transport and HTTP levels. The limit is guarded by a new in-flight request circuit breaker. It can be configured via network.breaker.inflight_requests.limit, which is set to 100% of the available JVM heap by default. In practice, this means that this circuit breaker will only break if the parent circuit breaker breaks (the parent breaker is not new, but for completeness: it is configured via indices.breaker.total.limit with a default value of 70% of the total available JVM heap).

Closes #16011
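
For illustration, the two settings mentioned above could be set explicitly like this (a minimal sketch assuming the Settings builder API on master at the time; the values shown simply repeat the documented defaults):

```java
import org.elasticsearch.common.settings.Settings;

public class InflightBreakerSettingsExample {
    public static void main(String[] args) {
        // Sketch only: the keys are the ones described in this PR, the values are the defaults.
        Settings settings = Settings.builder()
                .put("network.breaker.inflight_requests.limit", "100%") // new in-flight request breaker
                .put("indices.breaker.total.limit", "70%")              // pre-existing parent breaker
                .build();
        System.out.println(settings.get("network.breaker.inflight_requests.limit"));
    }
}
```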

@danielmitterdorfer (Member Author)

For the review I am specifically interested in three things:

  1. Should configuration properties follow a standard naming scheme? Currently I have adhered to the naming scheme in the respective implementation classes where the properties are defined. Note that for the circuit-breaker-based properties we also have the properties transport.breaker.request.overhead and http.breaker.request.overhead.
  2. Are the default values ok? I have basically derived the values based on the fact that http.max_content_length is already 100 MB by default and it would not make sense to put a lower default limit on all in-flight requests.
  3. Should we adjust the default value indices.breaker.total.limit for the parent circuit breaker given that we have now introduced two new child breakers?

@danielmitterdorfer (Member Author)

@dakrone: Do you have time to review this PR?

@dakrone self-assigned this Mar 16, 2016

    // on some occasions, we will get 0 (e.g. when the message is chunked and we get the first chunk)
    if (reservation > 0) {
        logger.debug("Increase amount of in-flight requests by [{}] bytes on channel id [{}]", reservation, channelId);
        assert reservationsPerChannel.containsKey(channelId) == false : "Should not contain multiple reservations per channel";

I feel this is problematic: if HTTP pipelining is enabled, handleUpstream() will be called multiple times, but handleDownstream() may not have been called yet because the response isn't ready. The assertion will then fail or, if JVM assertions are disabled, the entry in reservationsPerChannel will be overwritten.

See org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler: the processing of HTTP requests is out of order, only the responses are sent in order, so it is possible that multiple requests on the same channel are in flight.
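
One possible shape for pipelining-tolerant bookkeeping (an illustrative sketch only, not the change that ended up in this PR) is to accumulate reserved bytes per channel instead of asserting a single reservation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: with HTTP pipelining several requests on the same channel can be
// in flight, so reservations are accumulated per channel rather than overwritten.
final class PerChannelReservations {
    private final ConcurrentMap<Integer, Long> bytesPerChannel = new ConcurrentHashMap<>();

    void reserve(int channelId, long bytes) {
        bytesPerChannel.merge(channelId, bytes, Long::sum);
    }

    void release(int channelId, long bytes) {
        bytesPerChannel.computeIfPresent(channelId, (id, current) -> {
            long remaining = current - bytes;
            // returning null removes the mapping once everything has been released
            return remaining <= 0 ? null : remaining;
        });
    }
}
```

The actual fix in this PR handles pipelined requests in its own way; the sketch only shows why a single reservation per channel is not enough.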

Member Author

That's a good catch. I have now changed the implementation to also handle pipelined requests.

@Dieken commented Mar 16, 2016

Maybe name the in-flight-requests-related settings something like ".....requests....." or even "......inflight_requests....".


    private final Limiter limiter;

    private final Map<Integer, Reservation> reservationsPerChannel = new ConcurrentHashMap<>();

Is AttributeKey (http://netty.io/4.0/api/io/netty/channel/ChannelHandler.html) cheaper than ConcurrentHashMap?

The booking in my patch Dieken@f2d487e is cheaper, but it sits in higher-level code.

Maybe introduce a trick similar to OrderedUpstreamMessageEvent and OrderedDownstreamMessageEvent so that handleDownstream() learns the reservation from its argument "e".

Member Author

Regarding AttributeKey: I would rather rely on JDK classes as long as the alternatives don't provide a significant improvement. You're right that using LongAdder is cheaper, but at a higher level you already get request/response correlation basically for free.

The idea to piggyback on the message event is nice, but I'd rather avoid tying reservations to the events.
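
For comparison, a global LongAdder-based booking (roughly the shape of the cheaper approach referred to above; hypothetical code, not the actual patch) trades per-request correlation for very cheap updates:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: coarse global accounting of in-flight bytes. Cheap to update,
// but it cannot correlate a release with the request that made the reservation.
final class GlobalInflightBytes {
    private final LongAdder inflightBytes = new LongAdder();

    void reserve(long bytes) { inflightBytes.add(bytes); }

    void release(long bytes) { inflightBytes.add(-bytes); }

    long current() { return inflightBytes.sum(); }
}
```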

@danielmitterdorfer (Member Author)

@Dieken: Thanks for your comments. I am currently working on another issue but I will have a closer look ASAP.

@danielmitterdorfer (Member Author)

Regarding the HTTP status code 413, here is the relevant part from RFC 2616, section 10:

10.4.14 413 Request Entity Too Large

The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.

If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.

So we can add a Retry-After header (see also RFC 2616, section 14) with a relative delay of, say, 10 seconds to indicate that this condition may be temporary. Note that the condition is only temporary in the case of multiple in-flight requests. If a single request hits the limit, it is still permanent (until somebody changes the setting, of course). It may even be possible to detect these two cases, but I am not sure it is worth the effort and the additional code complexity.
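
For illustration, with the Netty 3 HTTP codec in use at the time, building such a response could look roughly like this (a sketch only; the exact response wiring in Elasticsearch differs):

```java
import org.jboss.netty.handler.codec.http.DefaultHttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponse;
import org.jboss.netty.handler.codec.http.HttpResponseStatus;
import org.jboss.netty.handler.codec.http.HttpVersion;

final class TooLargeResponses {
    // Sketch: a 413 response with a relative Retry-After of 10 seconds.
    static HttpResponse entityTooLarge() {
        HttpResponse response = new DefaultHttpResponse(
                HttpVersion.HTTP_1_1, HttpResponseStatus.REQUEST_ENTITY_TOO_LARGE);
        response.headers().set("Retry-After", "10");
        return response;
    }
}
```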


    @Override
    public void reset() throws IOException {
        addBytesChecked(-bytesWritten);
Member

Does reset() actually "free" the bytes that have been written so far?

Member Author

No, this does not free the bytes but I still think it is necessary to reduce the number of reserved bytes. Consider this scenario:

    LimitingStreamOutput stream = new LimitingStreamOutput(new BytesStreamOutput());
    stream.writeByte((byte) 0);
    stream.writeByte((byte) 0);
    stream.writeByte((byte) 0); // at this point we've written (and reserved) 3 bytes
    stream.reset();             // to my understanding we should now reset the number of reserved bytes to 0
    stream.writeByte((byte) 0); // reserve 1 byte
    stream.close();             // return all reserved bytes

Wdyt?

Member

Looking at the implementation:

    @Override
    public void reset() {
        // shrink list of pages
        if (bytes.size() > BigArrays.PAGE_SIZE_IN_BYTES) {
            bytes = bigarrays.resize(bytes, BigArrays.PAGE_SIZE_IN_BYTES);
        }

        // go back to start
        count = 0;
    }

It doesn't necessarily "free" bytes; it just resizes the backing array back to a 16384 byte array if more than 16384 bytes have been written already, so I'm not sure what to do here.

Member Author

As discussed, I think both approaches are ok. If we do not reduce the amount in #reset() we're a bit more conservative (as we keep the count although there may not be more memory allocated), but when #close() is called we'll always free all written bytes, even if #reset() has been called in between. So I'll change the implementation and no longer reduce the count in #reset().
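
A tiny sketch of that accounting (hypothetical names, not the PR's actual stream class): #reset() leaves the reservation untouched, while #close() always releases everything that has ever been written on the stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.LongConsumer;

// Hypothetical sketch of the behaviour agreed on above: reset() is conservative and
// keeps the count, close() always frees everything that was reserved on this stream.
final class CountingLimitedOutput extends OutputStream {
    private final OutputStream delegate;
    private final LongConsumer reserveBytes; // e.g. a circuit-breaker check-and-add
    private final LongConsumer releaseBytes; // hands the reserved bytes back
    private long bytesWritten;

    CountingLimitedOutput(OutputStream delegate, LongConsumer reserveBytes, LongConsumer releaseBytes) {
        this.delegate = delegate;
        this.reserveBytes = reserveBytes;
        this.releaseBytes = releaseBytes;
    }

    @Override
    public void write(int b) throws IOException {
        reserveBytes.accept(1L);
        delegate.write(b);
        bytesWritten++;
    }

    public void reset() {
        // intentionally no releaseBytes call: the count is kept even though the
        // underlying buffer may have shrunk
    }

    @Override
    public void close() throws IOException {
        releaseBytes.accept(bytesWritten); // always free all bytes ever written
        bytesWritten = 0;
        delegate.close();
    }
}
```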

@dakrone (Member) commented Mar 17, 2016

@danielmitterdorfer I left some comments on this! One question I do have: how much impact will this have on the performance of HTTP and transport requests? Do you have any idea how large the impact will be?

@danielmitterdorfer (Member Author)

Regarding your question on the performance impact: I'll run some benchmarks and post the results.

@danielmitterdorfer (Member Author)

I've microbenchmarked three scenarios with RequestSizeLimitHandler:

  1. Handle two upstream and downstream events
  2. Handle an upstream event, a pipelined upstream event, a pipelined downstream event, and a downstream event
  3. Handle a pipelined upstream event, a pipelined downstream event, a pipelined upstream event, and a pipelined downstream event

(I did this so we have the same number of operations in all cases.)

I also benchmarked the current version of RequestSizeLimitHandler against the original one (without pipeline support).

A rough summary (exact numbers depend on the scenario):

| Metric | RequestSizeLimitHandler (without pipeline support) | RequestSizeLimitHandler (with pipeline support) |
|---|---|---|
| Allocation rate | 56 bytes / op | 79 - 104 bytes / op (scenario dependent) |
| Average time | 82 - 113 ns / op | 94 - 119 ns / op |

Details can be found in the gist: https://gist.github.com/danielmitterdorfer/871cd62d7e16e45ae79d
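
For reference, the general JMH shape used for such microbenchmarks looks roughly like this (a hypothetical stand-in measuring simple per-channel bookkeeping, not the code from the gist):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical stand-in: measures the cost of reserving and releasing bytes in a
// per-channel map, mirroring one "upstream" plus one "downstream" event.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ReservationBookkeepingBenchmark {
    private final ConcurrentHashMap<Integer, Long> reservationsPerChannel = new ConcurrentHashMap<>();

    @Benchmark
    public long reserveAndRelease() {
        reservationsPerChannel.merge(42, 100L, Long::sum); // "upstream" event
        return reservationsPerChannel.remove(42);          // "downstream" event
    }
}
```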

It is much harder to microbenchmark the overhead of request size limiting at the transport level because the new code is not as isolated as at the HTTP level. However, I've run macrobenchmarks before and after the change to check for performance-related changes.

The baseline for the test was the last commit on master that I've merged (d17fd33). I've run two trials with the geonames data against a single-node Elasticsearch cluster with default settings. The benchmark first bulk-indexes data via HTTP, does a force-merge, runs 20 warmup iterations of all queries and then 10 timed iterations of these queries.

| Metric | Baseline (without size limiting) | Size limiting enabled |
|---|---|---|
| Mean indexing throughput [docs/s] | 13930, 12271 | 12775, 13331 |
| 99th percentile latency, match_all query [ms] | 9.79, 9.93 | 10.00, 6.29 |
| 99th percentile latency, uncached aggregation query [ms] | 115.75, 134.14 | 139.19, 119.25 |

I also ran the same benchmark against a two-node cluster with default settings in order to exercise the transport code as well:

| Metric | Baseline (without size limiting) | Size limiting enabled |
|---|---|---|
| Mean indexing throughput [docs/s] | 11119, 11711 | 11725, 11715 |
| 99th percentile latency, match_all query [ms] | 9.93, 6.29 | 8.31, 8.21 |
| 99th percentile latency, uncached aggregation query [ms] | 134.14, 116.26 | 118.19, 115.67 |

So in both cases the changes are hidden within the run-to-run variance of the benchmarks.

@danielmitterdorfer (Member Author)

@bleskes Thanks for your review. I have now introduced constants for all Netty headers, added Javadocs, and added additional helper methods in Marker. It should now be quite obvious what is read with respect to headers. I felt it may also make sense to add additional methods in NettyHeaders that help with low-level header handling (like reading the marker bytes) but I did not do this in this PR. However, this might be worth doing in another small PR (later).

@@ -41,7 +41,8 @@

@Override
protected Object decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) throws Exception {
if (buffer.readableBytes() < 6) {
int sizeHeaderLength = NettyHeader.MARKER_BYTES_SIZE + NettyHeader.MESSAGE_LENGTH_SIZE;
Contributor

make this final?

Member Author

Done.

@bleskes (Contributor) commented Apr 11, 2016

sounds good @danielmitterdorfer

@dakrone (Member) commented Apr 11, 2016

We still have the open question of what a good default value for this breaker should be. I feel we have two options:

  1. Set it to some percentage of total node memory. The complication here is that different node types have different requirements: for example, coordinating-only nodes can have a lot of in-flight requests (that is their job), while data nodes may have different needs.
  2. Don't set a maximum and let the parent circuit breaker trigger when the sum of the request breaker plus whatever the other breakers (field data etc.) need rises above the accepted limit.

@dakrone @clintongormley do you have any opinions here?

Hmm... I think setting it to 200mb sounds reasonable to me? These are nice dynamic settings, so it's easy to change at runtime and I'd rather err too low than too high.

@dakrone (Member) commented Apr 11, 2016

@bleskes brought up that this covers all requests, so 200mb is probably too small (and I agree). Since the parent breaker limit is 70%, how about setting this to 60%?

@danielmitterdorfer (Member Author)

@dakrone: The suggestion of 60% is fine with me. If I hear no objections, I'll change it accordingly and update the docs. /cc @bleskes

@danielmitterdorfer (Member Author)

We have decided not to limit this circuit breaker individually but to rely on the parent circuit breaker (which is currently set to 70%). I'll update code and docs accordingly and merge the PR afterwards.

With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.

By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting

network.breaker.inflight_requests.limit

Relates elastic#16011

With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.

Relates elastic#16011
@danielmitterdorfer merged commit 117bc68 into elastic:master on Apr 13, 2016
@clintongormley added the :Distributed Coordination/Network and :Core/Infra/REST API labels and removed the :Bulk label on Apr 13, 2016
danielmitterdorfer added a commit that referenced this pull request Apr 13, 2016
In #17133 we introduce request size limit handling and need a custom
channel implementation. In order to ensure we delegate all methods,
it is better to have this channel implement an interface instead of
extending an abstract base class (so changes to the interface turn
into compile errors).

Relates #17133
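
A tiny illustration of the point in the commit message (hypothetical types, not the actual channel classes): implementing the interface directly forces the delegating channel to be updated whenever the interface grows.

```java
// Hypothetical types for illustration only.
interface Channel {
    void sendResponse(String body);
    void close();
}

final class DelegatingChannel implements Channel {
    private final Channel delegate;

    DelegatingChannel(Channel delegate) {
        this.delegate = delegate;
    }

    @Override
    public void sendResponse(String body) {
        // request-size bookkeeping could hook in here before delegating
        delegate.sendResponse(body);
    }

    @Override
    public void close() {
        delegate.close();
    }
    // If a method is added to Channel, this class fails to compile until it is delegated
    // as well; extending an abstract base class would inherit a default silently.
}
```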
@@ -134,6 +141,14 @@ public void sendResponse(Throwable error) throws IOException {
        transportServiceAdapter.onResponseSent(requestId, action, error);
    }

    private void close() {
Member

@danielmitterdorfer Why is this method called close? It might make more sense if it were in the finally blocks but as-is I'm confused by it. Help?

Member Author

In my original implementation it was indeed called in the finally block, but Boaz and I had a lengthy discussion about this one. It makes sense to free the reserved bytes at the beginning of the method as we're done with request processing at this point in time. Freeing at the beginning guards against any problems that might occur further down the line (admittedly, this could also be solved with a finally block).

The name close() just implies that you cannot reuse the channel after this method has been called, and the precondition check guards against this.
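
A minimal sketch of those semantics (hypothetical names, not the actual NettyTransportChannel code): sending a response first releases the reservation and marks the channel as closed, and the precondition check rejects any further use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: release the reservation at the start of response handling and
// refuse any second use of the channel afterwards.
final class ReservingChannel {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable releaseReservation; // e.g. gives reserved bytes back to the breaker

    ReservingChannel(Runnable releaseReservation) {
        this.releaseReservation = releaseReservation;
    }

    void sendResponse(Object response) {
        close(); // free the reserved bytes first; request processing is done at this point
        // ... write the response to the underlying transport channel ...
    }

    private void close() {
        if (closed.compareAndSet(false, true) == false) {
            throw new IllegalStateException("Channel is already closed");
        }
        releaseReservation.run();
    }
}
```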

@jasontedor (Member), May 25, 2016

I think it should be in the finally block, regardless of the name. We asynchronously write, the bytes are still held until after the write. 😄

If you've already had that discussion and concluded that at the beginning of the method is correct (and unswayed by my thoughts here), can we at least change the name? The semantics are just counterintuitive and a name change would at least improve that. Sorry, I do not have any good suggestions. 😦

Contributor

> We asynchronously write, the bytes are still held until after the write.

I don't follow this one? this is about freeing the bytes of the request not the bytes of the response? it's all just a heuristic anyway.

I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again. If we agree on that I think that closing at the beginning and throwing an early exception if already closed is what we want?

Member

> I don't follow this one? this is about freeing the bytes of the request not the bytes of the response?

Yes, I'm sorry, I confused myself and brought something irrelevant into this discussion.

> I'm good with doing it in a finally clause, though I think close() indicates well that the channel is closed and shouldn't be used again.

That's why I don't like it, we are "closing" and then still proceeding to perform operations on the channel. I don't like the semantics. Can I suggest release or unreserve?

Contributor

Well, we don't really do ops on the NettyTransportChannel itself but rather on its underlying fields, which is similar in my mind to any clean-up-on-close code. But I'm with you if you want to change this; I personally find release/unreserve even more confusing :)

Member Author

we are "closing" and then still proceeding to perform operations on the channel.

That's a pure implementation detail. The caller cannot observe this. It just observes that after the response is sent once, it cannot perform any "write" operation (sending a response) again.

> Can I suggest release or unreserve?

From the perspective of the channel it is still "closed", but if we really have to do this, then I'd more lean towards release.

Member

> That's a pure implementation detail.

That's exactly what we are discussing here though? The semantics of this implementation.

> From the perspective of the channel it is still "closed", but if we really have to do this, then I'd more lean towards release.

This was never an issue though? It was added as part of this PR to release the tracked bytes (also added in this PR).

Member Author

I still think that close() is fine as name but I am ok if you want to change it.

Member

Thanks @danielmitterdorfer, I pushed 86f1bed.

@danielmitterdorfer deleted the bulk-size-limit branch on June 15, 2016 13:19
@clintongormley added the :Core/Infra/Circuit Breakers label on Jun 21, 2016
Labels: :Core/Infra/Circuit Breakers, :Core/Infra/REST API, :Distributed Coordination/Network, >enhancement, resiliency, v2.4.0, v5.0.0-alpha2