Skip to content

Circuit break the number of inline scripts compiled per minute #19694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 9, 2016

Conversation

dakrone
Copy link
Member

@dakrone dakrone commented Jul 29, 2016

When compiling many dynamically changing scripts, parameterized
scripts (https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
script.max_compilations_per_minute, which defaults to 15.

If more dynamic scripts are sent, a user will get the following
exception:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[script] Too many dynamic script compilations within one minute, max: [30/min]; please use on-disk, indexed, or scripts with parameters instead",
        "bytes_wanted" : 0,
        "bytes_limit" : 0
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "i",
        "node" : "a5V1eXcZRYiIk8lecjZ4Jw",
        "reason" : {
          "type" : "general_script_exception",
          "reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
          "caused_by" : {
            "type" : "circuit_breaking_exception",
            "reason" : "[script] Too many dynamic script compilations within one minute, max: [30/min]; please use on-disk, indexed, or scripts with parameters instead",
            "bytes_wanted" : 0,
            "bytes_limit" : 0
          }
        }
      }
    ],
    "caused_by" : {
      "type" : "general_script_exception",
      "reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
      "caused_by" : {
        "type" : "circuit_breaking_exception",
        "reason" : "[script] Too many dynamic script compilations within one minute, max: [30/min]; please use on-disk, indexed, or scripts with parameters instead",
        "bytes_wanted" : 0,
        "bytes_limit" : 0
      }
    }
  },
  "status" : 500
}

This also fixes a bug in ScriptService where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.

Relates to #19396

@dakrone dakrone added >enhancement :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload v5.0.0-beta1 labels Jul 29, 2016
@dakrone
Copy link
Member Author

dakrone commented Jul 29, 2016

As something to think about for this:

  • Is 30 the right number? It works out to one every 2 seconds, but maybe we want to be more or less strict?
  • Should this be as strict as throwing an exception? If we want to be more lenient this can be logging a warning instead (easy to change it to do that)

@nik9000
Copy link
Member

nik9000 commented Jul 29, 2016

I think I'd prefer a warning. I'm fairly sure I'd have noticed it when running Elasticsearch.

* Check whether there have been too many compilations within the last minute, throwing a circuit breaking exception if so.
*/
public void checkCompilationLimit() {
long now = System.nanoTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be useful to describe the how here I think. I guess this works like a bucket of water slowly filling up. You get to compile a script if there is enough water in the bucket, otherwise you don't. If the bucket overflows then you don't get more water.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a variant of the token bucket algorithm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that link'd be a lovely comment.

@clintongormley
Copy link
Contributor

I think people won't notice it unless it is an exception, just like all the other safeguards. The immediate workaround for the user is to update a dynamic setting, giving them time to update their application.

I think the default limit should be lower so that a user is more likely to trip this safeguard in dev instead of production.

@dakrone
Copy link
Member Author

dakrone commented Aug 2, 2016

I think the default limit should be lower so that a user is more likely to trip this safeguard in dev instead of production.

Okay, I originally went with 10 a minute (before going for 30/min for this PR), does that sound reasonable or too low/high?

@clintongormley
Copy link
Contributor

So one compilation every 6 seconds. That seems OK, even during script development. Maybe a tiny bit tight. 15 would definitely be ok though

@dakrone
Copy link
Member Author

dakrone commented Aug 3, 2016

Okay, I pushed a new commit that sets the limit to 15/min

*
* It can be thought of as a bucket with water, every time the bucket is checked, water is added proportional to the amount of time that
* elapsed since the last time it was checked. If there is enough water, some is removed and the request is allowed. If there is not
* enough water the request is denied.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like: The bucket can "overflow", in which case the excess capacity is just "spilled", rather than saved up. So it never holds more than a minute's capacity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I will add that

@nik9000
Copy link
Member

nik9000 commented Aug 9, 2016

I looked at this again with new eyes this morning and left a few comments, mostly about funky side cases like "what if I set this to Long.MAX_VALUE?", "What if I set this to 0?" and "What if I set this to -123?".

@dakrone
Copy link
Member Author

dakrone commented Aug 9, 2016

@nik9000 okay I pushed a bunch of commits that I tried to break into meaningful pieces with nice commit messages, addressing your concerns :)

scriptService.setMaxCompilationsPerMinute(0);
expectThrows(CircuitBreakingException.class, () -> scriptService.checkCompilationLimit());
scriptService.setMaxCompilationsPerMinute(Integer.MAX_VALUE);
for (int i = 0; i < randomIntBetween(1000, 10000); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to save the value of the limit rather than do it every iteration?

@nik9000
Copy link
Member

nik9000 commented Aug 9, 2016

Left a small question/suggestion. Otherwise LGTM. Feel free to address my last point however you like and merge when you have a chance.

When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.

If more dynamic scripts are sent, a user will get the following
exception:

```json
{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
        "bytes_wanted" : 0,
        "bytes_limit" : 0
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "i",
        "node" : "a5V1eXcZRYiIk8lecjZ4Jw",
        "reason" : {
          "type" : "general_script_exception",
          "reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
          "caused_by" : {
            "type" : "circuit_breaking_exception",
            "reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
            "bytes_wanted" : 0,
            "bytes_limit" : 0
          }
        }
      }
    ],
    "caused_by" : {
      "type" : "general_script_exception",
      "reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
      "caused_by" : {
        "type" : "circuit_breaking_exception",
        "reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
        "bytes_wanted" : 0,
        "bytes_limit" : 0
      }
    }
  },
  "status" : 500
}
```

This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.

Relates to elastic#19396
@dakrone dakrone deleted the compliation-breaker branch August 16, 2016 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload >enhancement v5.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants