Configuration

nurkiewicz edited this page Nov 9, 2014 · 39 revisions

Hystrix uses Archaius for the default implementation of properties for configuration.

The documentation below is for the default HystrixPropertiesStrategy implementation that is used unless overridden using a plugin.

Each property has 4 levels of precedence:

1) Global default from code

This is the default if none of the following 3 are set.

It is shown as "Default Value" below.

2) Dynamic global default property

Default values can be changed globally using properties.

The property name is shown as "Default Property" below.

Example:

Default Property: hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds

3) Instance default from code

This allows defining an instance specific default in the code via the HystrixCommand constructor.

Sample code is shown as "How to Set Instance Default" below.

Example:

HystrixCommandProperties.Setter()
   .withExecutionIsolationThreadTimeoutInMilliseconds(int value)

This would be injected into a constructor of HystrixCommand similar to this:

    public HystrixCommandInstance(int id) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationThreadTimeoutInMilliseconds(500)));
        this.id = id;
    }

4) Dynamic instance property

Instance specific values can be set dynamically and override the preceding 3 levels of defaults.

The property name is shown as "Instance Property" below.

Example:

Instance Property: hystrix.command.[HystrixCommandKey].execution.isolation.thread.timeoutInMilliseconds

The HystrixCommandKey portion of the property would be replaced with the HystrixCommandKey.name() value of whatever HystrixCommand is being targeted.

If the key was "SubscriberGetAccount" then the property name would be:

hystrix.command.SubscriberGetAccount.execution.isolation.thread.timeoutInMilliseconds
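The four levels above can be pictured as a simple lookup chain. The following is a minimal sketch in plain Java, not the Archaius-backed implementation that Hystrix actually uses; the class name `PropertyPrecedenceSketch`, the `resolve` method and the `Map` standing in for the dynamic property source are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the four precedence levels; not Hystrix code.
final class PropertyPrecedenceSketch {

    // dynamicProps stands in for the dynamic property source (an assumption).
    static int resolve(Map<String, String> dynamicProps,
                       String commandKey,
                       String propertySuffix,
                       Integer instanceDefaultFromCode,
                       int globalDefaultFromCode) {
        // 4) Dynamic instance property overrides everything else.
        String instanceValue = dynamicProps.get("hystrix.command." + commandKey + "." + propertySuffix);
        if (instanceValue != null) {
            return Integer.parseInt(instanceValue);
        }
        // 3) Instance default supplied in code (null means "not set").
        if (instanceDefaultFromCode != null) {
            return instanceDefaultFromCode;
        }
        // 2) Dynamic global default property.
        String globalValue = dynamicProps.get("hystrix.command.default." + propertySuffix);
        if (globalValue != null) {
            return Integer.parseInt(globalValue);
        }
        // 1) Global default from code.
        return globalDefaultFromCode;
    }

    public static void main(String[] args) {
        String suffix = "execution.isolation.thread.timeoutInMilliseconds";
        Map<String, String> props = new HashMap<>();
        props.put("hystrix.command.default." + suffix, "2000");
        props.put("hystrix.command.SubscriberGetAccount." + suffix, "250");

        // Dynamic instance property wins even though an instance default of 500 is set in code.
        System.out.println(resolve(props, "SubscriberGetAccount", suffix, 500, 1000));
        // A command without instance-level settings falls back to the dynamic global default.
        System.out.println(resolve(props, "OtherCommand", suffix, null, 1000));
    }
}
```

In this sketch a `null` instance default means the command constructor did not set one, so the lookup continues to the global levels.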

## Command Properties

Properties for controlling HystrixCommand behavior:

### Execution

Properties that control how HystrixCommand.run() is executed.

execution.isolation.strategy

What isolation strategy HystrixCommand.run() will be executed with.

If THREAD, it will be executed on a separate thread and concurrent requests are limited by the number of threads in the thread-pool.

If SEMAPHORE, it will be executed on the calling thread and concurrent requests are limited by the semaphore count.

Thread or Semaphore

In most cases it is recommended to stick with the default which is to run commands using thread isolation.

Commands executed in threads have an extra layer of protection against latencies beyond what network timeouts can offer.

Generally the only time semaphore isolation should be used instead of thread is when the call is so high volume (hundreds per second per instance) that the overhead of separate threads is too high, and this typically only applies to non-network calls.

Netflix API has 100+ commands running in 40+ thread pools and only a handful of commands are not running in a thread - those that fetch metadata from an in-memory cache or that are facades to thread-isolated commands (see "Primary + Secondary with Fallback" pattern for more information on this).

See how isolation works for more information about this decision.

Default Value: THREAD (see ExecutionIsolationStrategy.THREAD)
Possible Values: THREAD, SEMAPHORE
Default Property: hystrix.command.default.execution.isolation.strategy
Instance Property: hystrix.command.[HystrixCommandKey].execution.isolation.strategy
How to Set Instance Default:

// to use thread isolation
HystrixCommandProperties.Setter()
   .withExecutionIsolationStrategy(ExecutionIsolationStrategy.THREAD)
// to use semaphore isolation
HystrixCommandProperties.Setter()
   .withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE)

execution.isolation.thread.timeoutInMilliseconds

Time in milliseconds after which the calling thread will time out and walk away from the HystrixCommand.run() execution, mark the HystrixCommand as a TIMEOUT, and perform fallback logic. This applies when ExecutionIsolationStrategy.THREAD is used.

Default Value: 1000
Default Property: hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds
Instance Property: hystrix.command.[HystrixCommandKey].execution.isolation.thread.timeoutInMilliseconds
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withExecutionIsolationThreadTimeoutInMilliseconds(int value)

execution.isolation.thread.interruptOnTimeout

Whether the thread executing HystrixCommand.run() should be interrupted when timeout occurs.

Default Value: true
Default Property: hystrix.command.default.execution.isolation.thread.interruptOnTimeout
Instance Property: hystrix.command.[HystrixCommandKey].execution.isolation.thread.interruptOnTimeout
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withExecutionIsolationThreadInterruptOnTimeout(boolean value)
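
To make the two thread-timeout properties concrete, here is a minimal sketch using only `java.util.concurrent`. It is not how Hystrix implements its timeout; the class `TimeoutSketch` and the method `runWithTimeout` are hypothetical names. It only illustrates the idea of the calling thread giving up after the timeout, optionally interrupting the worker, and returning a fallback.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of "time out and walk away"; not Hystrix code.
final class TimeoutSketch {

    static String runWithTimeout(ExecutorService pool,
                                 Callable<String> run,
                                 long timeoutInMilliseconds,
                                 boolean interruptOnTimeout,
                                 String fallback) throws InterruptedException {
        Future<String> future = pool.submit(run);
        try {
            // The calling thread waits at most timeoutInMilliseconds.
            return future.get(timeoutInMilliseconds, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Walk away; interrupt the worker thread only if configured to do so.
            future.cancel(interruptOnTimeout);
            return fallback;
        } catch (ExecutionException e) {
            // A failure in run() also leads to the fallback in this sketch.
            return fallback;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String fast = runWithTimeout(pool, () -> "ok", 1000, true, "fallback");
            String slow = runWithTimeout(pool, () -> {
                Thread.sleep(5000); // simulates a latent dependency
                return "too late";
            }, 50, true, "fallback");
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```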

execution.isolation.semaphore.maxConcurrentRequests

Max number of requests allowed to a HystrixCommand.run() method when ExecutionIsolationStrategy.SEMAPHORE is used.

If the max concurrent limit is hit then subsequent requests will be rejected.

The logic for sizing a semaphore is basically the same as choosing how many threads in a thread-pool but the overhead for a semaphore is far smaller and typically the executions are far faster (sub-millisecond) otherwise threads would be used.

For example, 5000rps on a single instance for in-memory lookups with metrics being gathered has been seen to work with a semaphore of only 2.

The isolation principle is still the same, so the semaphore should still be a small percentage of the overall container (e.g. Tomcat) threadpool, not all or most of it; otherwise it provides no protection.

Default Value: 10
Default Property: hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests
Instance Property: hystrix.command.[HystrixCommandKey].execution.isolation.semaphore.maxConcurrentRequests
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withExecutionIsolationSemaphoreMaxConcurrentRequests(int value)
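
The rejection behaviour described for semaphore isolation can be sketched with a plain `java.util.concurrent.Semaphore`. This is an illustration under assumptions, not the Hystrix implementation; `SemaphoreIsolationSketch` and its `execute` method are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration of semaphore-limited execution on the calling thread.
final class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T execute(Supplier<T> run, Supplier<T> onRejected) {
        // tryAcquire never blocks: if the limit is hit, the request is rejected immediately.
        if (!permits.tryAcquire()) {
            return onRejected.get();
        }
        try {
            return run.get(); // runs on the calling thread
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolationSketch isolation = new SemaphoreIsolationSketch(1);
        // The nested call happens while the only permit is held, so it is rejected.
        String result = isolation.<String>execute(
                () -> isolation.<String>execute(() -> "inner ran", () -> "inner rejected"),
                () -> "outer rejected");
        System.out.println(result);
    }
}
```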
### Fallback

Properties that control how HystrixCommand.getFallback() is executed. These properties apply to both ExecutionIsolationStrategy.THREAD and ExecutionIsolationStrategy.SEMAPHORE.

fallback.isolation.semaphore.maxConcurrentRequests

Max number of requests allowed to a HystrixCommand.getFallback() method from the calling thread.

If the max concurrent limit is hit then subsequent requests will be rejected and an exception thrown since no fallback could be retrieved.

Default Value: 10
Default Property: hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests
Instance Property: hystrix.command.[HystrixCommandKey].fallback.isolation.semaphore.maxConcurrentRequests
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withFallbackIsolationSemaphoreMaxConcurrentRequests(int value)

fallback.enabled

Since: 1.2

Whether HystrixCommand.getFallback() will be attempted when failure or rejection occurs.

Default Value: true
Default Property: hystrix.command.default.fallback.enabled
Instance Property: hystrix.command.[HystrixCommandKey].fallback.enabled
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withFallbackEnabled(boolean value)
### Circuit Breaker

The circuit breaker properties control behavior of the HystrixCircuitBreaker.

circuitBreaker.enabled

Whether a circuit breaker will be used to track health and short-circuit requests if it trips.

Default Value: true
Default Property: hystrix.command.default.circuitBreaker.enabled
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.enabled
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerEnabled(boolean value)

circuitBreaker.requestVolumeThreshold

Minimum number of requests in the rolling window required before the circuit can trip.

For example, if the value is 20, then if only 19 requests are received in the rolling window (say 10 seconds) the circuit will not trip open even if all 19 failed.

Default Value: 20
Default Property: hystrix.command.default.circuitBreaker.requestVolumeThreshold
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.requestVolumeThreshold
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerRequestVolumeThreshold(int value)

circuitBreaker.sleepWindowInMilliseconds

How long, in milliseconds, to reject requests after the circuit trips before allowing attempts again to determine whether the circuit should be closed.

Default Value: 5000
Default Property: hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.sleepWindowInMilliseconds
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerSleepWindowInMilliseconds(int value)

circuitBreaker.errorThresholdPercentage

Error percentage at which the circuit should trip open and start short-circuiting requests to fallback logic.

Default Value: 50
Default Property: hystrix.command.default.circuitBreaker.errorThresholdPercentage
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.errorThresholdPercentage
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerErrorThresholdPercentage(int value)

circuitBreaker.forceOpen

If true the circuit breaker will be forced open (tripped) and reject all requests.

This property takes precedence over circuitBreaker.forceClosed.

Default Value: false
Default Property: hystrix.command.default.circuitBreaker.forceOpen
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.forceOpen
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerForceOpen(boolean value)

circuitBreaker.forceClosed

If true the circuit breaker will remain closed and allow requests regardless of the error percentage.

The circuitBreaker.forceOpen property takes precedence, so if it is set to true this property does nothing.

Default Value: false
Default Property: hystrix.command.default.circuitBreaker.forceClosed
Instance Property: hystrix.command.[HystrixCommandKey].circuitBreaker.forceClosed
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withCircuitBreakerForceClosed(boolean value)
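
The interaction of the circuit breaker properties above can be summarised in a few lines of plain Java. This is a sketch derived from the property descriptions, not the HystrixCircuitBreaker source; the class `CircuitTripSketch`, the method `isOpen` and the use of integer percentages are assumptions, and the sleep window is left out for brevity.

```java
// Hypothetical illustration of how the trip decision combines the properties.
final class CircuitTripSketch {

    static boolean isOpen(boolean forceOpen,
                          boolean forceClosed,
                          int totalRequests,
                          int failedRequests,
                          int requestVolumeThreshold,
                          int errorThresholdPercentage) {
        if (forceOpen) {
            return true;  // forceOpen takes precedence over forceClosed
        }
        if (forceClosed) {
            return false; // stay closed regardless of the error percentage
        }
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false; // not enough traffic in the rolling window to judge
        }
        int errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // 19 requests, all failed, threshold 20: the circuit does not trip.
        System.out.println(isOpen(false, false, 19, 19, 20, 50));
        // 20 requests, 10 failed (50%): the circuit trips.
        System.out.println(isOpen(false, false, 20, 10, 20, 50));
    }
}
```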
### Metrics

Properties related to capturing metrics from HystrixCommand execution.

metrics.rollingStats.timeInMilliseconds

Duration of statistical rolling window in milliseconds. This is how long metrics are kept for the circuit breaker to use and for publishing.

The window is broken into buckets and rolls by those increments.

For example, if set at 10 seconds (10000) with 10 1-second buckets, the following diagram represents how it rolls new buckets on and old ones off:

Default Value: 10000
Default Property: hystrix.command.default.metrics.rollingStats.timeInMilliseconds
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingStats.timeInMilliseconds
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingStatisticalWindowInMilliseconds(int value)

metrics.rollingStats.numBuckets

Number of buckets the rolling statistical window is broken into.

Note: The following must be true "metrics.rollingStats.timeInMilliseconds % metrics.rollingStats.numBuckets == 0" otherwise it will throw an exception.

In other words, 10000/10 is okay, so is 10000/20 but 10000/7 is not.

Default Value: 10
Possible Values: Any value that metrics.rollingStats.timeInMilliseconds can be evenly divided by. The result however should be buckets measuring 100s or 1000s of milliseconds. Performance at high volume has not been tested with buckets <100ms.
Default Property: hystrix.command.default.metrics.rollingStats.numBuckets
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingStats.numBuckets
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingStatisticalWindowBuckets(int value)
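
The divisibility rule for the rolling window is easy to check up front. The helper below is a minimal sketch; `RollingWindowSketch` and `bucketSizeInMilliseconds` are hypothetical names and the exception type is an assumption, since the text above only says that an exception is thrown.

```java
// Hypothetical validation of the window/bucket relationship described above.
final class RollingWindowSketch {

    static int bucketSizeInMilliseconds(int timeInMilliseconds, int numBuckets) {
        if (numBuckets <= 0 || timeInMilliseconds % numBuckets != 0) {
            throw new IllegalArgumentException(
                    "timeInMilliseconds % numBuckets must be 0, got "
                            + timeInMilliseconds + " % " + numBuckets);
        }
        return timeInMilliseconds / numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketSizeInMilliseconds(10000, 10)); // 1000 ms buckets
        System.out.println(bucketSizeInMilliseconds(10000, 20)); // 500 ms buckets
        try {
            bucketSizeInMilliseconds(10000, 7);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```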

metrics.rollingPercentile.enabled

Whether execution latencies should be tracked and calculated as percentiles.

Default Value: true
Default Property: hystrix.command.default.metrics.rollingPercentile.enabled
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingPercentile.enabled
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingPercentileEnabled(boolean value)

metrics.rollingPercentile.timeInMilliseconds

Duration of rolling window in milliseconds that execution times are kept for to allow percentile calculations.

The window is broken into buckets and rolls by those increments.

Default Value: 60000
Default Property: hystrix.command.default.metrics.rollingPercentile.timeInMilliseconds
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingPercentile.timeInMilliseconds
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingPercentileWindowInMilliseconds(int value)

metrics.rollingPercentile.numBuckets

Number of buckets the rollingPercentile window will be broken into.

Note: The following must be true "metrics.rollingPercentile.timeInMilliseconds % metrics.rollingPercentile.numBuckets == 0" otherwise it will throw an exception.

In other words, 60000/6 is okay, so is 60000/60 but 60000/7 is not.

Default Value: 6
Possible Values: Any value that metrics.rollingPercentile.timeInMilliseconds can be evenly divided by. The result however should be buckets measuring 1000s of milliseconds. Performance at high volume has not been tested with buckets <1000ms.
Default Property: hystrix.command.default.metrics.rollingPercentile.numBuckets
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingPercentile.numBuckets
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingPercentileWindowBuckets(int value)

metrics.rollingPercentile.bucketSize

Max number of execution times that are kept per bucket. If more executions occur during the bucket's time span, they wrap around and start overwriting at the beginning of the bucket.

For example, if bucket size is set to 100 and represents 10 seconds and 500 executions occur during this time only the last 100 executions will be kept for that 10 second bucket.

Increasing this size increases the amount of memory used to store values and increases time for sorting the lists to do percentile calculations.

Default Value: 100
Default Property: hystrix.command.default.metrics.rollingPercentile.bucketSize
Instance Property: hystrix.command.[HystrixCommandKey].metrics.rollingPercentile.bucketSize
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsRollingPercentileBucketSize(int value)
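
The overwrite behaviour of a percentile bucket can be modelled as a fixed-size circular buffer. The class below is a minimal sketch of that idea, not the data structure Hystrix uses internally; `PercentileBucketSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical fixed-size bucket that overwrites from the beginning when full.
final class PercentileBucketSketch {
    private final int[] values;
    private long written;

    PercentileBucketSketch(int bucketSize) {
        this.values = new int[bucketSize];
    }

    void add(int executionTimeInMilliseconds) {
        // Wrap around once bucketSize values have been written.
        values[(int) (written % values.length)] = executionTimeInMilliseconds;
        written++;
    }

    // Sorted copy of the retained values, as needed for percentile calculations.
    int[] sortedSnapshot() {
        int retained = (int) Math.min(written, values.length);
        int[] copy = Arrays.copyOf(values, retained);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        PercentileBucketSketch bucket = new PercentileBucketSketch(100);
        for (int i = 0; i < 500; i++) {
            bucket.add(i); // 500 executions in one bucket
        }
        int[] kept = bucket.sortedSnapshot();
        // Only the last 100 executions remain.
        System.out.println(kept.length + " values, from " + kept[0] + " to " + kept[kept.length - 1]);
    }
}
```

A larger bucket means a larger array to copy and sort on every percentile calculation, which is the memory and CPU cost mentioned above.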

metrics.healthSnapshot.intervalInMilliseconds

Time in milliseconds to wait between allowing health snapshots to be taken that calculate success and error percentages and affect circuit breaker status.

On high-volume circuits the continual calculation of error percentage can become CPU intensive thus this controls how often it is calculated.

Default Value: 500
Default Property: hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds
Instance Property: hystrix.command.[HystrixCommandKey].metrics.healthSnapshot.intervalInMilliseconds
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withMetricsHealthSnapshotIntervalInMilliseconds(int value)
### Request Context

Properties related to HystrixRequestContext functionality used by HystrixCommand.

requestCache.enabled

Whether HystrixCommand.getCacheKey() should be used with HystrixRequestCache to provide de-duplication functionality via request-scoped caching.

Default Value: true
Default Property: hystrix.command.default.requestCache.enabled
Instance Property: hystrix.command.[HystrixCommandKey].requestCache.enabled
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withRequestCacheEnabled(boolean value)

requestLog.enabled

Whether HystrixCommand execution and events should be logged to HystrixRequestLog.

Default Value: true
Default Property: hystrix.command.default.requestLog.enabled
Instance Property: hystrix.command.[HystrixCommandKey].requestLog.enabled
How to Set Instance Default:

HystrixCommandProperties.Setter()
   .withRequestLogEnabled(boolean value)
## Collapser Properties

Properties for controlling HystrixCollapser behavior.

maxRequestsInBatch

The maximum number of requests allowed in a batch before triggering a batch execution.

Default Value: Integer.MAX_VALUE
Default Property: hystrix.collapser.default.maxRequestsInBatch
Instance Property: hystrix.collapser.[HystrixCollapserKey].maxRequestsInBatch
How to Set Instance Default:

HystrixCollapserProperties.Setter()
   .withMaxRequestsInBatch(int value)

timerDelayInMilliseconds

Number of milliseconds after the creation of a batch at which its execution is triggered, unless maxRequestsInBatch is reached first.

Default Value: 10
Default Property: hystrix.collapser.default.timerDelayInMilliseconds
Instance Property: hystrix.collapser.[HystrixCollapserKey].timerDelayInMilliseconds
How to Set Instance Default:

HystrixCollapserProperties.Setter()
   .withTimerDelayInMilliseconds(int value)
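
The two collapser triggers (batch size and timer) can be sketched as follows. This is a simplified, single-threaded illustration, not the HystrixCollapser implementation; `BatchTriggerSketch`, `submit` and `onTimerFired` are hypothetical names, and a real timer is replaced by an explicit method call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of size-triggered and timer-triggered batches.
final class BatchTriggerSketch<T> {
    private final int maxRequestsInBatch;
    private List<T> pending = new ArrayList<>();

    BatchTriggerSketch(int maxRequestsInBatch) {
        this.maxRequestsInBatch = maxRequestsInBatch;
    }

    // Returns the batch to execute if the size limit was reached, else an empty list.
    List<T> submit(T request) {
        pending.add(request);
        if (pending.size() >= maxRequestsInBatch) {
            return drain();
        }
        return Collections.emptyList();
    }

    // Called when the timer delay elapses: whatever is pending is executed.
    List<T> onTimerFired() {
        return drain();
    }

    private List<T> drain() {
        List<T> batch = pending;
        pending = new ArrayList<>();
        return batch;
    }

    public static void main(String[] args) {
        BatchTriggerSketch<String> collapser = new BatchTriggerSketch<>(3);
        System.out.println(collapser.submit("a")); // still collecting
        System.out.println(collapser.submit("b")); // still collecting
        System.out.println(collapser.submit("c")); // size limit reached, batch of 3
        System.out.println(collapser.submit("d")); // still collecting
        System.out.println(collapser.onTimerFired()); // timer flushes the remaining request
    }
}
```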

requestCache.enabled

Whether request caching is enabled for HystrixCollapser.execute() and HystrixCollapser.queue() invocations.

Default Value: true
Default Property: hystrix.collapser.default.requestCache.enabled
Instance Property: hystrix.collapser.[HystrixCollapserKey].requestCache.enabled
How to Set Instance Default:

HystrixCollapserProperties.Setter()
   .withRequestCacheEnabled(boolean value)
## ThreadPool Properties

Properties for controlling thread-pool behavior that HystrixCommands execute on.

Most of the time the default value of 10 threads will be fine (often it could be made smaller).

To determine if it needs to be larger, a basic formula for calculating the size is:

requests per second at peak when healthy * 99th percentile latency in seconds + some breathing room

See the example below to see this formula put into practice.
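
As a quick arithmetic illustration of the formula, the sketch below computes a pool size from hypothetical numbers (30 requests per second at peak, a 99th percentile latency of 200ms, and 4 threads of breathing room). The class `ThreadPoolSizingSketch`, the method name and all input values are assumptions chosen for the example.

```java
// Hypothetical helper applying the sizing formula with integer arithmetic.
final class ThreadPoolSizingSketch {

    static long suggestedCoreSize(long peakRequestsPerSecond,
                                  long p99LatencyInMilliseconds,
                                  long breathingRoom) {
        // requests per second * latency in seconds, rounded up to whole threads
        long busyThreads = (peakRequestsPerSecond * p99LatencyInMilliseconds + 999) / 1000;
        return busyThreads + breathingRoom;
    }

    public static void main(String[] args) {
        // 30 rps * 0.2 s = 6 threads busy at the 99th percentile, plus 4 spare = 10
        System.out.println(suggestedCoreSize(30, 200, 4));
    }
}
```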

The general principle is keep the pool as small as possible as it is the primary tool to shed load and prevent resources from becoming blocked if latency occurs.

Netflix API has 30+ of its threadpools set at 10, 2 at 20 and 1 at 25.

The above diagram shows an example configuration where the dependency has no reason to hit the 99.5th percentile and thus cuts it short at the network timeout layer and immediately retries with the expectation to get median latency most of the time, and accomplish this all within the 300ms thread timeout.

If the dependency has legitimate reasons to sometimes hit the 99.5th percentile (e.g. a cache miss with lazy generation), then the network timeout will be set higher than it, such as at 325ms with 0 or 1 retries, and the thread timeout set higher still (350ms+).

The threadpool is sized at 10 to handle a burst of 99th percentile requests, but when everything is healthy this threadpool will typically only have 1 or 2 threads active at any given time to serve mostly 40ms median calls.

When configured correctly a timeout at the HystrixCommand layer should be rare, but the protection is there in case something other than network latency affects the time, or the combination of connect+read+retry+connect+read in a worst case scenario still exceeds the configured overall timeout.

The aggressiveness of configurations and tradeoffs in each direction are different for each dependency.

Configurations can be changed in real time as performance characteristics change or when problems are found, without the risk of taking down the entire app if problems or misconfigurations occur.

coreSize

Core thread-pool size. This is the maximum number of concurrent HystrixCommands that can be executed.

Default Value: 10
Default Property: hystrix.threadpool.default.coreSize
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].coreSize
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withCoreSize(int value)

maxQueueSize

Max queue size of BlockingQueue implementation.

If set to -1 then SynchronousQueue will be used, otherwise a positive value will be used with LinkedBlockingQueue.

NOTE: This property only applies at initialization time since queue implementations can't be resized or changed without re-initializing the thread executor which is not supported.

To overcome this and allow dynamic changes in queue see the queueSizeRejectionThreshold property.

Changing between SynchronousQueue and LinkedBlockingQueue requires a restart.

Default Value: -1
Default Property: hystrix.threadpool.default.maxQueueSize
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].maxQueueSize
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withMaxQueueSize(int value)

queueSizeRejectionThreshold

Queue size rejection threshold is an artificial "max" size at which rejections will occur even if maxQueueSize has not been reached. This is done because the maxQueueSize of a BlockingQueue can not be dynamically changed and we want to support dynamically changing the queue size that affects rejections.

This is used by HystrixCommand when queuing a thread for execution.

NOTE: This property is not applicable if maxQueueSize == -1.

Default Value: 5
Default Property: hystrix.threadpool.default.queueSizeRejectionThreshold
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].queueSizeRejectionThreshold
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withQueueSizeRejectionThreshold(int value)
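
The relationship between maxQueueSize and queueSizeRejectionThreshold can be sketched with the JDK queue classes named above. This is an illustration based on the property descriptions, not the Hystrix thread-pool code; `QueueRejectionSketch` and both method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical illustration of queue selection and the artificial rejection limit.
final class QueueRejectionSketch {

    // Chosen once at initialization; the queue type and capacity cannot change afterwards.
    static BlockingQueue<Runnable> createQueue(int maxQueueSize) {
        if (maxQueueSize == -1) {
            return new SynchronousQueue<>();
        }
        // LinkedBlockingQueue itself rejects non-positive capacities.
        return new LinkedBlockingQueue<>(maxQueueSize);
    }

    // The threshold can change at runtime because it is only compared, never used to size the queue.
    static boolean isQueueSpaceAvailable(int maxQueueSize,
                                         int currentQueueSize,
                                         int queueSizeRejectionThreshold) {
        if (maxQueueSize == -1) {
            return true; // threshold is not applicable without a real queue
        }
        return currentQueueSize < queueSizeRejectionThreshold;
    }

    public static void main(String[] args) {
        System.out.println(createQueue(-1).getClass().getSimpleName());
        System.out.println(createQueue(10).remainingCapacity());
        // maxQueueSize 10, threshold 5: the fifth queued task already causes rejection.
        System.out.println(isQueueSpaceAvailable(10, 4, 5));
        System.out.println(isQueueSpaceAvailable(10, 5, 5));
    }
}
```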

keepAliveTimeMinutes

Keep-alive time in minutes.

This is in practice not used since the corePoolSize and maxPoolSize are set to the same value in the default implementation, but if a custom implementation were used via plugin then this would be available to use.

Default Value: 1
Default Property: hystrix.threadpool.default.keepAliveTimeMinutes
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].keepAliveTimeMinutes
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withKeepAliveTimeMinutes(int value)

metrics.rollingStats.timeInMilliseconds

Duration of statistical rolling window in milliseconds. This is how long metrics are kept for the thread pool.

The window is broken into buckets and rolls by those increments.

Default Value: 10000
Default Property: hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].metrics.rollingStats.timeInMilliseconds
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withMetricsRollingStatisticalWindowInMilliseconds(int value)

metrics.rollingStats.numBuckets

Number of buckets the rolling statistical window is broken into.

Note: The following must be true "metrics.rollingStats.timeInMilliseconds % metrics.rollingStats.numBuckets == 0" otherwise it will throw an exception.

In other words, 10000/10 is okay, so is 10000/20 but 10000/7 is not.

Default Value: 10
Possible Values: Any value that metrics.rollingStats.timeInMilliseconds can be evenly divided by. The result however should be buckets measuring 100s or 1000s of milliseconds. Performance at high volume has not been tested with buckets <100ms.
Default Property: hystrix.threadpool.default.metrics.rollingStats.numBuckets
Instance Property: hystrix.threadpool.[HystrixThreadPoolKey].metrics.rollingStats.numBuckets
How to Set Instance Default:

HystrixThreadPoolProperties.Setter()
   .withMetricsRollingStatisticalWindowBuckets(int value)