Skip to content

Commit

Permalink
health check: overriding health threshold for edge intervals (#3174)
Browse files Browse the repository at this point in the history
The current implementation of `healthy_edge_interval` and
`unhealthy_edge_interval` will wait for the health threshold to be
reached before the interval is used.

That leads to situations like this:
- `healthy_threshold` is set to 3;
- host health state is currently `unhealthy`;
- host health check fails, next check happens after `unhealthy_interval`;
- host health check succeeds, next check happens after `interval`;
- host health check succeeds, next check happens after `interval`;
- host health check succeeds, next check happens after `healthy_edge_interval`;
- host health check succeeds, next check happens after `interval`.

The behavior above defeats the purpose of having an edge interval since
its goal is to detect health state changes faster whilst reducing the
burden of health checks on healthy hosts.

The intended behavior to achieve edge interval's purpose on the same scenario
would be:
- host health check fails, next check happens after `unhealthy_interval`;
- host health check succeeds, next check happens after `healthy_edge_interval`;
- host health check succeeds, next check happens after `healthy_edge_interval`;
- host health check succeeds, next check happens after `interval`;
- host health check succeeds, next check happens after `interval`.

This PR fixes health check interval overrides so that the latter
bahavior is achieved.

Fixes: #3173

Signed-off-by: Marcelo Juchem <juchem@gmail.com>
  • Loading branch information
juchem authored and mattklein123 committed Apr 30, 2018
1 parent d20c12d commit 3eebc91
Show file tree
Hide file tree
Showing 4 changed files with 304 additions and 36 deletions.
16 changes: 15 additions & 1 deletion include/envoy/upstream/health_checker.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,21 @@ namespace Upstream {

enum class HealthState { Unhealthy, Healthy };

enum class HealthTransition { Unchanged, Changed };
enum class HealthTransition {
/**
* Used when the health state of a host hasn't changed.
*/
Unchanged,
/**
* Used when the health state of a host has changed.
*/
Changed,
/**
* Used when the health check result differs from the health state of a host, but a change to the
* latter is delayed due to healthy/unhealthy threshold settings.
*/
ChangePending
};

/**
* Wraps active health checking of an upstream cluster.
Expand Down
23 changes: 19 additions & 4 deletions source/common/upstream/health_checker_base_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -57,14 +57,25 @@ std::chrono::milliseconds HealthCheckerImplBase::interval(HealthState state,
// refer to the HealthCheck API documentation for more details.
uint64_t base_time_ms;
if (cluster_.info()->stats().upstream_cx_total_.used()) {
// When healthy/unhealthy threshold is configured the health transition of a host will be
// delayed. In this situation Envoy should use the edge interval settings between health checks.
//
// Example scenario for an unhealthy host with healthy_threshold set to 3:
// - check fails, host is still unhealthy and next check happens after unhealthy_interval;
// - check succeeds, host is still unhealthy and next check happens after healthy_edge_interval;
// - check succeeds, host is still unhealthy and next check happens after healthy_edge_interval;
// - check succeeds, host is now healthy and next check happens after interval;
// - check succeeds, host is still healthy and next check happens after interval.
switch (state) {
case HealthState::Unhealthy:
base_time_ms = changed_state == HealthTransition::Changed ? unhealthy_edge_interval_.count()
: unhealthy_interval_.count();
base_time_ms = changed_state == HealthTransition::ChangePending
? unhealthy_edge_interval_.count()
: unhealthy_interval_.count();
break;
default:
base_time_ms = changed_state == HealthTransition::Changed ? healthy_edge_interval_.count()
: interval_.count();
base_time_ms = changed_state == HealthTransition::ChangePending
? healthy_edge_interval_.count()
: interval_.count();
break;
}
} else {
Expand Down Expand Up @@ -189,6 +200,8 @@ void HealthCheckerImplBase::ActiveHealthCheckSession::handleSuccess() {
host_->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC);
parent_.incHealthy();
changed_state = HealthTransition::Changed;
} else {
changed_state = HealthTransition::ChangePending;
}
}

Expand All @@ -210,6 +223,8 @@ HealthTransition HealthCheckerImplBase::ActiveHealthCheckSession::setUnhealthy(F
host_->healthFlagSet(Host::HealthFlag::FAILED_ACTIVE_HC);
parent_.decHealthy();
changed_state = HealthTransition::Changed;
} else {
changed_state = HealthTransition::ChangePending;
}
}

Expand Down
3 changes: 3 additions & 0 deletions source/common/upstream/health_checker_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,9 @@ std::ostream& operator<<(std::ostream& out, HealthTransition changed_state) {
case HealthTransition::Changed:
out << "Changed";
break;
case HealthTransition::ChangePending:
out << "ChangePending";
break;
}
return out;
}
Expand Down
Loading

0 comments on commit 3eebc91

Please sign in to comment.