Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

health check: overriding health threshold for edge intervals #3174

Merged
merged 4 commits into from
Apr 30, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion include/envoy/upstream/health_checker.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,21 @@ namespace Upstream {

enum class HealthState { Unhealthy, Healthy };

enum class HealthTransition { Unchanged, Changed };
enum class HealthTransition {
/**
* Used when the health state of a host hasn't changed.
*/
Unchanged,
/**
* Used when the health state of a host has changed.
*/
Changed,
/**
* Used when the health check result differs from the health state of a host, but a change to the
* latter is delayed due to healthy/unhealthy threshold settings.
*/
ChangePending
};

/**
* Wraps active health checking of an upstream cluster.
Expand Down
23 changes: 19 additions & 4 deletions source/common/upstream/health_checker_base_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -57,14 +57,25 @@ std::chrono::milliseconds HealthCheckerImplBase::interval(HealthState state,
// refer to the HealthCheck API documentation for more details.
uint64_t base_time_ms;
if (cluster_.info()->stats().upstream_cx_total_.used()) {
// When healthy/unhealthy threshold is configured the health transition of a host will be
// delayed. In this situation Envoy should use the edge interval settings between health checks.
//
// Example scenario for an unhealthy host with healthy_threshold set to 3:
// - check fails, host is still unhealthy and next check happens after unhealthy_interval;
// - check succeeds, host is still unhealthy and next check happens after healthy_edge_interval;
// - check succeeds, host is still unhealthy and next check happens after healthy_edge_interval;
// - check succeeds, host is now healthy and next check happens after interval;
// - check succeeds, host is still healthy and next check happens after interval.
switch (state) {
case HealthState::Unhealthy:
base_time_ms = changed_state == HealthTransition::Changed ? unhealthy_edge_interval_.count()
: unhealthy_interval_.count();
base_time_ms = changed_state == HealthTransition::ChangePending
? unhealthy_edge_interval_.count()
: unhealthy_interval_.count();
break;
default:
base_time_ms = changed_state == HealthTransition::Changed ? healthy_edge_interval_.count()
: interval_.count();
base_time_ms = changed_state == HealthTransition::ChangePending
? healthy_edge_interval_.count()
: interval_.count();
break;
}
} else {
Expand Down Expand Up @@ -189,6 +200,8 @@ void HealthCheckerImplBase::ActiveHealthCheckSession::handleSuccess() {
host_->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC);
parent_.incHealthy();
changed_state = HealthTransition::Changed;
} else {
changed_state = HealthTransition::ChangePending;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wanted to respect the former semantics of changed_state, when it was a boolean.

Changed is only set where before changed_state would be set to true. Given that all places that used to check for truth-ness of changed_state have been changed to changed_state == HealthTransition::Changed in a previous PR, truth-ness checks still hold:

envoy-dev@envoy-dev:~/src/envoy$ egrep -R '(== HealthTransition::Changed)' . 2>/dev/null
./source/common/upstream/upstream_impl.cc:        if (changed_state == HealthTransition::Changed) {
./source/common/upstream/health_checker_base_impl.cc:      base_time_ms = (changed_state == HealthTransition::Changed ||
./source/common/upstream/health_checker_base_impl.cc:      base_time_ms = (changed_state == HealthTransition::Changed ||
./source/common/upstream/cluster_manager_impl.cc:          if (changed_state == HealthTransition::Changed &&

Where changed_state would have been false, it's now either ChangePending or Unchanged. Therefore new state would only affect false-ness checks. Since there are no such checks, the addition of the new state shouldn't be a problem:

envoy-dev@envoy-dev:~/src/envoy$ egrep -R '(!= HealthTransition::Changed|== HealthTransition::Unchanged)' . 2>/dev/null
envoy-dev@envoy-dev:~/src/envoy$

With the above in mind, the semantics should still hold.

}
}

Expand All @@ -210,6 +223,8 @@ HealthTransition HealthCheckerImplBase::ActiveHealthCheckSession::setUnhealthy(F
host_->healthFlagSet(Host::HealthFlag::FAILED_ACTIVE_HC);
parent_.decHealthy();
changed_state = HealthTransition::Changed;
} else {
changed_state = HealthTransition::ChangePending;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same reasoning as above

}
}

Expand Down
3 changes: 3 additions & 0 deletions source/common/upstream/health_checker_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,9 @@ std::ostream& operator<<(std::ostream& out, HealthTransition changed_state) {
case HealthTransition::Changed:
out << "Changed";
break;
case HealthTransition::ChangePending:
out << "ChangePending";
break;
}
return out;
}
Expand Down
Loading