-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outlier Detection: use gRPC status code for detecting failures #7942
Changes from 10 commits
9352c0b
1aa2211
b06bc2e
f0a2c06
e473294
dd768d1
107ae87
88955c7
8477a71
7453108
595dc55
d69314c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -145,6 +145,8 @@ Most configuration items, namely | |
types of errors, but :ref:`outlier_detection.enforcing_success_rate<envoy_api_field_cluster.OutlierDetection.enforcing_success_rate>` applies | ||
to externally originated errors only and :ref:`outlier_detection.enforcing_local_origin_success_rate<envoy_api_field_cluster.OutlierDetection.enforcing_local_origin_success_rate>` applies to locally originated errors only. | ||
|
||
For gRPC requests, the outlier detection will use the HTTP status mapped from the [grpc-status](https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#responses) response header. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please also document the runtime flag here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
|
||
|
||
.. _arch_overview_outlier_detection_logging: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,7 @@ Version history | |
* listeners: added :ref:`continue_on_listener_filters_timeout <envoy_api_field_Listener.continue_on_listener_filters_timeout>` to configure whether a listener will still create a connection when listener filters time out. | ||
* listeners: added :ref:`HTTP inspector listener filter <config_listener_filters_http_inspector>`. | ||
* lua: extended `httpCall()` and `respond()` APIs to accept headers with entry values that can be a string or table of strings. | ||
* outlier_detector: added support for the grpc-status response header by mapping it to HTTP status. Guarded by envoy.reloadable_features.outlier_detection_support_for_grpc_status which defaults to true. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: can you ref-link back to the new gRPC sub-section in the arch overview docs that I asked you to create above? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
* performance: new buffer implementation enabled by default (to disable add "--use-libevent-buffers 1" to the command-line arguments when starting Envoy). | ||
* performance: stats symbol table implementation (disabled by default; to test it, add "--use-fake-symbol-table 0" to the command-line arguments when starting Envoy). | ||
* rbac: added support for DNS SAN as :ref:`principal_name <envoy_api_field_config.rbac.v2.Principal.Authenticated.principal_name>`. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,6 +27,7 @@ | |
#include "test/test_common/environment.h" | ||
#include "test/test_common/printers.h" | ||
#include "test/test_common/simulated_time_system.h" | ||
#include "test/test_common/test_runtime.h" | ||
#include "test/test_common/utility.h" | ||
|
||
#include "gmock/gmock.h" | ||
|
@@ -1177,7 +1178,10 @@ TEST_F(RouterTest, GrpcOkTrailersOnly) { | |
} | ||
|
||
// Validate gRPC AlreadyExists response stats are sane when response is trailers only. | ||
TEST_F(RouterTest, GrpcAlreadyExistsTrailersOnly) { | ||
TEST_F(RouterTest, GrpcAlreadyExistsTrailersOnlyRuntimeGuard) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "false"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
|
@@ -1200,8 +1204,92 @@ TEST_F(RouterTest, GrpcAlreadyExistsTrailersOnly) { | |
EXPECT_TRUE(verifyHostUpstreamStats(1, 0)); | ||
} | ||
|
||
TEST_F(RouterTest, GrpcAlreadyExistsTrailersOnly) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "true"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder1, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
expectResponseTimerCreate(); | ||
|
||
Http::TestHeaderMapImpl headers{{"content-type", "application/grpc"}, {"grpc-timeout", "20S"}}; | ||
HttpTestUtility::addDefaultHeaders(headers); | ||
router_.decodeHeaders(headers, true); | ||
|
||
Http::HeaderMapPtr response_headers( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "6"}}); | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(409)); | ||
response_decoder->decodeHeaders(std::move(response_headers), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(1, 0)); | ||
} | ||
|
||
// Validate outlier detections records failure when gRPC response status is Unavailable. | ||
TEST_F(RouterTest, GrpcOutlierDetectionUnavailableStatusCodeRuntimeGuard) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "false"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder1, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
expectResponseTimerCreate(); | ||
|
||
Http::TestHeaderMapImpl headers{{"content-type", "application/grpc"}, {"grpc-timeout", "20S"}}; | ||
HttpTestUtility::addDefaultHeaders(headers); | ||
router_.decodeHeaders(headers, true); | ||
|
||
Http::HeaderMapPtr response_headers( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "14"}}); | ||
// Outlier detector will use the gRPC response status code. | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(200)); | ||
response_decoder->decodeHeaders(std::move(response_headers), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(0, 1)); | ||
} | ||
|
||
TEST_F(RouterTest, GrpcOutlierDetectionUnavailableStatusCode) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "true"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder1, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
expectResponseTimerCreate(); | ||
|
||
Http::TestHeaderMapImpl headers{{"content-type", "application/grpc"}, {"grpc-timeout", "20S"}}; | ||
HttpTestUtility::addDefaultHeaders(headers); | ||
router_.decodeHeaders(headers, true); | ||
|
||
Http::HeaderMapPtr response_headers( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "14"}}); | ||
// Outlier detector will use the gRPC response status code. | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(503)); | ||
response_decoder->decodeHeaders(std::move(response_headers), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(0, 1)); | ||
} | ||
ZhouyihaiDing marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
// Validate gRPC Internal response stats are sane when response is trailers only. | ||
TEST_F(RouterTest, GrpcInternalTrailersOnly) { | ||
TEST_F(RouterTest, GrpcInternalTrailersOnlyRuntimeGuard) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "false"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
|
@@ -1224,6 +1312,32 @@ TEST_F(RouterTest, GrpcInternalTrailersOnly) { | |
EXPECT_TRUE(verifyHostUpstreamStats(0, 1)); | ||
} | ||
|
||
TEST_F(RouterTest, GrpcInternalTrailersOnly) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "true"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder1, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
expectResponseTimerCreate(); | ||
|
||
Http::TestHeaderMapImpl headers{{"content-type", "application/grpc"}, {"grpc-timeout", "20S"}}; | ||
HttpTestUtility::addDefaultHeaders(headers); | ||
router_.decodeHeaders(headers, true); | ||
|
||
Http::HeaderMapPtr response_headers( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "13"}}); | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(500)); | ||
response_decoder->decodeHeaders(std::move(response_headers), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(0, 1)); | ||
} | ||
|
||
// Validate gRPC response stats are sane when response is ended in a DATA | ||
// frame. | ||
TEST_F(RouterTest, GrpcDataEndStream) { | ||
|
@@ -2755,7 +2869,10 @@ TEST_F(RouterTest, RetryUpstream5xxNotComplete) { | |
.value()); | ||
} | ||
|
||
TEST_F(RouterTest, RetryUpstreamGrpcCancelled) { | ||
TEST_F(RouterTest, RetryUpstreamGrpcCancelledRuntimeGuard) { | ||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "false"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
|
@@ -2803,6 +2920,57 @@ TEST_F(RouterTest, RetryUpstreamGrpcCancelled) { | |
EXPECT_TRUE(verifyHostUpstreamStats(1, 1)); | ||
} | ||
|
||
TEST_F(RouterTest, RetryUpstreamGrpcCancelled) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you generally add comments to all these There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
TestScopedRuntime scoped_runtime; | ||
Runtime::LoaderSingleton::getExisting()->mergeValues( | ||
{{"envoy.reloadable_features.outlier_detection_support_for_grpc_status", "true"}}); | ||
NiceMock<Http::MockStreamEncoder> encoder1; | ||
Http::StreamDecoder* response_decoder = nullptr; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder1, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
expectResponseTimerCreate(); | ||
|
||
Http::TestHeaderMapImpl headers{{"x-envoy-retry-grpc-on", "cancelled"}, | ||
{"x-envoy-internal", "true"}, | ||
{"content-type", "application/grpc"}, | ||
{"grpc-timeout", "20S"}}; | ||
HttpTestUtility::addDefaultHeaders(headers); | ||
router_.decodeHeaders(headers, true); | ||
|
||
// gRPC with status "cancelled" (1) | ||
router_.retry_state_->expectHeadersRetry(); | ||
Http::HeaderMapPtr response_headers1( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "1"}}); | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(499)); | ||
response_decoder->decodeHeaders(std::move(response_headers1), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(0, 1)); | ||
|
||
// We expect the grpc-status to result in a retried request. | ||
EXPECT_CALL(encoder1.stream_, resetStream(_)).Times(0); | ||
NiceMock<Http::MockStreamEncoder> encoder2; | ||
EXPECT_CALL(cm_.conn_pool_, newStream(_, _)) | ||
.WillOnce(Invoke([&](Http::StreamDecoder& decoder, Http::ConnectionPool::Callbacks& callbacks) | ||
-> Http::ConnectionPool::Cancellable* { | ||
response_decoder = &decoder; | ||
callbacks.onPoolReady(encoder2, cm_.conn_pool_.host_); | ||
return nullptr; | ||
})); | ||
router_.retry_state_->callback_(); | ||
|
||
// Normal response. | ||
EXPECT_CALL(*router_.retry_state_, shouldRetryHeaders(_, _)).WillOnce(Return(RetryStatus::No)); | ||
Http::HeaderMapPtr response_headers( | ||
new Http::TestHeaderMapImpl{{":status", "200"}, {"grpc-status", "0"}}); | ||
EXPECT_CALL(cm_.conn_pool_.host_->outlier_detector_, putHttpResponseCode(200)); | ||
response_decoder->decodeHeaders(std::move(response_headers), true); | ||
EXPECT_TRUE(verifyHostUpstreamStats(1, 1)); | ||
} | ||
|
||
// Verifies that the initial host is select with max host count of one, but during retries | ||
// RetryPolicy will be consulted. | ||
TEST_F(RouterTest, RetryRespsectsMaxHostSelectionCount) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the way external links work in sphinx/RST. You can view the rendered docs in the CI artifacts or you can build the docs locally using
docs/build.sh
to view them if that helps.Also, nit, can you potentially put this under a gRPC sub-section just to draw attention to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.