upstream: avoid reset after end_stream #14106

snowp · 2020-11-19T20:33:19Z

Signed-off-by: Snow Pettersen snowp@lyft.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Risk Level: Medium
Testing: New UT
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Snow Pettersen <snowp@lyft.com>

snowp · 2020-11-19T20:34:27Z

@alyssawilk This attempts to fix the issue with double ends that were happening in one of the integration tests in #13095 that I mentioned to you over Slack a while ago. I wasn't able to figure out why we get a end_stream=true and a reset, but at least this prevents it from reaching the router. LMK what you think

snowp · 2020-11-23T14:15:03Z

Interestingly this seems to trigger an ASAN heap-use-after-free when I try to modify the upstream_request_ field, indicating that the TcpUpstream keeps getting onUpstreamData callbacks after it has been destroyed. Will try to figure out how this happens

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk

Thanks for catching this (and for figuring out asan!)
One thought I think isn't covered by tests today but I'd rather fix in advance...

alyssawilk · 2020-11-23T14:56:55Z

source/extensions/upstreams/http/tcp/upstream_request.cc

@@ -87,6 +87,11 @@ void TcpUpstream::resetStream() {

 void TcpUpstream::onUpstreamData(Buffer::Instance& data, bool end_stream) {
  upstream_request_->decodeData(data, end_stream);
+  // This ensures that if we get a reset after end_stream we won't propagate two
+  // "end streams" to the upstream_request_.
+  if (end_stream) {


once we have upstream filters, will it be possible to get an upstream "end stream "before we send headers? I think it could be the case, so I think we're going to want to do null pointer checks for upstream_request in few other places, WDYT?

Yeah good point, I'll go over all the functions

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk · 2020-11-24T18:24:16Z

source/extensions/upstreams/http/tcp/upstream_request.cc

@@ -45,6 +46,11 @@ void TcpUpstream::encodeData(Buffer::Instance& data, bool end_stream) {

 Envoy::Http::Status TcpUpstream::encodeHeaders(const Envoy::Http::RequestHeaderMap&,
                                               bool end_stream) {
+  if (!upstream_request_) {
+    // TODO(snowp): Should this return something else in this case?


yeah, seems like an error case to me.

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk · 2020-11-30T18:22:18Z

test/extensions/upstreams/http/tcp/upstream_request_test.cc

+// Verifies that if we send headers after the upstream has been reset nothing crashes.
+TEST_F(TcpUpstreamTest, HeadersAfterReset) {
+  tcp_upstream_->onEvent(Network::ConnectionEvent::LocalClose);
+  EXPECT_TRUE(tcp_upstream_->encodeHeaders(request_, false).ok());


shouldn't this fail given the status error?

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk

Awesome. Thanks for driving this one though!

mattklein123 · 2020-12-01T23:14:48Z

source/common/router/upstream_request.cc

@@ -480,8 +480,8 @@ void UpstreamRequest::clearRequestEncoder() {
  // Before clearing the encoder, unsubscribe from callbacks.
  if (upstream_) {
    parent_.callbacks()->removeDownstreamWatermarkCallbacks(downstream_watermark_manager_);
+    parent_.callbacks()->dispatcher().deferredDelete(std::move(upstream_));


Drive by: @snowp @alyssawilk did you check carefully that the destructor of the various upstreams isn't referencing anything that can go away? We have had subtle bugs in this area and in general the use of deferred delete can be a little precarious.

Also, question: is this change defensive or is there a real case that causes this? It seems like we shouldn't be calling reset at all after we receive end stream? Why doesn't end_stream cause upstream_ to get nulled out, etc.?

The original motivation for this PR was to fix an issue I ran into in #13095 where a reset would follow an end_stream=true callback and both of these events would be propagated to the router/UpstreamRequest. This ran into an issue when resetting the upstream_request_ as part of the router, hence the deferred deletions. It does feel like there's something weird going on here, but this seemed to be the easiest fix to what was happening on the other PR. Happy to discuss more (and even revert if we think this is the wrong way to handle it).

The remaining changes here are defensive changes in anticipation of the upstream FM changes, at least I don't have a clear sense of when this would be a problem, maybe @alyssawilk does.

re: deferred delete, it's mainly just making sure the upstream can be deleted after the router part. Is there any reference in the destructor that might cause a timing issue? For example timers getting cancelled, etc. This can be somewhat subtle and needs to be audited.

re: the change itself, it does seem to me like this is not the correct fix, so we can at least open a tracking issue to try to figure out what state is not getting reset properly?

Looking through both the HTTP and TCP upstream neither of them has a dtor that does anything but a null check, and the only thing that would be destructed as part of it is the ConnectionData handle. The TcpConnectionData seems to indicate that it's resilient to deferred deletion ordering so I suspect this is fine:

~TcpConnectionData() override { // Generally it is the case that TcpConnectionData will be destroyed before the // ActiveTcpClient. Because ordering on the deferred delete list is not guaranteed in the // case of a disconnect, make sure parent_ is valid before doing clean-up. if (parent_) { parent_->clearCallbacks(); } }

I definitely had not done my due diligence here so, so thanks for pointing this out.

I'll file an issue and try to dig a bit deeper if I can find the time

* master: (70 commits) upstream: avoid reset after end_stream in TCP HTTP upstream (envoyproxy#14106) bazelci: add fuzz coverage (envoyproxy#14179) dependencies: allowlist CVE-2020-8277 to prevent false positives. (envoyproxy#14228) cleanup: replace ad-hoc [0, 1] value types with UnitFloat (envoyproxy#14081) Update docs for skywalking tracer (envoyproxy#14210) Fix some errors in the switch statement when decode dubbo response (envoyproxy#14207) Windows: enable tests and envoy-static.exe pdb file (envoyproxy#13688) http: add Kill Request HTTP filter (envoyproxy#14170) dependencies: fix release_dates error behavior. (envoyproxy#14216) thrift filter: support skip decoding data after metadata in the thrift message (envoyproxy#13592) update cares (envoyproxy#14213) docs: clarify behavior of hedge_on_per_try_timeout (envoyproxy#12983) repokitteh: add support for randomized auto-assign. (envoyproxy#14185) [grpc] validate grpc config for illegal characters (envoyproxy#14129) server: Return nullopt when process_context is nullptr (envoyproxy#14181) [Windows] Fix thrift proxy tests (envoyproxy#13220) kafka: add missing unit tests (envoyproxy#14195) doc: mention gperftools explicitly in PPROF.md (envoyproxy#14199) Removed `--use-fake-symbol-table` option. (envoyproxy#14178) filter contract: clarification around local replies (envoyproxy#14193) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>

…nvoyproxy#14106)" This reverts commit 1c06967. Signed-off-by: Snow Pettersen <snowp@lyft.com>

…14106)" (#14551) This reverts commit 1c06967. Signed-off-by: Snow Pettersen <snowp@lyft.com>

upstream: avoid reset after end_stream

2d3bcfb

Signed-off-by: Snow Pettersen <snowp@lyft.com>

snowp assigned alyssawilk Nov 19, 2020

defer delete the upstream to prevent inline deletion

1088604

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk reviewed Nov 23, 2020

View reviewed changes

snowp added 3 commits November 24, 2020 01:36

fix deferred deletions in test

a63edbf

Signed-off-by: Snow Pettersen <snowp@lyft.com>

format

6d9f5cb

Signed-off-by: Snow Pettersen <snowp@lyft.com>

defensively check upstream_request_ a few more places

a00f00b

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk reviewed Nov 24, 2020

View reviewed changes

snowp added 2 commits November 30, 2020 16:08

error status

999f575

Signed-off-by: Snow Pettersen <snowp@lyft.com>

format

cb63c1c

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk reviewed Nov 30, 2020

View reviewed changes

fix tests

a1a6a0e

Signed-off-by: Snow Pettersen <snowp@lyft.com>

alyssawilk approved these changes Dec 1, 2020

View reviewed changes

snowp merged commit 1c06967 into envoyproxy:master Dec 1, 2020

mattklein123 reviewed Dec 1, 2020

View reviewed changes

snowp mentioned this pull request Dec 2, 2020

Investigate reset after end_stream with HTTP over TCP tunneling #14250

Open

rmiller14 mentioned this pull request Dec 10, 2020

Windows: reverse_bridge_integration_test is flaky #14218

Closed

snowp added a commit to snowp/envoy that referenced this pull request Dec 31, 2020

Revert "upstream: avoid reset after end_stream in TCP HTTP upstream (e…

51af9c5

…nvoyproxy#14106)" This reverts commit 1c06967. Signed-off-by: Snow Pettersen <snowp@lyft.com>

snowp mentioned this pull request Jan 2, 2021

Revert "upstream: avoid reset after end_stream in TCP HTTP upstream (… #14551

Merged

mattklein123 pushed a commit that referenced this pull request Jan 2, 2021

Revert "upstream: avoid reset after end_stream in TCP HTTP upstream (#…

bf93e97

…14106)" (#14551) This reverts commit 1c06967. Signed-off-by: Snow Pettersen <snowp@lyft.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upstream: avoid reset after end_stream #14106

upstream: avoid reset after end_stream #14106

snowp commented Nov 19, 2020

snowp commented Nov 19, 2020

snowp commented Nov 23, 2020

alyssawilk left a comment

alyssawilk Nov 23, 2020

snowp Nov 23, 2020

alyssawilk Nov 24, 2020

alyssawilk Nov 30, 2020

alyssawilk left a comment

mattklein123 Dec 1, 2020

snowp Dec 2, 2020

mattklein123 Dec 2, 2020

snowp Dec 2, 2020

upstream: avoid reset after end_stream #14106

upstream: avoid reset after end_stream #14106

Conversation

snowp commented Nov 19, 2020

snowp commented Nov 19, 2020

snowp commented Nov 23, 2020

alyssawilk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alyssawilk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment