Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: introduce ResolutionStatus for ResolveCb and fix #9927 #10137

Merged
merged 15 commits into from
Feb 28, 2020
Merged

dns: introduce ResolutionStatus for ResolveCb and fix #9927 #10137

merged 15 commits into from
Feb 28, 2020

Conversation

junr03
Copy link
Member

@junr03 junr03 commented Feb 21, 2020

Description: this PR introduces ResolutionStatus to the ResolveCb. This status can be used to determine if DnsResolution was successful or not, and lets the callback target adjust accordingly. It is used to fix #9927. Additionally, it can be used to fix
envoyproxy/envoy-mobile#673 in a subsequent PR.
Risk Level: med. This PR changes the behavior of Strict Dns Cluster. The introduced argument is not used anywhere, so all behavior stays the same.
Testing: Updated tests to pass the new argument to the callback. But behavior stays the same.
Docs Changes: updated config docs and discovery docs.
Release Notes: added a release note.

Fixes #9927

Signed-off-by: Jose Nino jnino@lyft.com

Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03 junr03 changed the title introduce status to the callback and maintain prior functionality dns: introduce status to the callback and maintain prior functionality Feb 21, 2020
@junr03
Copy link
Member Author

junr03 commented Feb 21, 2020

@mattklein123 I finally got time to come back to this. I started by just introducing the new argument for the callback, but leaving all behavior unchanged. I didn't want to conflate this with the two independent changes (don't update hosts on failure, allow different timeouts on failure) needed for the linked issues.

Let me know if you'd rather look at it all together.

Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03 junr03 changed the title dns: introduce status to the callback and maintain prior functionality dns: introduce ResolutionStatus for ResolveCb Feb 21, 2020
Jose Nino added 2 commits February 21, 2020 16:19
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is great. Flushing out a few comments from my first pass. Thank you!

/wait

@@ -89,7 +89,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i
// callback_ target _should_ still be around. In that case, raise the callback_ so the target
// can be done with this query and initiate a new one.
if (!cancelled_) {
callback_({});
callback_(ResolutionStatus::Failure, {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you have another status like cancelled/destructing/etc.? This is not quite a failure? I'm not sure it matters to higher layer code but it might?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this when I first introduced the destruction behavior. My first opinion is that higher layers only pivot behavior based on success/failure so I was going to keep it at that level of complexity for this change, and the subsequent PR where I fix #9927.

Then I was going to explore the need to add a Destruction value that allow us to set tighter timers than on "regular failure" when fixing for envoyproxy/envoy-mobile#673. But I am not sure if we need different behavior based on what particular failure scenario the resolver experienced. By the same token, people might want even more granularity in the future?

In short: I did not want to commit to more granular status in this initial PR, and I don't think it is going to be that bad to add more granularity, if desired, in subsequent PRs.

test/common/network/dns_impl_test.cc Outdated Show resolved Hide resolved
test/common/upstream/logical_dns_cluster_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03
Copy link
Member Author

junr03 commented Feb 25, 2020

@mattklein123 updated. I have updated dns_impl_test for expectations around status in different conditions. I also updated higher level constructs to use status, but I didn't change any behavior yet based on this new information (other than adding more relevant stats where possible). I plan on addressing #10137 (comment) and hence #9927 in a subsequent PR. Let me know if you'd rather see it here.

Edit: fixing #9927 here.

Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03 junr03 changed the title dns: introduce ResolutionStatus for ResolveCb dns: introduce ResolutionStatus for ResolveCb and fix #9927 Feb 26, 2020
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #10137 was synchronize by junr03.

see: more, trace.

Jose Nino added 2 commits February 25, 2020 17:45
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
// TODO(mattklein123): Move port handling into the DNS interface.
ASSERT(response.front().address_ != nullptr);
Network::Address::InstanceConstSharedPtr new_address =
Network::Utility::getAddressWithPort(*(response.front().address_),
Network::Utility::portFromTcpUrl(dns_url_));

if (respect_dns_ttl_ && response.front().ttl_ != std::chrono::seconds(0)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this for consistency with the strict dns cluster code. But happy to move back up.

tls_.shutdownThread();
}

TEST_F(LogicalDnsParamTest, TtlAsDnsRefreshRate) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding coverage that did not exist.

fmt
Signed-off-by: Jose Nino <jnino@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Just a test comment. Thank you!

/wait

test/common/network/dns_impl_test.cc Outdated Show resolved Hide resolved
Comment on lines 757 to 758
[&](DnsResolver::ResolutionStatus,
std::list<DnsResponse>&& results) -> void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the best way would be but is there some way to simplify all of thee calls to make sure have a success/failure expectation on all of them? Perhaps some helper function/object that does the resolve, does the expectations, and stores the resolution list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

folded everything into resolveWithExpectations

Signed-off-by: Jose Nino <jnino@lyft.com>
EXPECT_TRUE(hasAddress(address_list, "201.134.56.7"));
}

TEST_P(DnsImplTest, DnsIpAddressVersion) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved above

Signed-off-by: Jose Nino <jnino@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome test cleanup. 1 question. Thank you!

/wait-any

[&](const std::list<DnsResponse> &
/*results*/) -> void { throw EnvoyException("Envoy exception"); }));
EXPECT_EQ(nullptr,
resolver_->resolve(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: Can't we use the new resolve with expectations everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! I thought it might be overkill given that there were far fewer call sites. But if I'm cleaning up something why not go the entire way :). Now everything (for real) is folded into helper functions.

Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
@junr03
Copy link
Member Author

junr03 commented Feb 27, 2020

Looks like the circle machines have a funny hosts file?

[----------] 1 test from IpVersions/DnsImplTest
[ RUN      ] IpVersions/DnsImplTest.LocalLookup/IPv4
test/common/network/dns_impl_test.cc:470: Failure
Expected equality of these values:
  expected_results
    Which is: { "127.0.0.1" }
  address_as_string_list
    Which is: { "127.0.0.1", "127.0.0.1" }

@mattklein123
Copy link
Member

Thanks for the fix. If that's the CI error you are seeing it seems like it must have to do with your change?

@junr03
Copy link
Member Author

junr03 commented Feb 28, 2020

Yep, definitely has to do with my change. I just wanted to make sure that I was surfacing something latent in circle ci and not an unintended regression. Previously the LocalLookup test only asserted that 127.0.0.1 was contained in the resolved address list, not exact match like now. I opened #10197 to verify. It looks like our circle CI setup ends up with a hosts file with a double entry for localhost. https://app.circleci.com/jobs/github/envoyproxy/envoy/324116.

Signed-off-by: Jose Nino <jnino@lyft.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123 mattklein123 merged commit 64d91b2 into envoyproxy:master Feb 28, 2020
rebello95 pushed a commit to envoyproxy/envoy-mobile that referenced this pull request Mar 2, 2020
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate.
Risk Level: low, new configuration option
Testing: local apps, CI
Docs Changes: created #715 to create a section in the docs for all the builder options.

Fixes #673
rebello95 pushed a commit to envoyproxy/envoy-mobile that referenced this pull request Mar 3, 2020
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate.
Risk Level: low, new configuration option
Testing: local apps, CI
Docs Changes: created #715 to create a section in the docs for all the builder options.

Fixes #673 
Signed-off-by: Michael Rebello <me@michaelrebello.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate.
Risk Level: low, new configuration option
Testing: local apps, CI
Docs Changes: created #715 to create a section in the docs for all the builder options.

Fixes #673

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate.
Risk Level: low, new configuration option
Testing: local apps, CI
Docs Changes: created #715 to create a section in the docs for all the builder options.

Fixes #673

Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DNS timeouts cause no healthy upstream
2 participants