-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dns: introduce ResolutionStatus for ResolveCb and fix #9927 #10137
Conversation
Signed-off-by: Jose Nino <jnino@lyft.com>
@mattklein123 I finally got time to come back to this. I started by just introducing the new argument for the callback, but leaving all behavior unchanged. I didn't want to conflate this with the two independent changes (don't update hosts on failure, allow different timeouts on failure) needed for the linked issues. Let me know if you'd rather look at it all together. |
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is great. Flushing out a few comments from my first pass. Thank you!
/wait
@@ -89,7 +89,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |||
// callback_ target _should_ still be around. In that case, raise the callback_ so the target | |||
// can be done with this query and initiate a new one. | |||
if (!cancelled_) { | |||
callback_({}); | |||
callback_(ResolutionStatus::Failure, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you have another status like cancelled/destructing/etc.? This is not quite a failure? I'm not sure it matters to higher layer code but it might?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this when I first introduced the destruction behavior. My first opinion is that higher layers only pivot behavior based on success/failure so I was going to keep it at that level of complexity for this change, and the subsequent PR where I fix #9927.
Then I was going to explore the need to add a Destruction
value that allow us to set tighter timers than on "regular failure" when fixing for envoyproxy/envoy-mobile#673. But I am not sure if we need different behavior based on what particular failure scenario the resolver experienced. By the same token, people might want even more granularity in the future?
In short: I did not want to commit to more granular status in this initial PR, and I don't think it is going to be that bad to add more granularity, if desired, in subsequent PRs.
@mattklein123 updated. I have updated dns_impl_test for expectations around status in different conditions. I also updated higher level constructs to use status, but I didn't change any behavior yet based on this new information (other than adding more relevant stats where possible). I plan on addressing #10137 (comment) and hence #9927 in a subsequent PR. Let me know if you'd rather see it here. Edit: fixing #9927 here. |
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
// TODO(mattklein123): Move port handling into the DNS interface. | ||
ASSERT(response.front().address_ != nullptr); | ||
Network::Address::InstanceConstSharedPtr new_address = | ||
Network::Utility::getAddressWithPort(*(response.front().address_), | ||
Network::Utility::portFromTcpUrl(dns_url_)); | ||
|
||
if (respect_dns_ttl_ && response.front().ttl_ != std::chrono::seconds(0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this for consistency with the strict dns cluster code. But happy to move back up.
tls_.shutdownThread(); | ||
} | ||
|
||
TEST_F(LogicalDnsParamTest, TtlAsDnsRefreshRate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding coverage that did not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Just a test comment. Thank you!
/wait
test/common/network/dns_impl_test.cc
Outdated
[&](DnsResolver::ResolutionStatus, | ||
std::list<DnsResponse>&& results) -> void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the best way would be but is there some way to simplify all of thee calls to make sure have a success/failure expectation on all of them? Perhaps some helper function/object that does the resolve, does the expectations, and stores the resolution list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
folded everything into resolveWithExpectations
Signed-off-by: Jose Nino <jnino@lyft.com>
EXPECT_TRUE(hasAddress(address_list, "201.134.56.7")); | ||
} | ||
|
||
TEST_P(DnsImplTest, DnsIpAddressVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome test cleanup. 1 question. Thank you!
/wait-any
test/common/network/dns_impl_test.cc
Outdated
[&](const std::list<DnsResponse> & | ||
/*results*/) -> void { throw EnvoyException("Envoy exception"); })); | ||
EXPECT_EQ(nullptr, | ||
resolver_->resolve( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: Can't we use the new resolve with expectations everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! I thought it might be overkill given that there were far fewer call sites. But if I'm cleaning up something why not go the entire way :). Now everything (for real) is folded into helper functions.
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: Jose Nino <jnino@lyft.com>
Looks like the circle machines have a funny hosts file?
|
Thanks for the fix. If that's the CI error you are seeing it seems like it must have to do with your change? |
Yep, definitely has to do with my change. I just wanted to make sure that I was surfacing something latent in circle ci and not an unintended regression. Previously the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673
Description: updates envoy ref to include envoyproxy/envoy#10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: Michael Rebello <me@michaelrebello.com>
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: JP Simard <jp@jpsim.com>
Description: updates envoy ref to include #10137. Exposes client builder function to add dns failure refresh rate. Risk Level: low, new configuration option Testing: local apps, CI Docs Changes: created #715 to create a section in the docs for all the builder options. Fixes #673 Signed-off-by: JP Simard <jp@jpsim.com>
Description: this PR introduces ResolutionStatus to the ResolveCb. This status can be used to determine if DnsResolution was successful or not, and lets the callback target adjust accordingly. It is used to fix #9927. Additionally, it can be used to fix
envoyproxy/envoy-mobile#673 in a subsequent PR.
Risk Level: med. This PR changes the behavior of Strict Dns Cluster. The introduced argument is not used anywhere, so all behavior stays the same.
Testing: Updated tests to pass the new argument to the callback. But behavior stays the same.
Docs Changes: updated config docs and discovery docs.
Release Notes: added a release note.
Fixes #9927
Signed-off-by: Jose Nino jnino@lyft.com