Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce number of srds update callbacks #12118

Merged
merged 25 commits into from
Jul 24, 2020

Conversation

chaoqin-li1123
Copy link
Member

@chaoqin-li1123 chaoqin-li1123 commented Jul 15, 2020

Commit Message:
Currently, in scope route discovery service, when resources are received from a management server, for each added, updated and removed scope route, a callback is fired. If several update callbacks are combined into a single update callback, less performance penalty will be incurred. This can be achieved by moving applyConfigUpdate out of the for loop and applying all the updates in a single callback. In this case, partial update won't be accepted both in sotw and delta srds. Scope key and scope name conflicts will be checked before any config update is applied.
Additional Description:
Risk Level: Low
Testing: Modify a unit test related to change.
Docs Changes: The behavior of delta srds will be changed, partial update won't be accepted.
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

chaoqinli added 6 commits July 14, 2020 22:34
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Merge branch 'master' of https://github.com/envoyproxy/envoy into srds_update
@chaoqin-li1123 chaoqin-li1123 changed the title Srds update reduce number of srds update callbacks Jul 15, 2020
Merge branch 'master' of https://github.com/envoyproxy/envoy into srds_update
@junr03
Copy link
Member

junr03 commented Jul 16, 2020

@AndresGuedez could you take a pass at this?

Copy link
Contributor

@AndresGuedez AndresGuedez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a few comments.

scoped_route_info_by_key_.erase(iter->second->scopeKey().hash());
scoped_route_info_by_name_.erase(iter);
void ScopedConfigImpl::removeRoutingScopes(const std::vector<std::string>& scope_names) {
for (std::string const& scope_name : scope_names) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency, I suggest using auto here as well.

thread_local_scoped_config->addOrUpdateRoutingScope(scoped_route_info);
return config;
});
updated_scopes.push_back(scoped_route_info);
any_applied = true;
ENVOY_LOG(debug, "srds: add/update scoped_route '{}', version: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest slightly changing the wording of this log message to clarify that the config has not yet been applied. e.g., "srds: queueing add/update of scoped_route ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, the changes are applied after the log.

any_applied = true;
ENVOY_LOG(debug, "srds: add/update scoped_route '{}', version: {}",
scoped_route_info->scopeName(), version_info);
} catch (const EnvoyException& e) {
exception_msgs.emplace_back(absl::StrCat("", e.what()));
}
}
if (!updated_scopes.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this changes the logic such that partial application of an SRDS config update is no longer possible. For example, if there is a duplicate scoped route, all preceding routes would have been applied prior to this change.

I actually think the new behavior is preferable; I suggest adding a test to explicitly validate this logic.

FYI @stevenzzzz

Copy link
Member Author

@chaoqin-li1123 chaoqin-li1123 Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently the srds uses SotW. A test to validate that partial update is not accepted will be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Andres is saying in delta XDS within a response handling, we could have a portion of the list applied, as the previous scope info is distributed to workers. We need to get this behavior tested.
I actually dont have a strong opinion of it, delta xDS cant send per-resource status back to management server anyway, so go ahead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even for a regular SOTW config update, the existing logic could partially apply config since applyConfigUpdate() is called inline for each scoped route distributed in the DiscoveryResponse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the SotW api, the two errors has been checked before the Delta interface is called.
Actually a good refactoring could be moving the verification code into a helper fuction. @chaoqin-li1123

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks Xin. This only applies to delta updates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the new change is applied, in delta api, partial updates is accepted in main thread when exception happens, but the partial updates are not applied to worker threads. We want the update to be fully rejected in both main thread and worker thread, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can store all the incoming scoped_route_config in a vector, pass the reference of this vector to a helper function to detect any scope key or scope name conflict. If there is any conflict, no change will be applied, otherwise all changes will be applied by creating corresponding RdsRouteConfigProviderHelper and push scope_route_info into the maps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper function has been added to detect key and scope name conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

thread_local_scoped_config->removeRoutingScope(scope_name);
return config;
});
removed_scope_names.push_back(scope_name);
ENVOY_LOG(debug, "srds: remove scoped route '{}', version: {}", scope_name, version_info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as prior comment, I suggest modifying the wording to clarify that the update has not yet been applied.

chaoqinli added 4 commits July 17, 2020 23:37
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Copy link
Contributor

@stevenzzzz stevenzzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

scoped_route_map_[scoped_route_info->scopeName()] = scoped_route_info;
updated_scopes.push_back(scoped_route_info);
any_applied = true;
ENVOY_LOG(debug, "srds: add/update scoped_route '{}', version: {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of date comment, follow Andres' suggestion here as well.

}
return any_applied;
}
} // namespace Router
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I will remove it.

// Do not delete RDS config providers just yet, in case the to be deleted RDS subscriptions could
// be reused by some to be added scopes.
std::list<ScopedRdsConfigSubscription::RdsRouteConfigProviderHelperPtr>
to_be_removed_rds_providers = removeScopes(removed_resources, version_info);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra blank lines

@@ -137,64 +138,52 @@ ScopedRdsConfigSubscription::RdsRouteConfigProviderHelper::RdsRouteConfigProvide
bool ScopedRdsConfigSubscription::addOrUpdateScopes(
const std::vector<Envoy::Config::DecodedResourceRef>& resources, Init::Manager& init_manager,
const std::string& version_info, std::vector<std::string>& exception_msgs) {
ASSERT(exception_msgs.empty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to, this parameter is not used in this method now, remove it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and we don't need any_applied anymore if addOrUpdateScopes() doesn't do validation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think anyapplied is still required to give an signal to whether call the setLastConfigInfo method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

} catch (const EnvoyException& e) {
exception_msgs.emplace_back(absl::StrCat("", e.what()));
}
if (!exception_msgs.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whynot throw in the above catch clause?

// if new scope have key conflict with scope that is not going to be removed
if (scope_name_by_hash_.find(key_fingerprint) != scope_name_by_hash_.end() &&
scope_name_by_hash_[key_fingerprint] != scope_name &&
removed_scopes.find(scope_name_by_hash_[key_fingerprint]) == removed_scopes.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use contains instead of comparing the iterator.

@@ -330,17 +346,24 @@ void ScopedRdsConfigSubscription::onConfigUpdate(
fmt::format("scope key conflict found, first scope is '{}', second scope is '{}'",
scope_name_by_key_hash[key_fingerprint], scope_name));
}
// if new scope have key conflict with scope that is not going to be removed
if (scope_name_by_hash_.find(key_fingerprint) != scope_name_by_hash_.end() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: using a iterator to avoid multiple look up.

}

Protobuf::RepeatedPtrField<std::string>
ScopedRdsConfigSubscription::detectUpdateConflictAndCleanupRemoved(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order is problematic here:
lets say we have {h1:scope1, h2:scope2}, and the update is like {scope1:h2, scope2:h3}
the update will be rejected as the hash of new scope1 is conflicting the original scope2.
But you can see we should not reject that, this is not introduced by your change, but rather something we can improve, could you make this improvement?

also remember deletions are done ahead of add/updates, let's add a final-state hash, apply deletions first, apply ALL the add/updates, and then count the hash conflict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update won't be rejected in this case.
There are 3 cases where update will be rejected:
In sotw api, when the incoming resources have key or scope name conflict.
In delta api, when the incoming resources have key or scope name conflict.
In delta api, when the incoming resources have key or scope name conflict with old scope that are not removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new procedure, in an sotw update, we mark all old scopes as removed scopes from the beginning, and conflict key and scope name with removed scopes will be ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not following.
I think SotW and Delta form now uses the same verification logic.

There might be some confusion, but the case I was proposing is like "an incoming resource has a hash conflict with an existing scope, which will be updated by the same update but haven't yet".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, in Delta APi, this is still a probelm. I missed your last reply, sorry.

// if new scope have key conflict with scope that is not going to be removed
if (scope_name_by_hash_.find(key_fingerprint) != scope_name_by_hash_.end() &&
scope_name_by_hash_[key_fingerprint] != scope_name &&
removed_scopes.find(scope_name_by_hash_[key_fingerprint]) == removed_scopes.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for delta API, lets say previously it's {scope1:h1}, the update is ({scope2:h1, scope1:h2},removed={})
will the update gets rejected here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will gets rejected because scope2:h1 has key conflict with old scope with different name scope1:h1. This can be fixed easily though. Just marked all updated scopes as removed scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should add a unit test to cover this corner case?

@stevenzzzz
Copy link
Contributor

stevenzzzz commented Jul 20, 2020 via email

chaoqinli added 2 commits July 20, 2020 17:16
Protobuf::RepeatedPtrField<std::string> clean_removed_resources;
try {
clean_removed_resources =
detectUpdateConflictAndCleanupRemoved(added_resources, removed_resources);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest avoiding the use of an exception to return the conflicting resources. It complicates the error handling logic for an event that is not really "exceptional" (i.e., this function is explicitly checking for that condition).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I have changed the validation function to return normally instead of throwing an exception when detecting conflict.

->name(),
"foo_routes");
// Scope key "x-foo-key" points to nowhere.
EXPECT_NE(getScopedRdsProvider(), nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the more readable (IMO): ASSERT_THAT(getScopedRdsProvider(), Not(IsNull())) instead.

Also, use ASSERT instead of EXPECT. Same below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

EXPECT_EQ(getScopedRouteMap().size(), 1UL);
EXPECT_EQ(getScopedRouteMap().count("foo_scope"), 1);
// Scope key "x-foo-key" points to nowhere.
EXPECT_NE(getScopedRdsProvider(), nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, consider using ASSERT_THAT().

// Do not delete RDS config providers just yet, in case the to be deleted RDS subscriptions could
// be reused by some to be added scopes.
std::list<ScopedRdsConfigSubscription::RdsRouteConfigProviderHelperPtr>
to_be_removed_rds_providers = removeScopes(removed_resources, version_info);
std::list<std::unique_ptr<ScopedRdsConfigSubscription::RdsRouteConfigProviderHelper>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

chaoqinli added 4 commits July 20, 2020 22:15
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: chaoqinli <chaoqinli@google.com>
AndresGuedez
AndresGuedez previously approved these changes Jul 21, 2020
Copy link
Contributor

@AndresGuedez AndresGuedez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the cleanup!

std::vector<std::string> exception_msgs;
std::string exception_msg;
Protobuf::RepeatedPtrField<std::string> clean_removed_resources;
detectUpdateConflictAndCleanupRemoved(added_resources, removed_resources, clean_removed_resources,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's generally preferable for readability to use a return arg as opposed to output parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I have changed clean_removed_resources to be the explicitly returned value.

Signed-off-by: chaoqinli <chaoqinli@google.com>
AndresGuedez
AndresGuedez previously approved these changes Jul 22, 2020
Copy link
Contributor

@AndresGuedez AndresGuedez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

stevenzzzz
stevenzzzz previously approved these changes Jul 22, 2020
Copy link
Contributor

@stevenzzzz stevenzzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix the nits please

any_applied = true;
ENVOY_LOG(debug, "srds: add/update scoped_route '{}', version: {}",
scoped_route_info->scopeName(), version_info);
} catch (const EnvoyException& e) {
exception_msgs.emplace_back(absl::StrCat("", e.what()));
}
}
if (!updated_scopes.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

const std::vector<Envoy::Config::DecodedResourceRef>& resources,
const Protobuf::RepeatedPtrField<std::string>& removed_resources, std::string& exception_msg) {
Protobuf::RepeatedPtrField<std::string> clean_removed_resources;
// all the scope names to be removed or updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit capitalize

}

absl::flat_hash_map<uint64_t, std::string> scope_name_by_hash = scope_name_by_hash_;
// don't check key conflict with scopes to be updated or removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is also confusing, I think we checked the conflict later in the for loops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only check conflict with scopes that are not updated or removed. Maybe we can change the wording a little bit to make it less confusing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's confusing as it says "we dont check conflict within the updated scopes."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Don't check key conflict with old scopes to be updated or removed."?

Signed-off-by: chaoqinli <chaoqinli@google.com>
@stevenzzzz
Copy link
Contributor

/assign htuch

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this seems a good idea! A few minor bits of feedback.
/wait

absl::flat_hash_map<uint64_t, std::string> scope_name_by_hash = scope_name_by_hash_;
absl::erase_if(scope_name_by_hash, [&updated_or_removed_scopes](const auto& key_name) {
auto const& [key, name] = key_name;
return (updated_or_removed_scopes.contains(name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: superfluous parens.

@@ -319,28 +340,27 @@ void ScopedRdsConfigSubscription::onConfigUpdate(
const std::string& scope_name = scoped_route.name();
auto scope_config_inserted = scoped_routes.try_emplace(scope_name, std::move(scoped_route));
if (!scope_config_inserted.second) {
throw EnvoyException(
fmt::format("duplicate scoped route configuration '{}' found", scope_name));
exception_msg = fmt::format("duplicate scoped route configuration '{}' found", scope_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we no longer throwing and doing error-by-value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Andres think it is not a good style to throw exception in a validation function. "I suggest avoiding the use of an exception to return the conflicting resources. It complicates the error handling logic for an event that is not really "exceptional" (i.e., this function is explicitly checking for that condition)."

std::vector<std::string> exception_msgs;
std::string exception_msg;
Protobuf::RepeatedPtrField<std::string> clean_removed_resources =
detectUpdateConflictAndCleanupRemoved(added_resources, removed_resources, exception_msg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments.

onConfigUpdate(resources, to_remove_repeated, version_info);
}

Protobuf::RepeatedPtrField<std::string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comment here explaining what this does? What kind of update conflict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both scope name and scope key conflicts. The conflict can be between new scopes and old scopes, and it can also be between new scopes.

Signed-off-by: chaoqinli <chaoqinli@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@htuch htuch merged commit b9a41ad into envoyproxy:master Jul 24, 2020
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
Currently, in scope route discovery service, when resources are received from a management server, for each added, updated and removed scope route, a callback is fired. If several update callbacks are combined into a single update callback, less performance penalty will be incurred. This can be achieved by moving applyConfigUpdate out of the for loop and applying all the updates in a single callback. In this case, partial update won't be accepted both in sotw and delta srds. Scope key and scope name conflicts will be checked before any config update is applied.

Risk Level: Low
Testing: Modify a unit test related to change.
Docs Changes: The behavior of delta srds will be changed, partial update won't be accepted.

Signed-off-by: chaoqinli <chaoqinli@google.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
chaoqin-li1123 added a commit to chaoqin-li1123/envoy that referenced this pull request Aug 7, 2020
Currently, in scope route discovery service, when resources are received from a management server, for each added, updated and removed scope route, a callback is fired. If several update callbacks are combined into a single update callback, less performance penalty will be incurred. This can be achieved by moving applyConfigUpdate out of the for loop and applying all the updates in a single callback. In this case, partial update won't be accepted both in sotw and delta srds. Scope key and scope name conflicts will be checked before any config update is applied.

Risk Level: Low
Testing: Modify a unit test related to change.
Docs Changes: The behavior of delta srds will be changed, partial update won't be accepted.

Signed-off-by: chaoqinli <chaoqinli@google.com>
@chaoqin-li1123 chaoqin-li1123 deleted the srds_update branch February 23, 2021 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants