
Enable load balancing policy extensions #17400

Merged
merged 10 commits on Aug 14, 2021

Conversation

pianiststickman
Contributor

Signed-off-by: Eugene Chan eugenechan@google.com

Commit Message: Enable load balancing policy extensions
Additional Description:

Enables LOAD_BALANCING_POLICY_CONFIG enum value in LbPolicy and supports typed load balancers specified in load_balancing_policy. Continues work done by Charlie Getzen charliegetzenlc@gmail.com in #15827.

Custom load balancers specified by load_balancing_policy are created as implementations of ThreadAwareLoadBalancer. Thread-local load balancers can be implemented as thread-aware load balancers that contain no logic at the thread-aware level, i.e. the purpose of the thread-aware LB is solely to contain the factory used to instantiate the thread-local LBs. (In the future it might be appropriate to provide a construct that abstracts away thread-aware aspects of ThreadAwareLoadBalancer for LBs that don't need to be thread-aware.)

A cluster that uses LOAD_BALANCING_POLICY_CONFIG may not also set a subset LB configuration. If the load balancer type makes use of subsetting, it should include a subset configuration in its own configuration message. Future work on load balancing extensions should include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses LOAD_BALANCING_POLICY_CONFIG may not set the CommonLbConfig, and it is not passed into load balancer creation (mostly owing to its dubious applicability as a top level configuration message to hierarchical load balancing policy). If the load balancer type makes use of the CommonLbConfig, it should include a CommonLbConfig in the configuration message for the load balancing policy.
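A hypothetical cluster config illustrating these constraints (the extension name and type URL here are made up for illustration; this is a sketch, not a tested configuration):

```yaml
# A cluster opting into load balancing policy extensions.
# Note: lb_subset_config and common_lb_config must NOT be set here;
# if the policy needs them, the equivalent settings belong inside its
# own typed_config message.
clusters:
- name: example_cluster
  connect_timeout: 1s
  type: STRICT_DNS
  lb_policy: LOAD_BALANCING_POLICY_CONFIG
  load_balancing_policy:
    policies:
    - typed_extension_config:
        name: envoy.load_balancers.example        # hypothetical extension
        typed_config:
          "@type": type.googleapis.com/example.CustomLbConfig  # hypothetical type
```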

Considerations for migration of existing load balancers:

  • pieces of the ThreadAwareLoadBalancerBase implementation are specific to the built-in hashing load balancers and should be moved into a base class specifically for hashing load balancers. As it stands, custom load balancing policies are required to implement a createLoadBalancer() method even if the architecture of the LB policy does not require a hashing load balancer. I think we would also benefit from disentangling ThreadAwareLoadBalancerBase from LoadBalancerBase, as the former never actually does any host picking.

  • as we convert existing thread-local load balancers to thread-aware load balancers, new local LBs will be re-created upon membership changes. We should provide a mechanism allowing load balancers to control whether this rebuild should occur, e.g. a callback that calls create() for thread-aware LBs by default, which can be overridden to do nothing for thread-local LBs.

Risk Level: low
Testing: brought up a cluster with a custom load balancer specified by load_balancing_policy; new unit tests included
Docs Changes: n/a
Release Notes: Enable load balancing policy extensions
Platform Specific Features: n/a
Fixes #5598

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #17400 was opened by pianiststickman.


@pianiststickman
Contributor Author

/assign markdroth
/assign htuch
/assign mattklein123

Signed-off-by: Eugene Chan <eugenechan@google.com>
Member

@lizan left a comment

/lgtm api

Signed-off-by: Eugene Chan <eugenechan@google.com>
@markdroth
Contributor

I'll let those who are more familiar with the Envoy code review this in detail, but at a high level, the functionality described here looks great, and the API changes look good as well. Thanks for doing this!

@mattklein123
Member

@lizan @htuch who is on the hook here for primary review of the change?

@htuch
Member

htuch commented Jul 29, 2021

Probably makes sense for @lizan as a non-Googler for multi org policy reasons.

@markdroth
Contributor

@snowp may want to review as well.

@mattklein123 mattklein123 removed their assignment Jul 29, 2021
@mattklein123
Member

OK I will unassign myself. Let me know if you want me to look at anything in particular.

api/envoy/config/cluster/v3/cluster.proto
source/common/upstream/cluster_manager_impl.cc
Comment on lines 821 to 833
const auto& lb_policy =
    std::find_if(config.load_balancing_policy().policies().begin(),
                 config.load_balancing_policy().policies().end(),
                 [](envoy::config::cluster::v3::LoadBalancingPolicy_Policy policy) {
                   return Registry::FactoryRegistry<TypedLoadBalancerFactory>::getFactory(
                              policy.name()) != nullptr;
                 });

if (lb_policy == config.load_balancing_policy().policies().end()) {
  throw EnvoyException(fmt::format(
      "Didn't find a registered load balancer factory implementation for cluster: '{}'",
      name_));
}
Member

This might not be needed when we use getAndCheckFactoryByName appropriately.

Contributor Author

I don't think the cluster manager should be responsible for picking the LB policy out of load_balancing_policies. IMO it's better if the cluster info does this instead - then it knows which policy was picked, and doesn't have to carry around config for any policies that weren't picked.

Member

Why would we want to a) check whether the LB policy factory is registered, and b) create the LB policy from the factory, in two different places? That doesn't make sense either.

You can pick LB policy in ClusterManagerImpl and store a reference to the lb policy factory in a member in ClusterInfoImpl, and just use that in cluster manager.

Contributor Author

I think I'm missing an important detail somewhere... I'm under the impression that we can't store the factory we've picked from the ClusterManagerImpl, since info() returns a const pointer. By the time the cluster manager picks the LB policy it's too late to write what we picked into the ClusterInfoImpl. That's why I was hoping to pick the policy in the ClusterInfoImpl constructor instead.

This is mostly a result of wanting to be able to write a unit test that verifies that the set of policies { 1. policy with no registered factory, 2. policy with a registered factory } will result in the known policy being picked. It seems to me this requires the ClusterInfoImpl to know which policy was picked, and hence to do the picking itself. I suppose this problem disappears if we don't bother with that bit of the test, but it seems to me like a reasonably important behaviour to ensure is correct here.

Arguably a) is not so much about checking whether the factory is registered as it is about actually picking the policy - which happens to guarantee, as a postcondition, that the chosen factory is registered by definition. So I'm not sure how problematic it is to have policy picking in the ClusterInfoImpl constructor and the actual LB instantiation in the cluster manager instead. What do you think?

Member

I think I'm missing an important detail somewhere... I'm under the impression that we can't store the factory we've picked from the ClusterManagerImpl, since info() returns a const pointer. By the time the cluster manager picks the LB policy it's too late to write what we picked into the ClusterInfoImpl. That's why I was hoping to pick the policy in the ClusterInfoImpl constructor instead.

I meant you can implement lb_policy_factory() in ClusterInfoImpl to return a reference to the factory. My point here is to reduce manual config checks and thrown EnvoyExceptions, to keep the error messages consistent and make the extension API easier to evolve in general; that's why I pointed to just using TypedExtensionConfig in the proto part as well.

Contributor Author

Done.

test/common/upstream/cluster_manager_impl_test.cc
Signed-off-by: Eugene Chan <eugenechan@google.com>
Removes LoadBalancingPolicy.Policy type from v4alpha. Changes the type of an unimplemented field.

Signed-off-by: Eugene Chan <eugenechan@google.com>
Signed-off-by: Eugene Chan <eugenechan@google.com>
Comment on lines 1097 to 1099
// Deprecated and replaced by TypedExtensionConfig.
message Policy {
  option deprecated = true;
Member

Is this referred anywhere? If no just delete?

Contributor Author

@pianiststickman commented Aug 11, 2021

I was having trouble getting the presubmits to pass when I just deleted, but I didn't realize I needed to deprecate the unused v2 message as well. Done (for the parts of Policy that are removed).


api/envoy/config/cluster/v3/cluster.proto
config.load_balancing_policy().policies().end(),
[](const envoy::config::core::v3::TypedExtensionConfig& policy) {
return Registry::FactoryRegistry<TypedLoadBalancerFactory>::getFactory(
policy.name()) != nullptr;
Member

We no longer use name in extension config to identify which factory should be used, it is determined by the type_url in typed_config, so this you should use Config::Utility::getAndCheckFactory() and pass the whole message instead.

Contributor Author

Done, thanks.

@@ -1095,7 +1094,9 @@ message Cluster {
message LoadBalancingPolicy {
option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.LoadBalancingPolicy";

// Deprecated and replaced by TypedExtensionConfig.
Contributor

Instead of deprecating this whole message, let's deprecate the name and typed_config fields and add the new TypedExtensionConfig field inside of this message. That way, if we need to add a new field on a per-policy basis in the future, we'll have somewhere to put it.

Member

That seems to be an anti-pattern, what field do we expect? In general a per-policy field should be added inside the message packed into Any.

Contributor

If it's a field only for one individual policy, that would make sense. But if we discover the need to add another field that we want to be set for all policies, then this wrapper would provide a place to put that.

Contributor Author

Could we put that into CommonLbConfig then, and include the message in the policy's message? Or is the idea that policies would need to set this field whether they use the common LB config or not?

The verbosity of load_balancing_policy.policy.typed_extension_config.typed_config.foo seems a bit excessive. Is it something we could change if/when we actually have such a common field?

Contributor

Not all policies will include CommonLbConfig in their config, and the idea here is to have a way to add the field for any policy -- e.g., if we wanted to add something to express the control flow of how the client picks which policy to use.

We could make this change later, but it would be a breaking change at that point. The idea here is to future-proof a bit.

Contributor Author

Ack, done.
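The future-proofing shape agreed on above — keeping a Policy wrapper message that holds the TypedExtensionConfig, so per-policy fields can be added later — would look roughly like this sketch (field names and numbers here are illustrative; the authoritative definition is whatever landed in cluster.proto):

```proto
message LoadBalancingPolicy {
  message Policy {
    // Wrapper around the extension config; identifies the policy by the
    // type URL of typed_config and carries its configuration. Leaves room
    // for future fields that apply to any policy (e.g. selection control).
    core.v3.TypedExtensionConfig typed_extension_config = 4;
  }
  // Ordered list of policies; the first one with a registered factory is used.
  repeated Policy policies = 1;
}
```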

Signed-off-by: Eugene Chan <eugenechan@google.com>
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy[\w/]*/(v1alpha\d?|v1|v2alpha\d?|v2))|(api/envoy/type/(matcher/)?\w+.proto).

🐱

Caused by: #17400 was synchronize by pianiststickman.


@pianiststickman
Contributor Author

PTAL, thanks!

@@ -839,13 +839,13 @@ message Cluster {
message LoadBalancingPolicy {
message Policy {
// Required. The name of the LB policy.
-      string name = 1;
+      string name = 1 [deprecated = true];
Contributor

I don't think this needs to be deprecated in v2. It doesn't really make sense to do this unless we're also adding the new field to v2, and I don't see any need for that.

Instead, these fields should be kept in v3 and marked as deprecated there.

Contributor Author

Done.

Contributor

Thanks! The API changes look great here, but I'll let Lizan approve.

Signed-off-by: Eugene Chan <eugenechan@google.com>
Member

@lizan left a comment

Thanks this is on the right track.

Comment on lines 1106 to 1109
string name = 1 [deprecated = true, (envoy.annotations.deprecated_at_minor_version) = "3.0"];

google.protobuf.Any typed_config = 3
    [deprecated = true, (envoy.annotations.deprecated_at_minor_version) = "3.0"];
Member

I thought we could delete them instead of deprecating, since the whole message was marked not-implemented-hide before, and the implementation doesn't support them either.

Contributor

I'd be okay with fully removing these fields too, unless that causes problems. But I think there's very little cost to leaving them here and marking them deprecated.

Contributor Author

Hmm. I tried to remove the whole Policy message from v3 only earlier and it failed the presubmit because it was expecting some v3 message to correspond to the v2 version of the same message. (That's why I ended up deprecating v2 fields in an earlier commit, which I've since undone.)

Removing these fields seems to be fine, though, so I've done that.

Contributor

Looks good to me!

Member

Thanks. While there's little cost to leaving them here, the docs would then be confusing, as we never use those fields in the implementation.

Signed-off-by: Eugene Chan <eugenechan@google.com>
Member

@lizan left a comment

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: #17400 (review) was submitted by @lizan.



Successfully merging this pull request may close these issues.

Load balancer extensibility
6 participants