Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate IstioControlPlaneSpec with MeshConfig #1101

Closed
wants to merge 6 commits into from

Conversation

ostromart
Copy link
Contributor

First pass for integrating these two protos per the integration proposal doc.
Once there's basic agreement, there will be follow-on work items:

  1. Translate between v1alpha1<->v1alpha2 MeshConfig. Required for backwards compatibility with existing components that expect v1alpha1, and in the other direction for user migration to v1alpha2.
  2. Translate from v1alpha2->values.yaml.
  3. Port of the remainder of values.yaml into v1alpha2.

@ostromart ostromart requested a review from a team as a code owner September 24, 2019 21:22
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Sep 24, 2019
@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 24, 2019
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved

// Ingress/egress gateway configuration.
IngressGatewayComponentSpec ingress_gateway = 31;
EgressGatewayComponentSpec egress_gateway = 32;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we support multiple gateways? Should we make this a repeated field?

This may apply to other components as well in the future for canary, etc, but as far as I know people are only really running multiple gateways

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gateways are likely to be user config so it's not clear if this needs to cover all the use cases. For now it's there to support demo/out of the box installs but for serious use, users would turn these off and create their own.
I'd argue for leaving it as is, otherwise this may get very complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the operator not support users configuring their own gateways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would do it outside of the operator, basically it would be treated like all other Istio user config. The reason is that it has been a struggle to create a sufficiently broad definition for gateway to accommodate all cases. But if the existing Ingress/Egress specs above are sufficient to cover most user cases, we can make these a repeated field and call it a day.

Copy link
Contributor

@ayj ayj Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating these as-is results in multiple gateway LBs with identical config (e.g. ports for for multi-cluster passthrough and VM expansion). We probably want two different things:

  1. A default set of gateway(s) with appropriate Istio config to satisfy the MeshConfig contract. And some way to disable these defaults.

  2. A method for users to create their own Istio gateway LBs.

I believe (2) is technically possible using helm by disabling all components except the gateway. I don't think that is appropriate here. We need to differentiate default from user gateways if they both exist in MeshConfig. Alternatively, we add another API for creating gateway LB instances. In either case, the operator could manage the gateway lifecycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the gateway question requires significantly more work on both the API and resulting operator code. Given the short amount of time for 1.4, I propose we defer this work till 1.5.
Since we still have the helm passthrough in this API, we could just remove the gateway component completely for this release and ask users to use the helm API for gateway components.

mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
@rshriram
Copy link
Member

90% of the stuff in Mesh config is for Pilot's consumption only. Even if its authN or mixer report server or what not. Given that, I find it confusing to have settings used by pilot be present under citadel, combined with helm chart stuff for citadel.

All in all, I am not convinced that this is the right approach to reconciling values.yaml with mesh config. We should move stuff from values into meshConfig if it doesn't exist. And then we can simply have a ComponentSpec that captures the settings that we currently have in pilot/citadel/gateway/telemetry, etc. IF we go down this path, we wont have to have a v1alpha2 of MeshConfig. We just need a k8s.proto and the componentSpec.proto

@ostromart
Copy link
Contributor Author

@rshriram the fact that most of the parameters are consumed by pilot is an implementation detail that most users don't know or care about. The issue here is that we have a huge, totally flat structure, which is only going to get bigger as we migrate values.yaml parameters into it. This is one way to impose some structure on it that follows relatively well established feature groupings.
I agree there's a migration cost involved but if we want to take the pain this is the right moment to do it imo.

mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
mesh/v1alpha2/config.proto Outdated Show resolved Hide resolved
@ostromart
Copy link
Contributor Author

@linsun what we have here is the first step, not the final product. The final product will be:

  • values.yaml will be deprecated (it can continue to be supported outside of istio/api as long as we have helm users)
  • any settings that we want from values.yaml that are not already in here will be migrated gradually into MeshConfig. Since we already cover all the k8s related settings in IstioControlSpec (which probably needs a different name now) and the really important runtime config is in MeshConfig, I'm not sure what that actually leaves.

For 1.4, there will indeed be multiple paths to access some parameters (through MeshConfig and values.yaml field embedded in it). At least I don't see how we can do away with values.yaml at this point without migrating a bunch of config knobs. Some of these are fairly easy but some (like user defined gateways) are quite hard.

}

// Configuration options for the proxy.
message ProxyComponentSpec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proxy settings are already expressed in a structure - no need to create a parallel one, you can add it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which structure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

istio.mesh.v1alpha1.ProxyConfig

Available as the field "default_config", unfortunate choice of name.

It does seem easy to just throw the k8s resource spec in there.


// Configuration options for the telemetry component.
message TelemetryComponentSpec {
google.protobuf.BoolValue enabled = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: for telemetry and policy just use the value of mixerAddress (already existing) in 1.4. This is how users are trained to control telemetry, no need to add an API just to remove it soon after.

Besides - telemetry will remain enabled even if mixer is not running. This is clearly not correct - users still get most of the telemetry from envoy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We maybe should add Mixer to these names to make it clear these are configs for MixerTelemetry, not Telemetry overall.

}

// Configuration options for Citadel component.
message CitadelComponentSpec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with the other - I would run it by security WG, but I think the proper way would be to have a "security" section at top level, and maybe a provider: with citadel as an option, or the non-citadel CA provider URL or other related settings. There is a proposal to add trust aliases as well, sds may migrate there as well.

Meanwhile - if users want to disable citadel in 1.3 they can use the values.yaml override.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but I feel like this is something we should tackle incrementally rather than this PR, since it's a real API change. The original changes I made were a purely mechanical reshuffling of parameters into groups, your proposal has some actual meat in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems totally reasonable to have a top-level "citadel" field in the spec and configure the component. That should hold citadel-specific config, and a "security" section could have security-specific config (maybe you nest citadel under security if we feel the need to).

But generally I agree on holding off a major refactor of mesh config in this PR.

// Configuration options for galley component.
message GalleyComponentSpec {
google.protobuf.BoolValue enabled = 1;
KubernetesResourcesSpec k8s = 80;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in the other changes - please use the feature or behavior, not component/install name.

mcp_server ?

There is already a section on mcp addresses - I suspect we can just check if istio-galley.istio-system is on the list to
determine if galley should run, do maybe no need to add this redundant section ?

If you just need a place to attach k8s - I mentioned using a map<string,k8sresourceSpec> - keyed by component name. Or keeping them in a separate file, to not make the mesh config dependent on internal code names.
( see the discussion in Chiron in env wg)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're conflating two things here: "feature" settings and "component" settings. I had feature structs in the first attempt to integrate the protos but I've since taken them out, since there's nothing left there anymore. The feature settings should not have any internal names.
The component settings necessarily name internal components because they map resources 1:1 with those components. Again I think this boils down to listing the components vs. your map proposal, let's continue in that thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to being explicit about the difference between configuration of a feature (such as whether config validation is on) and configuration of a component (such as whether we run galley, and if so how it is run). These are different concerns and should remain separate.


// Configuration options for ingress gateways.
message IngressGatewayComponentSpec {
google.protobuf.BoolValue enabled = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting because Istio supports multiple ingress gateways. Same for egress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gateways anyway require values.yaml API to be used. This just allows the built-in gateways' resources to be controlled through this new API. User gateways can still be defined, but their resources must be set using values.yaml (for now).

repeated k8s.io.api.core.v1.Toleration tolerations = 14;

// Overlays for k8s resources in rendered manifests.
repeated k8sObjectOverlay overlays = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is also something to keep for 1.5 ? Little time, too much to test ?

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be clear on one thing: for 1.4 Operator should work just fine without this PR, using a separate values.yaml in the current format.

So we are not forced to merge everything in 1.4 or quickly - and remove it later, we can make sure each
setting makes sense and involve the WG responsible for the feature to get the right API.

@linsun
Copy link
Member

linsun commented Oct 3, 2019

@costinm thanks, I think I agree with that assessment for 1.4

@ostromart re your comment, This is exactly what I'm confused...."any settings that we want from values.yaml that are not already in here will be migrated gradually into MeshConfig. Since we already cover all the k8s related settings in IstioControlSpec (which probably needs a different name now) and the really important runtime config is in MeshConfig, I'm not sure what that actually leaves." Why are we moving everything from values.yaml to mesh config, vs just moving things from values.yaml but not covered in IstioControlPlaneSpec to MeshConfig?

@sdake
Copy link
Member

sdake commented Oct 3, 2019

I am a little confused as well. I thought the agreement was to have runtime configurable options go to MeshConfig, and have install time configurable options go to IstioControlPlane. It may be that IstiioControlPlane additionally contains runtime configuration options (via values.yaml) until they are migrated to MeshConfig - not IstioControlPlane.

This raises the question of how does the operator's controller manage runtime operations...

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ostromart I like this model. I was thinking that we could [in future] also categorize the mesh config..

MeshConfig:
 NetworkingConfig
 TelemetryConfig
 SecurityConfig
 ComponentSpecs: .. whatever you have currently under install

where the existing settings can be moved under these subsections. And

@rshriram
Copy link
Member

rshriram commented Oct 3, 2019

LGTM after all the comments from others are addressed.

@ostromart
Copy link
Contributor Author

@costinm @linsun the operator will not work fine without this PR. This PR represents the minimal amount of integration we need for operator to use the converged MeshConfig that we agreed to.
What I don't want this PR to become is a complete overhaul of the values.yaml API. That's not what we agreed to and such a PR will never complete in the forseeable future.
I propose we agree to a minimum integration - which I think this PR is now very close to - and leave the bigger task of migrating values.yaml incrementally since that involves different WGs and is better handled through smaller, incremental PRs rather than a single giant one.

@ostromart
Copy link
Contributor Author

@linsun sorry, that's exactly what I meant - we will move any fields in values.yaml that are not already in IstioControlPlaneSpec into MeshConfig, unless we decide to deprecate them instead.

@ostromart
Copy link
Contributor Author

@sdake that's correct - values.yaml config that covers "install time" (k8s resources, HPA etc) is moving into IstioControlPlaneSpec in this PR. Going forward, only the fields in values.yaml not already in MeshConfig or IstioControlPlaneSpec will be considered for migration into MeshConfig or deprecated. Finally, we will only have MeshConfig (for runtime) and ICPS (for install resources).
Operator will use at least ICPS, although it's possible for it to act on changes in runtime config if there's a need to do so.

Copy link
Member

@linsun linsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ostromart thank you for the important clarification!

@ostromart ostromart added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Oct 7, 2019
@istio-testing
Copy link
Collaborator

@ostromart: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
gencheck_api d0a6e0d link /test gencheck_api
build_api d0a6e0d link /test build_api

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@smawson smawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to split this pr and just have this be changes to the install spec (which should show up as a move instead of a new file, or is it not in this repo?).

Then we can remove the profile and values stuff from it, and just have it be a component spec.

@@ -483,3 +487,19 @@ message LocalityLoadBalancerSetting{
// Note: if no OutlierDetection specified, this will not take effect.
repeated Failover failover = 2;
}

// Configuration options for traffic management.
message TrafficManagementFeatureSpec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating empty messages can we move a few fields into each of these (or rather copy them over to do the migration)?

Or if we don't want to do that yet maybe leave MeshConfig alone and have this PR just clean up control plane spec?

import "k8s.io/apimachinery/pkg/apis/meta/v1/generated.proto";
import "github.com/gogo/protobuf/protobuf/google/protobuf/wrappers.proto";

package istio.mesh.v1alpha1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should rename this proto to components.proto or controlplane.proto or something.

/*
IstioControlPlaneSpec is a schema for both defining and customizing Istio control plane resources.

The simplest specialization is to point the user IstioControlPlaneSpec profile to a different values file, for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different profile? We maybe don't want to talk about values since those will go away?

@ozevren ozevren removed their request for review October 9, 2019 17:57
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 6, 2019
@ostromart
Copy link
Contributor Author

not stale, just sleeping...

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 8, 2019
@ostromart
Copy link
Contributor Author

Continued in #1177.

@ostromart ostromart closed this Nov 20, 2019
@ostromart ostromart deleted the icp-integrate branch February 28, 2020 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.