Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not compare all svc.spec for user modified scene #1342

Merged

Conversation

spwangxp
Copy link
Contributor

What type of PR is this?

fix: #1340

What this PR does / why we need it:
if not, the ir will encounter an error when update svc.
for example, user A deployed an envoyproxy, and set the svc type to NodePort instead of LoadBalancer, then the gateway restart, the check will always be error.

Which issue(s) this PR fixes:

Fixes #1340

@spwangxp spwangxp requested a review from a team as a code owner April 23, 2023 07:02
@spwangxp spwangxp force-pushed the bugfix-1340-update-svc-after-gateway-restart branch from d163cca to 1ddf9d3 Compare April 23, 2023 08:46
@codecov
Copy link

codecov bot commented Apr 23, 2023

Codecov Report

Merging #1342 (8f2695c) into main (297fcf1) will increase coverage by 0.11%.
The diff coverage is 70.00%.

@@            Coverage Diff             @@
##             main    #1342      +/-   ##
==========================================
+ Coverage   62.44%   62.55%   +0.11%     
==========================================
  Files          79       79              
  Lines       11084    11088       +4     
==========================================
+ Hits         6921     6936      +15     
+ Misses       3707     3697      -10     
+ Partials      456      455       -1     
Impacted Files Coverage Δ
...ternal/infrastructure/kubernetes/infra_resource.go 83.69% <40.00%> (ø)
internal/infrastructure/kubernetes/infra_client.go 75.00% <100.00%> (ø)
...nal/infrastructure/kubernetes/resource/resource.go 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

@spwangxp spwangxp force-pushed the bugfix-1340-update-svc-after-gateway-restart branch 2 times, most recently from 7ea2c83 to 79b9cf8 Compare April 23, 2023 10:12
@spwangxp spwangxp force-pushed the bugfix-1340-update-svc-after-gateway-restart branch from 79b9cf8 to 4cbb1a0 Compare April 24, 2023 01:43
Copy link
Contributor

@zirain zirain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cc @arkodg for second review

@qicz
Copy link
Member

qicz commented Apr 24, 2023

#1337 follow-up infrastructure refactoring may be the best choice to follow up this pr. @spwangxp

@@ -122,7 +124,7 @@ func (i *Instance) CreateOrUpdateService(ctx context.Context, svc *corev1.Servic
}
} else {
// Update if current value is different.
if !reflect.DeepEqual(svc.Spec, current.Spec) {
if !utils.CompareSvc(svc, current) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two bugs associated with this logic

  1. we are comparing the entire Spec here, instead we should be comparing all the fields the infra layer sets or skip the fields the infra layer does not set https://pkg.go.dev/k8s.io/api/core/v1#ServiceSpec.

  2. when updating the service, we need to set immutable fields like ClusterIP and ClusterIPs

// clusterIP is the IP address of the service and is usually assigned
	// randomly. If an address is specified manually, is in-range (as per
	// system configuration), and is not in use, it will be allocated to the
	// service; otherwise creation of the service will fail. This field may not
	// be changed through updates unless the type field is also being changed
	// to ExternalName (which requires this field to be blank) or the type
	// field is being changed from ExternalName (in which case this field may
	// optionally be specified, as describe above).  Valid values are "None",
	// empty string (""), or a valid IP address. Setting this to "None" makes a
	// "headless service" (no virtual IP), which is useful when direct endpoint
	// connections are preferred and proxying is not required.  Only applies to
	// types ClusterIP, NodePort, and LoadBalancer. If this field is specified
	// when creating a Service of type ExternalName, creation will fail. This
	// field will be wiped when updating a Service to type ExternalName.
	// More info: https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
	// +optional
	ClusterIP [string](https://pkg.go.dev/builtin#string) `json:"clusterIP,omitempty" protobuf:"bytes,3,opt,name=clusterIP"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1, ok, i'll add imutable fields compare we set.
2, when updating the svc, is these be better that just patch what we need rather than set everything to the new svc?
3, actually,when we use loadbalacer by default, svc.Spec.ports[*].nodePort will be set an random number by k8s, so it's not just about user modified the svc type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reg 2. Patch is fine, my only request is to keep the code simple so it can be enhanced with more fields in the future
Reg 3. its okay to undo what user did, to solve this, we must

  • support Nodeport natively
  • support BYO Envoy Svc and Envoy Deployment and disable managed infra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise another pr for support NodePort natively might be a better way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure please go ahead @spwangxp


// CompareSvc Only compare the selector and ports(not include nodePort) in case user have modified for some scene.
func CompareSvc(newSvc, originalSvc *corev1.Service) bool {
return cmp.Equal(newSvc.Spec.Selector, originalSvc.Spec.Selector) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the infra layer also sets Type and SessionAffinity

func ExpectedServiceSpec(serviceType *egcfgv1a1.ServiceType) corev1.ServiceSpec {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something for the future, we should find a sane way to ensure we check what we set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any need to update svc type rather than follow user's setting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments above, Patch is fine as long as its easy to mantain

@arkodg
Copy link
Contributor

arkodg commented Apr 24, 2023

thanks for highlighting this bug @spwangxp, added some comments

@arkodg
Copy link
Contributor

arkodg commented Apr 24, 2023

also @spwangxp if you are using service type NodePort, can you please raise a issue to add support for it ?

@spwangxp spwangxp force-pushed the bugfix-1340-update-svc-after-gateway-restart branch from f2595e9 to aae2408 Compare April 26, 2023 06:00
@arkodg
Copy link
Contributor

arkodg commented Apr 26, 2023

@qicz can you take a first pass ? with the recent refactor want to make sure the logic lives in the right place, TIA

Signed-off-by: spwangxp <wangshengpeng@cestc.cn>
@spwangxp spwangxp force-pushed the bugfix-1340-update-svc-after-gateway-restart branch from aae2408 to 87488d9 Compare April 27, 2023 02:23
qicz
qicz previously approved these changes Apr 27, 2023
Copy link
Member

@qicz qicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pls approve ci @arkodg

@Xunzhuo
Copy link
Member

Xunzhuo commented Apr 27, 2023

I got this @qicz

qicz
qicz previously approved these changes Apr 27, 2023
@zirain zirain changed the title fix: #1340 not compare all svc.spec for user modified scene fix: do not compare all svc.spec for user modified scene Apr 27, 2023
zirain
zirain previously approved these changes Apr 27, 2023
Copy link
Contributor

@zirain zirain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkodg for final review.

Signed-off-by: spwangxp <wangshengpeng@cestc.cn>
@spwangxp spwangxp dismissed stale reviews from zirain and qicz via 1500e2c April 28, 2023 02:44
arkodg
arkodg previously approved these changes Apr 28, 2023
qicz
qicz previously approved these changes Apr 28, 2023
Copy link
Member

@qicz qicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@arkodg arkodg self-requested a review April 28, 2023 16:32
@arkodg
Copy link
Contributor

arkodg commented Apr 28, 2023

/hold can we compare the entire Spec here, and ignore any fields (NodePort) . This approach will be easier to maintain in the future

I've raised two issues to track your use case @spwangxp

@spwangxp
Copy link
Contributor Author

/hold can we compare the entire Spec here, and ignore any fields (NodePort) . This approach will be easier to maintain in the future

I've raised two issues to track your use case @spwangxp

1, actual gw just need ensure the field unchanged we set before.
2, field should exclude more than need compare, compare entire spec is harder to maintain.

@arkodg
Copy link
Contributor

arkodg commented Apr 29, 2023

/hold can we compare the entire Spec here, and ignore any fields (NodePort) . This approach will be easier to maintain in the future
I've raised two issues to track your use case @spwangxp

1, actual gw just need ensure the field unchanged we set before. 2, field should exclude more than need compare, compare entire spec is harder to maintain.

every time an API change is made to the KubernetesServiceSpec, the reviewer/maintainer will need to ensure that the comparison field is added in the infra layer which is extra cognitive load for the reviewer, else it will cause a bug
Moving the compare to Spec is simpler to maintain long term

@spwangxp
Copy link
Contributor Author

/hold can we compare the entire Spec here, and ignore any fields (NodePort) . This approach will be easier to maintain in the future
I've raised two issues to track your use case @spwangxp

1, actual gw just need ensure the field unchanged we set before. 2, field should exclude more than need compare, compare entire spec is harder to maintain.

every time an API change is made to the KubernetesServiceSpec, the reviewer/maintainer will need to ensure that the comparison field is added in the infra layer which is extra cognitive load for the reviewer, else it will cause a bug Moving the compare to Spec is simpler to maintain long term

check and remove ignored field should de done when KubernetesServiceSpec changed, so there is no difference.
or u hava a ignore field list already?

@arkodg
Copy link
Contributor

arkodg commented Apr 29, 2023

The ignore list shouldnt change in the codebase, but should only only increase whenever upstream K8s API adds a dynamic field into the Spec, and shouldnt change whenever a contributor updates the internal KubernetesServiceSpec struct in this project

@spwangxp
Copy link
Contributor Author

1, add some comment on KubernetesServiceSpec to notice that.
2, pls help check the below list, will use cmp.Equal(newSvc.Spec, oldSvc.Spec, cmpopts.IgnoreFields(ignore list)) to compare two svc.
ignore list:

  • Ports.NodePort
  • ClusterIP
  • ClusterIPs
  • ExternalIPs
  • SessionAffinity
  • LoadBalancerIP
  • LoadBalancerSourceRanges
  • ExternalTrafficPolicy
  • HealthCheckNodePort
  • SessionAffinityConfig
  • IPFamilies
  • IPFamilyPolicy
  • AllocateLoadBalancerNodePorts
  • LoadBalancerClass
  • InternalTrafficPolicy

check list:

  • Ports
  • Selector
  • Type
  • ExternalName
  • PublishNotReadyAddresses

The ignore list shouldnt change in the codebase, but should only only increase whenever upstream K8s API adds a dynamic field into the Spec, and shouldnt change whenever a contributor updates the internal KubernetesServiceSpec struct in this project

@arkodg
Copy link
Contributor

arkodg commented May 1, 2023

afaik we should only ignore

Ports.NodePort
ClusterIP
ClusterIPs

since they are generated by K8s
LoadBalancerIP has been deprecated, so we can ignore it

@Alice-Lilith
Copy link
Member

Alice-Lilith commented May 2, 2023

/hold can we compare the entire Spec here, and ignore any fields (NodePort) . This approach will be easier to maintain in the future

afaik we should only ignore

Ports.NodePort
ClusterIP
ClusterIPs

since they are generated by K8s LoadBalancerIP has been deprecated, so we can ignore it

+1 since this makes it easy to maintain / not need to remember this, otherwise lgtm

Signed-off-by: spwangxp <wangshengpeng@cestc.cn>
@spwangxp spwangxp dismissed stale reviews from qicz and arkodg via 982272f May 4, 2023 02:23
@qicz
Copy link
Member

qicz commented May 4, 2023

pls @Xunzhuo @zirain @arkodg approve the ci

@Xunzhuo
Copy link
Member

Xunzhuo commented May 4, 2023

done

@Xunzhuo Xunzhuo enabled auto-merge (squash) May 4, 2023 07:07
@Xunzhuo Xunzhuo disabled auto-merge May 4, 2023 07:07
Xunzhuo
Xunzhuo previously approved these changes May 4, 2023
Copy link
Member

@Xunzhuo Xunzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks for working on it @spwangxp

Signed-off-by: spwangxp <wangshengpeng@cestc.cn>
@qicz
Copy link
Member

qicz commented May 5, 2023

ci needs to retrigger

Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for considering all the suggestions !

@arkodg arkodg merged commit ea2a688 into envoyproxy:main May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

envoyproxy instance svc update error after gateway restart
6 participants