Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite clean-nginx-conf.sh in Go to speed up admission webhook #7076

Conversation

cgorbit
Copy link
Contributor

@cgorbit cgorbit commented Apr 25, 2021

What this PR does / why we need it:

Current implementation of validating web hook may be too slow in some
circumstances because it use Shell-script to tidy generated full nginx config.
On my dev-cluster with lots of namespaces/ingresses this sometimes lead too (in apiserver):

Error: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://nginx-internal-grpc-ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=30s: context deadline exceeded

web hook:

func (ia *IngressAdmission) HandleAdmission(obj runtime.Object) (runtime.Object, error) {

call to shell-script:
cmd := exec.Command("/ingress-controller/clean-nginx-conf.sh")

shell-script: https://github.com/kubernetes/ingress-nginx/blob/f5cfd5730c4b296c87fbc531c83a6e4f33483b75/rootfs/ingress-controller/clean-nginx-conf.sh

I rewrite tidy-code in Golang in this PR to speed it up.

My kubernetes in AWS:

bash-5.0$ time /ingress-controller/clean-nginx-conf.sh < nginx-big.conf > result.old
real	0m3.696s
user	0m4.519s
sys	0m0.868s

bash-5.0$ time ./indent < nginx-big.conf > result.new

real	0m0.127s
user	0m0.120s
sys	0m0.008s

bash-5.0$ diff -u result.old result.new
--- result.old	2021-04-25 14:46:57.549149609 +0000
+++ result.new	2021-04-25 14:47:21.885195144 +0000
@@ -44,7 +44,7 @@
 			use_proxy_protocol = false,
 			is_ssl_passthrough_enabled = false,
 			http_redirect_code = 308,
-		listen_ports = { ssl_proxy = "442", https = "443" },
+			listen_ports = { ssl_proxy = "442", https = "443" },

 			hsts = true,
 			hsts_max_age = 15724800,
@@ -89,7 +89,7 @@
 		plugins = res
 		end
 		-- load all plugins that'll be used here
-	plugins.init({  })
+		plugins.init({  })
 	}

 	init_worker_by_lua_block {

bash-5.0$

On localhost (not so impressive):

17:57:23 cgorbit@cgorbit:[0]~/go/src/k8s.io/ingress-nginx$ time rootfs/ingress-controller/clean-nginx-conf.sh < ~/nginx-big.conf > result.old

real	0m0,328s
user	0m0,730s
sys	0m0,074s
17:57:29 cgorbit@cgorbit:[0]~/go/src/k8s.io/ingress-nginx$ time ./indent < ~/nginx-big.conf > result.new

real	0m0,132s
user	0m0,116s
sys	0m0,017s
17:57:31 cgorbit@cgorbit:[0]~/go/src/k8s.io/ingress-nginx$ diff -u result.old result.new
--- result.old	2021-04-25 17:57:24.120927547 +0300
+++ result.new	2021-04-25 17:57:31.020945713 +0300
@@ -44,7 +44,7 @@
 			use_proxy_protocol = false,
 			is_ssl_passthrough_enabled = false,
 			http_redirect_code = 308,
-		listen_ports = { ssl_proxy = "442", https = "443" },
+			listen_ports = { ssl_proxy = "442", https = "443" },

 			hsts = true,
 			hsts_max_age = 15724800,
@@ -89,7 +89,7 @@
 		plugins = res
 		end
 		-- load all plugins that'll be used here
-	plugins.init({  })
+		plugins.init({  })
 	}

 	init_worker_by_lua_block {

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

This is performance improvement.

Which issue/s this PR fixes

No related issues

How Has This Been Tested?

  1. I compare result of original clean-nginx-conf.sh and my new code on 11MB nginx.conf
    The only diff is: https://gist.github.com/cgorbit/c0c9d6442dc18d44113d685e2e6bce27
    Which is improvement, I think.
  2. make test: https://gist.github.com/cgorbit/bd302d9fdd55a93d509427981f9ee490
  3. make lua-test
$ make lua-test
nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
163 successes / 0 failures / 0 errors / 0 pending : 0.088464 seconds
  1. make kind-e2e-test
    https://gist.github.com/cgorbit/b6f2db83275c8c996f5bb5b4de668820

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 25, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @cgorbit!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @cgorbit. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 25, 2021
@cgorbit
Copy link
Contributor Author

cgorbit commented Apr 27, 2021

ping @bowei ?

@cgorbit
Copy link
Contributor Author

cgorbit commented Apr 27, 2021

/assign @bowei

@cgorbit
Copy link
Contributor Author

cgorbit commented Apr 27, 2021

Sorry if I misunderstood instruction and make early assign

@rikatz
Copy link
Contributor

rikatz commented May 12, 2021

/ok-to-test

Thanks for the PR. Will review during this week/end :)

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 12, 2021
@paalkr
Copy link
Contributor

paalkr commented Jun 2, 2021

@andersosthus compiled a custom build, for internal testing, with this fix on top of 0.46. In a K8s 1.18 cluster with approx 900 complex ingress objects (several hundred hosts, snippets, regex, certs etc) validation using the bash based code did not succeed within the 30 seconds limit. Now with this Go based implementation, validation takes around 4-7 seconds. So I would say this is a huge improvement. The previous implementation did not work for us due to the timeout.

@paalkr
Copy link
Contributor

paalkr commented Jun 15, 2021

Any news on when this fix will be merged? We have been running with this in production for two weeks as of now, it really is a game changes for clusters with many ingress objects and complex configurations.

@rikatz
Copy link
Contributor

rikatz commented Jun 15, 2021

Hi there,

I've been a bit busy with #7156 and this is why I'm taking some time to answer.

As soon as I finish that one I'm really willing to look at this PR, sounds really promising.

Thanks

@rikatz
Copy link
Contributor

rikatz commented Jun 20, 2021

/priority important-soon
/kind cleanup

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 20, 2021
Copy link
Contributor

@rikatz rikatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgorbit Thanks for your PR!

I must confess I'm a bit confuse by the logic inside the byte reader, so I would like to ask:

  1. Not ignoring or panic'ng errors, but instead returning them as part of the function and dealing with them in the caller.
  2. Add some comments with examples of what each switch case is doing and when it may happen
  3. As we are moving to Go, some unit test as well with known cases.

Thanks!

if err == io.EOF {
return
}
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of panic, let's return the error here. Maybe the function should be cleanConf(in, out *bytes.Buffer) error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand from documentation of bytes package, It's impossible for ReadByte() to return something other than io.EOF. That's why I wrote panic there. This allows me to leave cleanConf signature without error. If I will handle errors in cleanConf they will always be nil actually according to documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, but still I don't like the idea of something internally panic'ing. If this behavior changes in a future, we might end up with this panic doing a world stop into ingress nginx :)

It would be better to return this error and check if this is not nil, even if we know that now it will always be nil, if this changes in a future Go compilation we won't get some strange magic error :)

@@ -86,6 +92,77 @@ func NewTemplate(file string) (*Template, error) {
}, nil
}

func cleanConf(in *bytes.Buffer, out *bytes.Buffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func cleanConf(in *bytes.Buffer, out *bytes.Buffer) {
func cleanConf(in, out *bytes.Buffer) error {

internal/ingress/controller/template/template.go Outdated Show resolved Hide resolved
internal/ingress/controller/template/template.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@cgorbit cgorbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add error handling and returning in cleanConf if you insist, but they always be nil actually.

if err == io.EOF {
return
}
panic(err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand from documentation of bytes package, It's impossible for ReadByte() to return something other than io.EOF. That's why I wrote panic there. This allows me to leave cleanConf signature without error. If I will handle errors in cleanConf they will always be nil actually according to documentation.

internal/ingress/controller/template/template.go Outdated Show resolved Hide resolved
internal/ingress/controller/template/template.go Outdated Show resolved Hide resolved
@cgorbit cgorbit requested a review from rikatz June 21, 2021 18:06
@rikatz
Copy link
Contributor

rikatz commented Jun 27, 2021

Hey @cgorbit yes I would like that we deal with the error instead of panic (as my previous comment) just to make sure that some change into future go version wont break things or cause panic in a future (even if today we know it always returns nil).

Also, this is a nit but if possible please document the behavior of the package/what exactly cleanConf is doing (tabulating, removing characters, etc). You can do this on a followup/later PR as well, but would be great to have this.

As soon as you think this is ready, ping me on Slack (my email is pretty messy...) and I will have a new look into this and hopefully merge. So we can also cherry pick this to the branch release-v1beta1 to be available in past ingress version.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2021
@rikatz
Copy link
Contributor

rikatz commented Jun 28, 2021

/label tide/merge-method-squash
@cgorbit Thanks for the changes :)
I'll wait until the CI passes to merge this.

As this is a pretty important change, after this gets merged, would you mind opening a PR/cherrypick to branch release-v1beta1 to be also inside the old v0.X releases?

Thanks

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 28, 2021
@rikatz
Copy link
Contributor

rikatz commented Jun 28, 2021

/lgtm
/approve
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 28, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgorbit, rikatz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2021
@k8s-ci-robot k8s-ci-robot merged commit 8328b53 into kubernetes:master Jun 28, 2021
cgorbit added a commit to cgorbit/ingress-nginx that referenced this pull request Jul 6, 2021
…rnetes#7076)

* Rewrite clean-nginx-conf.sh to speed up admission webhook

* Less diff with original clean-nginx-conf.sh

* Add error handling, add documentation, add unit test

* indent code

* Don't ignore Getwd() error
k8s-ci-robot pushed a commit that referenced this pull request Jul 6, 2021
… (#7322)

* Rewrite clean-nginx-conf.sh to speed up admission webhook

* Less diff with original clean-nginx-conf.sh

* Add error handling, add documentation, add unit test

* indent code

* Don't ignore Getwd() error
rationalex pushed a commit to joomcode/ingress-nginx that referenced this pull request May 18, 2022
…rnetes#7076)

* Rewrite clean-nginx-conf.sh to speed up admission webhook

* Less diff with original clean-nginx-conf.sh

* Add error handling, add documentation, add unit test

* indent code

* Don't ignore Getwd() error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants