Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Stop the API server when the context is cancelled #604

Merged
merged 2 commits into from
Nov 5, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Nov 1, 2024

In #601 I broke the --one-shot feature, because I added the HTTP server to the errgroup, but didn't provide a way to stop the HTTP server after the single gather loop.

This PR introduces a simple function which starts the server and calls shutdown when the supplied context is cancelled.

Success:

$ POD_NAMESPACE=venafi go run ./ agent --one-shot --output-path out.json --api-token should-not-be-required  --agent-config-file examples/cert-manager-agent.yaml  --enable-metrics --enable-pprof --log-level 1
I1101 17:11:07.256726  478030 logs.go:158] "Preflight agent version: development ()" source="agent"
I1101 17:11:07.256927  478030 logs.go:158] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
I1101 17:11:07.256949  478030 logs.go:158] "Using deprecated Endpoint configuration. User Server instead." source="agent"
I1101 17:11:07.256966  478030 run.go:98] "Profiling endpoints enabled" logger="APIServer" addr=":8081" path="/debug/pprof"
I1101 17:11:07.257017  478030 run.go:107] "Metrics endpoints enabled" logger="APIServer" addr=":8081" path="/metrics"
I1101 17:11:07.257089  478030 run.go:116] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
I1101 17:11:07.257109  478030 run.go:120] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
I1101 17:11:07.257249  478030 run.go:439] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
E1101 17:11:07.258053  478030 logs.go:156] "error messages will not show in the pod's events because the POD_NAME environment variable is empty" source="agent"
I1101 17:11:07.259342  478030 logs.go:158] "starting \"k8s/secrets.v1\" datagatherer" source="agent"
I1101 17:11:07.260262  478030 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I1101 17:11:07.260359  478030 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I1101 17:11:07.561887  478030 logs.go:158] "successfully gathered 18 items from \"k8s/secrets.v1\" datagatherer" source="agent"
I1101 17:11:07.562705  478030 logs.go:158] "Data saved to local file: out.json" source="agent"
I1101 17:11:07.562783  478030 run.go:450] "Shutting down" logger="APIServer.ListenAndServe" addr=":8081"
I1101 17:11:07.562829  478030 run.go:441] "Stopped" logger="APIServer.ListenAndServe" addr=":8081" reason="http: Server closed"
I1101 17:11:07.562880  478030 run.go:454] "Shutdown complete" logger="APIServer.ListenAndServe" addr=":8081" err=null

Example of an error when the port is already occupied.

$ python3 -m http.server 8081 &

$ POD_NAMESPACE=venafi go run ./ agent --one-shot --output-path out.json --api-token should-not-be-required  --agent-config-file examples/cert-manager-agent.yaml  --enable-metrics --enable-pprof --log-level 1
I1101 17:12:25.669160  478238 logs.go:158] "Preflight agent version: development ()" source="agent"
I1101 17:12:25.669348  478238 logs.go:158] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
I1101 17:12:25.669371  478238 logs.go:158] "Using deprecated Endpoint configuration. User Server instead." source="agent"
I1101 17:12:25.669402  478238 run.go:98] "Profiling endpoints enabled" logger="APIServer" addr=":8081" path="/debug/pprof"
I1101 17:12:25.669428  478238 run.go:107] "Metrics endpoints enabled" logger="APIServer" addr=":8081" path="/metrics"
I1101 17:12:25.669498  478238 run.go:116] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
I1101 17:12:25.669518  478238 run.go:120] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
I1101 17:12:25.669613  478238 run.go:439] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
I1101 17:12:25.669804  478238 run.go:441] "Stopped" logger="APIServer.ListenAndServe" addr=":8081" reason="listen tcp :8081: bind: address already in use"
I1101 17:12:25.669876  478238 run.go:457] "Shutdown skipped" logger="APIServer.ListenAndServe" addr=":8081" reason="Server already stopped"
E1101 17:12:25.670425  478238 logs.go:156] "error messages will not show in the pod's events because the POD_NAME environment variable is empty" source="agent"
I1101 17:12:25.671125  478238 logs.go:158] "starting \"k8s/secrets.v1\" datagatherer" source="agent"
E1101 17:12:25.671155  478238 logs.go:156] "failed to complete initial sync of \"k8s-dynamic\" data gatherer \"k8s/secrets.v1\": timed out waiting for Kubernetes caches to sync" source="agent"
I1101 17:12:25.671177  478238 logs.go:158] "successfully gathered 0 items from \"k8s/secrets.v1\" datagatherer" source="agent"
I1101 17:12:25.671367  478238 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I1101 17:12:25.671407  478238 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I1101 17:12:25.671541  478238 logs.go:158] "Data saved to local file: out.json" source="agent"
E1101 17:12:25.671572  478238 logs.go:156] "failed to wait for controller-runtime component to stop: listen tcp :8081: bind: address already in use" source="agent"
exit status 1

@maelvls
Copy link
Member

maelvls commented Nov 4, 2024

How did you catch this bug? Was it by running the test.sh script?

If not, we should add some form of test case somewhere. I'd suggest adding a comment to https://venafi.atlassian.net/browse/VC-35565 to mention that --one-shot needs to be smoke-tested too

server.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
})
log.Info("Readyz endpoints enabled", "path", "/readyz")
Copy link
Member

@maelvls maelvls Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, unimportant: weird that we start a readiness and liveness server when running --one-shot, but that's all right IMO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's weird, perhaps in a future PR we can make it so that the API server is not started in one-shot mode.

Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

It'd be great to have some fast test that uses a fake Venafi Cloud API to test that --one-shot immediately returns. It could look like the test cases in

func Test_ValidateAndCombineConfig_VenafiCloudKeyPair(t *testing.T) {
but with the Run func

@wallrj wallrj force-pushed the VC-35738/stop-the-server branch 3 times, most recently from 29a5520 to c214e58 Compare November 5, 2024 11:49
@wallrj
Copy link
Member Author

wallrj commented Nov 5, 2024

@maelvls I added a test.

Before:

$ go test ./pkg/agent/... --run TestRunOneShot -v
=== RUN   TestRunOneShot
    run_test.go:50: Running child process
    run_test.go:68: STDOUT
        I1105 11:47:58.084463   81129 logs.go:158] "Preflight agent version: development ()" source="agent"
        I1105 11:47:58.084574   81129 logs.go:158] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
        I1105 11:47:58.085278   81129 logs.go:158] "Reading data from local file: testdata/success/input.json" source="agent"
        I1105 11:47:58.085357   81129 logs.go:158] "Data saved to local file: /dev/null" source="agent"

    run_test.go:69: STDERR

    run_test.go:70:
                Error Trace:    /home/richard/projects/venafi/venafi-kubernetes-agent/pkg/agent/run_test.go:70
                Error:          Received unexpected error:
                                signal: killed
                Test:           TestRunOneShot
                Messages:       context deadline exceeded
--- FAIL: TestRunOneShot (3.01s)
FAIL
FAIL    github.com/jetstack/preflight/pkg/agent 3.033s
FAIL

After:

$ go test ./pkg/agent/... --run TestRunOneShot -v  -count 1
=== RUN   TestRunOneShot
    run_test.go:50: Running child process
    run_test.go:68: STDOUT
        I1105 11:48:56.732058   82228 logs.go:158] "Preflight agent version: development ()" source="agent"
        I1105 11:48:56.732211   82228 logs.go:158] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
        I1105 11:48:56.732225   82228 run.go:117] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
        I1105 11:48:56.732235   82228 run.go:121] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
        I1105 11:48:56.732305   82228 run.go:447] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
        I1105 11:48:56.733181   82228 logs.go:158] "Reading data from local file: testdata/success/input.json" source="agent"
        I1105 11:48:56.733250   82228 logs.go:158] "Data saved to local file: /dev/null" source="agent"
        I1105 11:48:56.733258   82228 run.go:461] "Shutting down" logger="APIServer.ListenAndServe" addr=":8081"
        I1105 11:48:56.733332   82228 run.go:476] "Shutdown complete" logger="APIServer.ListenAndServe" addr=":8081"
        PASS

    run_test.go:69: STDERR

--- PASS: TestRunOneShot (0.03s)
PASS
ok      github.com/jetstack/preflight/pkg/agent 0.052s

@wallrj wallrj requested a review from maelvls November 5, 2024 11:50
Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done with the test. Thanks. LGTM

@wallrj wallrj merged commit 41553d4 into VC-35738/feature Nov 5, 2024
2 checks passed
@wallrj wallrj deleted the VC-35738/stop-the-server branch November 5, 2024 13:23
@wallrj wallrj mentioned this pull request Nov 5, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants