Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Move the pprof endpoints to the main HTTP server rather than starting a second server #600

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Oct 30, 2024

The pprof server was being started on a second HTTP server running in an unmanaged Go routine.
It used log.Fatal on exit to bring down the process in the event of an error in that server.
Moving the pprof endpoints to the main HTTP server simplifies the code because it avoids having to manage a second server.
It will make it easier to enable TLS on these endpoints in future, if that is desired, because only a single TLS certificate will be required.
It makes life easier for platform admins who will have one less TCP port to worry about.
It removes one more instance of log.Fatal, which are scattered all over the run command and moves us closer to removing the use of the legacy log module and handling all sub-command errors in a consistent way.

These changes have been extracted from a larger prototype PR: #593 for the sake of easy review.

The --enable-pprof feature was originally added by @tfadeyi in #286 to aid the investigation of the high memory usage of the agent.

xref:

image
$ POD_NAMESPACE=venafi ./preflight agent --output-path /dev/null --api-token should-not-be-required  --agent-config-file examples/cert-manager-agent.yaml --enable-pprof --period 1h
2024/10/30 10:02:43 Preflight agent version: development ()
2024/10/30 10:02:43 Using the Jetstack Secure API Token auth mode since --api-token was specified.
2024/10/30 10:02:43 Using deprecated Endpoint configuration. User Server instead.
2024/10/30 10:02:43 pprof profiling was enabled.
...
$ ./preflight agent --help | grep -i pprof
      --enable-pprof                                      Enables the pprof profiling endpoints on the agent server (port: 8081).

… a second server

Signed-off-by: Richard Wall <richard.wall@venafi.com>
@wallrj wallrj requested a review from tfadeyi October 30, 2024 10:23
Copy link
Contributor

@tfadeyi tfadeyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Question: do you think we can hide the flag? It was mostly meant for development not really for customers

@wallrj
Copy link
Member Author

wallrj commented Oct 30, 2024

Question: do you think we can hide the flag? It was mostly meant for development not really for customers

I prefer to show the the flag because think end-users and CREs will find it useful in the field, and it will be useful for them to be able to read about how it works it in --help.
Some examples of other projects that make the profiling features available and apparent to end-users:

The security best-practice is to not allow unauthenticated access to the pprof endpoints because they reveal too much information about the process which is useful to hackers:

For example, the pprof endpoints are enabled by default on Kubernetes API server, but they are protected by RBAC since CVE-2019-11248:

$ kubectl get --raw /debug/pprof --as system:anonymous
Error from server (Forbidden): forbidden: User "system:anonymous" cannot get path "/debug/pprof"
$ kubectl get --raw /debug/pprof/goroutine?debug=1 | head
goroutine profile: total 1821
177 @ 0x440d4e 0x452b25 0x1bb7175 0x478a21
#       0x1bb7174       google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run+0x114        google.golang.org/grpc@v1.65.0/internal/grpcsync/callback_serializer.go:76

119 @ 0x440d4e 0x452b25 0x223fc2f 0x223f957 0x478a21
#       0x223fc2e       k8s.io/apiserver/pkg/storage/cacher.(*cacheWatcher).process+0xae                k8s.io/apiserver/pkg/storage/cacher/cache_watcher.go:519
#       0x223f956       k8s.io/apiserver/pkg/storage/cacher.(*cacheWatcher).processInterval+0x3f6       k8s.io/apiserver/pkg/storage/cacher/cache_watcher.go:506

115 @ 0x440d4e 0x452b25 0x1eecca5 0x87de09 0x1ed2956 0x1cfd92f 0x1ed289c 0x1ed1bd1 0x1f0484a 0x1f0466e 0x152bdd0 0x21ae60c 0x21ae5de 0x2273dfd 0x1e1b810 0x1e1b3a6 0x21ae791 0x21a44d7 0x87de09 0x1d1f086 0x87de09 0x1d17952 0x87de09 0x21a42b7 0x87de09 0x1e66f43 0x87de09 0x1d17952 0x87de09 0x21a4097 0x87de09 0x21a25d0
#       0x1eecca4       k8s.io/apiserver/pkg/endpoints/handlers.(*WatchServer).HandleHTTP+0x9e4                                 k8s.io/apiserver/pkg/endpoints/handlers/watch.go:227

In a future PR we should implement authentication and authorization using something like kube-rbac-proxy, or it's controller-runtime replacement, as documented here:

And maybe create a default cluster-debugger ClusterRole which allows access to the /debug/pprof/* non-resource endpoints.

And keep an option to turn off the pprof endpoints entirely, for the reasons given here:

@wallrj wallrj changed the base branch from master to VC-35738/feature October 31, 2024 09:29
@wallrj wallrj merged commit efebe5b into VC-35738/feature Oct 31, 2024
2 checks passed
@wallrj wallrj deleted the VC-35738/pprof-on-8081 branch October 31, 2024 09:31
@wallrj wallrj mentioned this pull request Oct 31, 2024
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants