-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gunicorn): statsd support #32487
base: master
Are you sure you want to change the base?
feat(gunicorn): statsd support #32487
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
Exposed StatsD Metrics Endpoint ▹ view | ||
Incorrect default StatsD host for containerized environments ▹ view |
Files scanned
File Path | Reviewed |
---|---|
docker/entrypoints/run-server.sh | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Need a new review? Comment
/korbit-review
on this PR and I'll review your latest changes.Korbit Guide: Usage and Customization
Interacting with Korbit
- You can manually ask Korbit to review your PR using the
/korbit-review
command in a comment at the root of your PR.- You can ask Korbit to generate a new PR description using the
/korbit-generate-pr-description
command in any comment on your PR.- Too many Korbit comments? I can resolve all my comment threads if you use the
/korbit-resolve
command in any comment on your PR.- Chat with Korbit on issues we post by tagging @korbit-ai in your reply.
- Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.
Customizing Korbit
- Check out our docs on how you can make Korbit work best for you and your team.
- Customize Korbit for your organization through the Korbit Console.
Current Korbit Configuration
General Settings
Setting Value Review Schedule Automatic excluding drafts Max Issue Count 10 Automatic PR Descriptions ❌ Issue Categories
Category Enabled Documentation ✅ Logging ✅ Error Handling ✅ Readability ✅ Design ✅ Performance ✅ Security ✅ Functionality ✅ Feedback and Support
Note
Korbit Pro is free for open source projects 🎉
Looking to add Korbit to your team? Get started with a free 2 week trial here
STATSD_HOST="${SERVER_STATSD_HOST:-localhost}" | ||
STATSD_PORT="${SERVER_STATSD_PORT:-8125}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposed StatsD Metrics Endpoint 
Tell me more
What is the issue?
Default StatsD configuration exposes metrics endpoint on all network interfaces without authentication
Why this matters
If StatsD is accessible on all interfaces, attackers could potentially access sensitive metrics data or cause denial of service by flooding the StatsD port. StatsD typically doesn't have built-in authentication.
Suggested change ∙ Feature Preview
Change the default StatsD host to bind to localhost only:
STATSD_HOST="${SERVER_STATSD_HOST:-127.0.0.1}"
Additionally, consider adding network-level access controls or using a secure metrics collection system.
💬 Chat with Korbit by mentioning @korbit-ai.
@@ -18,6 +18,9 @@ | |||
# under the License. | |||
# | |||
HYPHEN_SYMBOL='-' | |||
STATSD_HOST="${SERVER_STATSD_HOST:-localhost}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect default StatsD host for containerized environments 
Tell me more
What is the issue?
Using localhost as the default StatsD host may cause metrics to be lost when running in a containerized environment, as localhost would refer to the container itself rather than the actual StatsD server.
Why this matters
In Docker environments, localhost won't reach the StatsD server unless it's running in the same container, which is unlikely. This will result in silent metric collection failures.
Suggested change ∙ Feature Preview
Change the default host to an environment-appropriate value or require explicit configuration:
STATSD_HOST="${SERVER_STATSD_HOST:?'StatsD host must be configured'}"
Alternatively, if a default is needed:
STATSD_HOST="${SERVER_STATSD_HOST:-statsd}"
💬 Chat with Korbit by mentioning @korbit-ai.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Here wanted to surface the fact that there are a lot of
Personally I feel like #3 is the best option, especially given the breadth of stuff in there, and the fact that many configs require a python object or data structure. Regardless of all this, one thing I thought might be useful would be to improve the |
@mistercrunch I agree, but I feel like that goes past the scope of this PR, since this hooks into the Docker entrypoint and technically doesn't touch the helm values or config py (I just made sure to document them in the So even with improvements to the I realized that I had a whole "caveat" section written out that I forgot to paste in the original PR description that is somewhat related:
So if there was a more generic metrics abstraction, maybe that wouldn't be true. Regarding your comments:
My personal take is that I dislike application configs in helm charts. For me, a helm chart should describe infrastructure, not application behavior. Grafana does this middle ground (as do other tools) where they define a bunch of
+1 I'd be happy to contribute here - my main issue is that I'm not even sure if our own
100% agree. Also, more documentation without changing behavior would certainly help - for instance, |
As you can imagine, we have an intricate, heavily customized superset_config, with custom python objects and using a lot of the config flag that are exposed (many of those we created because we needed customizations/configurability), with some conditional/dynamic configs in many places (depending on service tier, cloud-provider, regions) and deep integrations. Itemizing a few of the more complex ones here:
This is mostly from memory, but shows how intricate things can get in a mature/heterogenous environment. Some of the things I'm personally realizing trying to cater to a variety of environments across the community with a different set of constraints:
Anyhow, super hard to try and cater to everyone. Feels like supporting one or two "reference implementations" to use as a template is best. Trying to serve highly-configurable templates (say a deeply configurable/parameterized helm chart) may be a disservice in some cases where configurability kills simplicity, and abstractions create this complex "indirection" layer. Emptying my bag here, guessing my goal is to expose some of the challenges and ask the community for help figuring this out. |
SUMMARY
What
Adds native
statsd
support for gunicorn and makes configuring theStatsdStatsLogger
easier.Also some docs updates.
Why
I originally put this together @ work for us to monitor not only superset itself via Datadog/statsd, but also gunicorn itself, which has native support for this. I figured this might be useful for a general release.
We use Superset on K8s + Datadog @ ngrok (mentioning this for transparency's sake), so I wanted to supplement the native K8s metric we get with more specific metrics about both Superset and Celery to fine-tune the config. This also works with veneur, fwiw.
How
Simply updates the startup script, which means this should work in both K8s and Docker.
As a side effect, Superset statsd can now be configured via
In
superset_config.py
.TESTING INSTRUCTIONS
I tested this by setting these values and deploying our own Superset instance, which we run on K8s. If there are more permutations of this to test, please lmk.
Deploying this yields metrics like
superset.gunicorn.workers
. See https://docs.gunicorn.org/en/stable/instrumentation.htmlIf users use the default / do not set this, this should be safe, since defaulting to localhost:8125 is (afaik) the standard for statsd collectors (+ it's udp anyways). I tested this by deploying without these values set. If we want to be more defensive here, lmk.
You can also test this via (without configuring anything)
docker compose -f docker-compose-non-dev.yml up --build docker exec -it superset_app /bin/bash apt update apt install netcat-openbsd -y nc -lu 8125
Load the UI and you should see
ADDITIONAL INFORMATION