feat(gunicorn): statsd support #32487

Open
wants to merge 1 commit into master
Conversation

chollinger93

SUMMARY

What

Adds native statsd support for gunicorn and makes configuring the StatsdStatsLogger easier.

Also some docs updates.

Why

I originally put this together at work so we could monitor not only Superset itself via Datadog/statsd, but also gunicorn itself, which has native support for this. I figured it might be useful for a general release.

We use Superset on K8s + Datadog @ ngrok (mentioning this for transparency's sake), so I wanted to supplement the native K8s metrics we get with more specific metrics about both Superset and Celery to fine-tune the config. This also works with veneur, fwiw.

How

Simply updates the startup script, which means this should work in both K8s and Docker.
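
For reference, gunicorn exposes these as regular settings (statsd_host, statsd_prefix), so the same thing can also be expressed in a gunicorn config file instead of CLI flags. A minimal illustrative sketch, assuming the SERVER_STATSD_* env var names used below and the localhost:8125 default (this file is not part of the PR):

# gunicorn.conf.py -- illustrative only
import os

# gunicorn's native statsd settings; host:port of the statsd/Datadog agent
statsd_host = (
    os.environ.get("SERVER_STATSD_HOST", "localhost")
    + ":"
    + os.environ.get("SERVER_STATSD_PORT", "8125")
)
statsd_prefix = os.environ.get("SERVER_STATSD_PREFIX", "superset")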

As a side effect, Superset statsd can now be configured via

import os

from superset.stats_logger import StatsdStatsLogger

STATS_LOGGER = StatsdStatsLogger(
    host=os.environ["SERVER_STATSD_HOST"],
    port=int(os.environ["SERVER_STATSD_PORT"]),  # env vars are strings; pass the port as an int
    prefix=os.environ["SERVER_STATSD_PREFIX"],
)

in superset_config.py.

TESTING INSTRUCTIONS

I tested this by setting these values and deploying our own Superset instance, which we run on K8s. If there are more permutations of this to test, please lmk.

Deploying this yields metrics like superset.gunicorn.workers. See https://docs.gunicorn.org/en/stable/instrumentation.html


If users use the default / do not set this, it should be safe, since defaulting to localhost:8125 is (afaik) the standard for statsd collectors (plus it's UDP anyway). I tested this by deploying without these values set. If we want to be more defensive here, lmk.

You can also test this (without configuring anything) via:

docker compose -f docker-compose-non-dev.yml up --build
docker exec -it superset_app /bin/bash
apt update
apt install netcat-openbsd -y
nc -lu 8125

Load the UI and you should see

superset.gunicorn.requests:1|c|@1.0superset.gunicorn.request.status.200:1|c|@1.0superset.gunicorn.request.duration:694.878|mssuperset.gunicorn.requests:1|c|@1.0superset.gunicorn.request.status.200:1|c|@1.0superset.gunicorn.request.duration:4.223|mssuperset.gunicorn.requests: ...

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions bot added the doc label on Mar 3, 2025
@dosubot bot added the infra:webserver label on Mar 3, 2025

@korbit-ai korbit-ai bot left a comment


Review by Korbit AI

Issues found:

  • Security: Exposed StatsD Metrics Endpoint
  • Logging: Incorrect default StatsD host for containerized environments

Files scanned: docker/entrypoints/run-server.sh


Comment on lines +21 to +22
STATSD_HOST="${SERVER_STATSD_HOST:-localhost}"
STATSD_PORT="${SERVER_STATSD_PORT:-8125}"

Exposed StatsD Metrics Endpoint (category: Security)

What is the issue?

Default StatsD configuration exposes metrics endpoint on all network interfaces without authentication

Why this matters

If StatsD is accessible on all interfaces, attackers could potentially access sensitive metrics data or cause denial of service by flooding the StatsD port. StatsD typically doesn't have built-in authentication.

Suggested change

Change the default StatsD host to bind to localhost only:

STATSD_HOST="${SERVER_STATSD_HOST:-127.0.0.1}"

Additionally, consider adding network-level access controls or using a secure metrics collection system.


@@ -18,6 +18,9 @@
# under the License.
#
HYPHEN_SYMBOL='-'
STATSD_HOST="${SERVER_STATSD_HOST:-localhost}"

Incorrect default StatsD host for containerized environments (category: Logging)

What is the issue?

Using localhost as the default StatsD host may cause metrics to be lost when running in a containerized environment, as localhost would refer to the container itself rather than the actual StatsD server.

Why this matters

In Docker environments, localhost won't reach the StatsD server unless it's running in the same container, which is unlikely. This will result in silent metric collection failures.

Suggested change

Change the default host to an environment-appropriate value or require explicit configuration:

STATSD_HOST="${SERVER_STATSD_HOST:?'StatsD host must be configured'}"

Alternatively, if a default is needed:

STATSD_HOST="${SERVER_STATSD_HOST:-statsd}"



@github-actions github-actions bot left a comment


Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

@rusackas added the review:checkpoint label on Mar 3, 2025
@mistercrunch
Member

Thanks for the PR. I wanted to surface the fact that there's a lot of superset_config.py stuff we could promote to the helm chart - almost an infinity of it. Wondering if it's best to:

  • PROMOTE EVERYTHING, bring as much as possible towards the helm chart so it's easy to manage all configs in the chart itself
  • promote only the things people are likely to want to tweak in the helm chart
  • clarify how to setup a proper superset_config.py, and limit the number of indirections in the helm chart

Personally I feel like #3 is the best option, especially given the breadth of stuff in there, and the fact that many configs require a python object or data structure.

Regardless of all this, one thing I thought might be useful would be to improve the superset_config.py framework (which builds on Flask's config object) to support more env vars out of the box. All common types (string, int, float) would be exposed as, say, SUPERSET__{variable_name_in_config.py} (the prefix avoids potential collisions). That works for everything that doesn't require Python objects or data structures, meaning all simple configs become configurable via env vars with a clear 1-to-1 mapping.
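
To make that idea concrete, here's a rough sketch of what such a mapping could look like (purely illustrative, not an existing Superset feature; the SUPERSET__ prefix and the apply_env_overrides helper are hypothetical):

import os

ENV_PREFIX = "SUPERSET__"

def apply_env_overrides(config):
    """Hypothetical helper: override simple-typed config keys from SUPERSET__<KEY> env vars."""
    for env_key, raw in os.environ.items():
        if not env_key.startswith(ENV_PREFIX):
            continue
        name = env_key[len(ENV_PREFIX):]
        current = config.get(name)
        if isinstance(current, bool):  # check bool before int (bool is a subclass of int)
            config[name] = raw.lower() in ("1", "true", "yes")
        elif isinstance(current, int):
            config[name] = int(raw)
        elif isinstance(current, float):
            config[name] = float(raw)
        elif current is None or isinstance(current, str):
            config[name] = raw
        # anything needing Python objects or data structures stays in superset_config.py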

@chollinger93
Author

@mistercrunch I agree, but I feel like that goes past the scope of this PR, since this hooks into the Docker entrypoint and technically doesn't touch the helm values or superset_config.py (I just made sure to document them in values.yml so folks are aware of it).

So even with improvements to the superset_config.py, this still requires passing those 2 arguments to gunicorn directly.

I realized I had a whole "caveat" section written out that I forgot to paste into the original PR description; it's somewhat related:

I think it would be nice to have a more generic abstraction for different metric collection approaches, but statsd seems to be the only one in this codebase + its direct dependencies.

Celery doesn't seem to have statsd support natively, but if I do find something, I'm happy to add this in a separate PR.

I didn't include SERVER_STATSD_DOGSTATSD_TAGS via dogstatsd-tags. Not sure how this group feels about vendor-specific implementations.

So if there was a more generic metrics abstraction, maybe that wouldn't be true.
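
For what it's worth, on the Superset side the STATS_LOGGER hook is already a small abstraction point: a custom backend can be plugged in by subclassing BaseStatsLogger. A minimal sketch, assuming the interface in superset/stats_logger.py (the PrintStatsLogger class is hypothetical and just prints; a real backend would call its own client):

from superset.stats_logger import BaseStatsLogger

class PrintStatsLogger(BaseStatsLogger):
    """Hypothetical backend that prints metrics instead of emitting statsd packets."""

    def incr(self, key):
        print(f"INCR {key}")

    def decr(self, key):
        print(f"DECR {key}")

    def timing(self, key, value):
        print(f"TIMING {key} {value}")

    # gauge() can be overridden the same way

STATS_LOGGER = PrintStatsLogger()

The gap is on the gunicorn/Celery side, where no equivalent pluggable layer exists as far as I can tell.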

Regarding your comments:

PROMOTE EVERYTHING, bring as much as possible towards the helm chart so it's easy to manage all configs in the chart itself

My personal take is that I dislike application configs in helm charts. For me, a helm chart should describe infrastructure, not application behavior.

Grafana takes a middle ground here (as do other tools): it defines a bunch of GF_* variables that override what would otherwise live in an ini file (iirc), which avoids having to mount a config file or create a K8s ConfigMap, but I always found that very confusing.

clarify how to setup a proper superset_config.py, and limit the number of indirections in the helm chart

+1

I'd be happy to contribute here - my main issue is that I'm not even sure if our own superset_config.py is following best practices or not. I can tell you that it's 285 lines. :) I'm sure you have a best practice file at Preset?

Regardless of all this, one thing I thought might be useful would be to improve the superset_config.py framework (that builds upon Flask's config object) to support more env vars out of the box.

100% agree. Also, more documentation without changing behavior would certainly help - for instance, worker_max_memory_per_child is a setting you can set for Celery (or so I think), but neither the Superset Slack nor the docs make any mention of it. On a related note, it's hard to figure out a pattern for the environment variables (e.g., the SERVER_ prefix seems to be for gunicorn/the frontend, but that's never explicitly stated anywhere).
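
For example (a minimal sketch, assuming the usual pattern of a CeleryConfig class in superset_config.py; the broker/backend URLs are placeholders):

class CeleryConfig:
    broker_url = "redis://redis:6379/0"        # placeholder broker URL
    result_backend = "redis://redis:6379/1"    # placeholder result backend
    # Celery setting, in KiB: recycle a worker once it exceeds this much resident memory
    worker_max_memory_per_child = 200_000

CELERY_CONFIG = CeleryConfig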

@rusackas removed the review:checkpoint label on Mar 4, 2025
@mistercrunch
Member

I'm sure you have a best practice file at Preset?

As you can imagine, we have an intricate, heavily customized superset_config, with custom Python objects, making use of a lot of the config flags that are exposed (many of which we created because we needed customization/configurability), with conditional/dynamic configs in many places (depending on service tier, cloud provider, region) and deep integrations.

Itemizing a few of the more complex ones here:

  • observability: deep integration with Datadog and its many services (statsd, stdout, APM, rdbms monitoring, ...)
  • intricate caching (different backends on different layers), solving for noisy neighbors
  • custom "analytics events" framework (through Segment and ultimately BigQuery) with filtering/obfuscation/enrichment
  • deep integration with split.io, allowing us to have dynamic configurations per-"instance" / customers / tiers live (changing feature flags/configs without rebuilding/restarting the app)
  • some multi-tenancy-related configurations + related secret management solutions

This is mostly from memory, but it shows how intricate things can get in a mature/heterogeneous environment.

Some of the things I'm personally realizing trying to cater to a variety of environments across the community with a different set of constraints:

  • there's no such thing as one-size-fits-all for most things, deep-configurability is generally needed
  • it's hard to support all of this, the matrix of possible combinations of options is nearly-infinite, making it hard to guarantee decent results when going off the beaten path
  • would be good to be more opinionated and reduce the configurability surface, though there's a clear reason on why it's there in the first place
  • on the infra layer, people have hard constraints and a wide array of tools/frameworks they have strong preference or hard-constraints towards, whether it's Terraform/CloudFormation, Postgres/MySQL, Datadog/Prometheus, Redis/RabbitMQ, specific database driver versions, ...
  • Even where strong standards / consolidation exist (k8s / docker), how people implement solutions on those layers may involve lots of constraints (forcing their own base image for Docker, helm vs k8s operators, dependency-scanning tools like Snyk that prevent productionizing certain lib/version combinations)

Anyhow, super hard to try and cater to everyone. Feels like supporting one or two "reference implementations" to use as a template is best. Trying to serve highly-configurable templates (say a deeply configurable/parameterized helm chart) may be a disservice in some cases where configurability kills simplicity, and abstractions create this complex "indirection" layer.

Emptying my bag here, guessing my goal is to expose some of the challenges and ask the community for help figuring this out.
