feat: use Scarf Gateway for Superset helm charts/Docker compose downloads #24432
Hey @arjundevarajan, it seems you need to fix some lint errors. Let me know if you need help.
I think it was just the helm chart version. I bumped it 🤞
OK, CI is clean. I added some documentation on this PR, so if any reviewer(s) could give that a quick 👀 it would be appreciated. We want to make sure we're transparent enough about the telemetry being added, and how to opt out.
:::

:::note
Superset uses [Scarf Gateway](https://about.scarf.sh/scarf-gateway) to collect telemetry data to better understand and support the need for patch versions of Superset. Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented [here](https://docs.scarf.sh/gateway/#do-not-track). However, if you wish to opt out of this in your Docker-based installation, you can simply edit your `docker-compose.yml` or `docker-compose-non-dev.yml` file and remove `apachesuperset.docker.scarf.sh/` from the `x-superset-image` setting, so that it's simply pulling `apache/superset:${TAG:-latest-dev}`
@arjundevarajan and @rusackas would it be safer to have telemetry opted out by default (i.e., make it opt-in)? Granted, one might not get the same scale of telemetry data, but it feels significantly less intrusive.
Hi @john-bodley, mostly chiming in to agree with @rusackas that making telemetry opt-in rather than on by default would significantly decrease the amount of useful data being collected. It should be reemphasized that all of this data is de-identified no matter what, and that the ASF has previously approved Scarf as a verified external service provider for other ASF projects (see Privacy Policy here), which have had Scarf deployed live for several years now.
Thanks @arjundevarajan for the context.
Apologies, I first approved this and then added this comment, and given that I can't retract my approval AFAIK, I opted for "Request changes".
@john-bodley Given that the approach had lazy consensus on the dev@ list, and the fact that there's documentation (and links to further details) in various places, I think it's safe to go with opt-out. We can take this to the dev@ list again if it warrants further discussion, but I'm optimistic that when this makes it into a release, we'll have further changes to raise awareness about its existence and how to opt out, on the wiki, in release notes, in the change log, etc. The main reason I'd advocate for this approach is that if it's opt-in, I suspect that we'll garner very little telemetry at all. I think this sort of telemetry is the norm in the industry at this point, and Scarf is used in other Apache projects as well. Let me know if you think this makes sense, or if this warrants widening the net on the discussion.
Unblocking per #24432 (comment).
SUMMARY
This PR updates the Superset configuration for helm charts and Docker compose to fetch Superset containers via a Scarf endpoint, so that Superset maintainers can collect basic de-identified download and adoption metrics. It does not affect where the containers are being hosted, as Scarf is only redirecting traffic back to Docker Hub.
This change was suggested by Superset maintainers in direct discussions.
Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented here.
TESTING INSTRUCTIONS
To test this, download Apache Superset using the new endpoint (e.g. `docker pull apachesuperset.docker.scarf.sh/apache/superset`) and verify that the `apache/superset` container downloads without issue.
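As a rough illustration of the test above and of the opt-out path described in the docs note, the two image references differ only by the Scarf prefix. The sketch below uses a hypothetical helper, `strip_scarf_prefix` (not part of this PR), and only prints the `docker pull` commands rather than running them, so it can be tried without Docker or network access.

```shell
#!/bin/sh
# Prefix that routes pulls through Scarf Gateway (taken from the PR text).
SCARF_PREFIX="apachesuperset.docker.scarf.sh/"

# Hypothetical helper (an assumption, not part of the PR): drop the Scarf
# prefix from an image reference, leaving the plain Docker Hub reference.
# References without the prefix are returned unchanged.
strip_scarf_prefix() {
  printf '%s\n' "${1#"$SCARF_PREFIX"}"
}

via_scarf='apachesuperset.docker.scarf.sh/apache/superset'
direct=$(strip_scarf_prefix "$via_scarf")

# Print the commands instead of executing them.
echo "docker pull $via_scarf"   # pull through Scarf Gateway (counted in metrics)
echo "docker pull $direct"      # pull straight from Docker Hub (opt-out)
```

Running either printed command should fetch the same `apache/superset` image, since per the summary Scarf only redirects traffic back to Docker Hub.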
ADDITIONAL INFORMATION