Add Scarf based telemetry #39510
Merged
kaxil
commented
May 9, 2024
0479db7 to
b16caed
Compare
jscheffl
reviewed
May 9, 2024
kaxil
added a commit
to astronomer/airflow
that referenced
this pull request
May 9, 2024
Similar to apache/superset#25065 and apache#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
kaxil
added a commit
that referenced
this pull request
May 9, 2024
Similar to apache/superset#25065 and #39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
200f210 to
7d8eb2d
Compare
kaxil
added a commit
to apache/airflow-site
that referenced
this pull request
May 10, 2024
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this adds a simple transparent tracking pixel. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available in Scarf, creds for which are shared to 1password for PMC members, and can be reported periodically in things like Town Hall or newsletters.
kaxil
commented
May 10, 2024
kaxil
added a commit
to apache/airflow-site
that referenced
this pull request
May 10, 2024
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this adds a simple transparent tracking pixel. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available in Scarf, creds for which are shared to 1password for PMC members, and can be reported periodically in things like Town Hall or newsletters.
jscheffl
reviewed
May 10, 2024
dstandish
reviewed
May 16, 2024
dstandish
reviewed
May 16, 2024
dstandish
reviewed
May 16, 2024
jscheffl
approved these changes
May 16, 2024
romsharon98
pushed a commit
to romsharon98/airflow
that referenced
this pull request
Jul 26, 2024
Similar to apache/superset#25065 and apache#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
romsharon98
pushed a commit
to romsharon98/airflow
that referenced
this pull request
Jul 26, 2024
To take [the discussion about adding Telemetry](https://lists.apache.org/thread/7f6qyr8w2n8w34g63s7ybhzphgt8h43m) forward, I am creating a draft PR that adds some basic telemetry to send to Scarf.

Voting thread: https://lists.apache.org/thread/h1x2glvnd42rbj2q2rgpfo3pjhmpt307

I have added docs on the data collection as well as a way to opt-out of it.

Telemetry added at:
- Scheduler startup [Custom telemetry](https://docs.scarf.sh/custom-telemetry/) similar to other popular projects like [Unstructured](Unstructured-IO/unstructured@f0a63e2)
- Webserver via a [tracking pixel](https://docs.scarf.sh/web-traffic/#creating-a-pixel), similar to [Apache Superset](https://github.com/apache/superset/pull/26011/files)

Data collected:
- Airflow version
- Python version
- Platform System info: Linux/Darwin
- Machine type: arm64/aarch64
- Airflow Metadata DB: Postgres/MySQL
- DB: sqlite/postgres
- DB version: 12.6.3
- Number of DAGs
This was referenced Aug 1, 2024
This was referenced Aug 5, 2024
Closed
tatiana
added a commit
to astronomer/dag-factory
that referenced
this pull request
Oct 17, 2024
Export telemetry related to DAG Factory usage to [Scarf](https://about.scarf.sh/). This data assists the project maintainers in better understanding how DAG Factory is used. Insights from this telemetry are critical for prioritizing patches, minor releases, and security fixes. Additionally, this information supports critical decisions related to the development road map.

Deployments and individual users can opt out of analytics by setting the configuration:

```
[dag_factory]
enable_telemetry = False
```

As described in the [official documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also possible to opt-out by setting one of the following environment variables:

```commandline
AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY=False
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```

In addition to Scarf's default data collection, DAG Factory collects the following information:
- DAG Factory version
- Airflow version
- Python version
- Operating system & machine architecture
- Event type
- Number of DAGs
- Number of TaskGroups
- Number of Tasks

No user-identifiable information (IP included) is stored in Scarf, even though Scarf infers information from the IP, such as location, and stores that. The data collection is GDPR compliant.

The data is not currently being emitted for pre-releases except from integration tests.

The Apache Foundation supports this same strategy in many of its OpenSource projects, including Airflow ([#39510](apache/airflow#39510)).
Example of visualisation of the data via the Scarf UI: <img width="1624" alt="Screenshot 2024-10-17 at 01 56 09" src="https://github.com/user-attachments/assets/d4191834-1e02-4192-811b-125d3fa735fe"> <img width="1624" alt="Screenshot 2024-10-17 at 01 55 59" src="https://github.com/user-attachments/assets/cd814e11-7f77-45c8-95a0-56e29d9f9f12"> <img width="1624" alt="Screenshot 2024-10-17 at 01 55 47" src="https://github.com/user-attachments/assets/2950ddbb-ea25-415f-b61c-3fbdcf4fc739"> <img width="1624" alt="Screenshot 2024-10-17 at 01 55 42" src="https://github.com/user-attachments/assets/a56ecefd-0cd0-486c-9faf-026b1e9a4ceb"> Closes: #214
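The opt-out rules described in this commit message can be sketched roughly as follows. This is an illustrative sketch only, not DAG Factory's actual code: the function name `should_emit_telemetry`, the sets of accepted truthy/falsy strings, and the assumption that telemetry is on when nothing is set are all hypothetical. Only the three environment variable names come from the message above.

```python
import os
from typing import Mapping, Optional

# Assumption: which strings count as "true"/"false" is a guess, not taken from the source.
_TRUE = {"1", "true", "t", "yes", "y", "on"}
_FALSE = {"0", "false", "f", "no", "n", "off"}


def should_emit_telemetry(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False if any of the documented opt-out variables is set."""
    env = os.environ if env is None else env

    # Generic opt-out switches named in the commit message.
    for name in ("DO_NOT_TRACK", "SCARF_NO_ANALYTICS"):
        if env.get(name, "").strip().lower() in _TRUE:
            return False

    # Project-specific switch, written in Airflow's environment variable style.
    setting = env.get("AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY", "").strip().lower()
    if setting in _FALSE:
        return False

    # Assumption: telemetry is enabled unless the user opted out.
    return True
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise in tests without touching the real environment.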
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Nov 9, 2024
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters. GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
tatiana
added a commit
to astronomer/astronomer-cosmos
that referenced
this pull request
Dec 20, 2024
Export telemetry related to Cosmos usage to [Scarf](https://about.scarf.sh/). This data assists the project maintainers in better understanding how Cosmos is used. Insights from this telemetry are critical for prioritizing patches, minor releases, and security fixes. Additionally, this information supports critical decisions related to the development road map.

Deployments and individual users can opt out of analytics by setting the configuration:

```
[cosmos]
enable_telemetry: False
```

As described in the [official documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also possible to opt-out by setting one of the following environment variables:

```commandline
AIRFLOW__COSMOS__ENABLE_TELEMETRY=False
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```

In addition to Scarf's default data collection, Cosmos collects the following information when running Cosmos-powered DAGs:
- Cosmos version
- Airflow version
- Python version
- Operating system & machine architecture
- Event type
- DAG hash
- Total tasks
- Total Cosmos tasks

No user-identifiable information (IP included) is stored in Scarf, even though Scarf infers information from the IP, such as location, and stores that. The data collection is GDPR compliant.

The Apache Foundation supports this same strategy in many of its OpenSource projects, including Airflow ([#39510](apache/airflow#39510)).
Example of visualisation of the data via the Scarf UI: <img width="1235" alt="Screenshot 2024-12-19 at 10 22 59" src="https://github.com/user-attachments/assets/12b9fbd4-2fdd-4e62-9876-defee3c4d8da" /> <img width="1231" alt="Screenshot 2024-12-19 at 10 23 13" src="https://github.com/user-attachments/assets/f98b849c-99be-4764-9e6d-cb7730da3688" /> <img width="1227" alt="Screenshot 2024-12-19 at 10 23 21" src="https://github.com/user-attachments/assets/421b7581-c641-422a-8469-252ba5a2fd33" /> <img width="1237" alt="Screenshot 2024-12-19 at 10 23 28" src="https://github.com/user-attachments/assets/2e5995a2-fe09-4017-a625-4dd4a60028d0" /> <img width="1248" alt="Screenshot 2024-12-19 at 10 23 51" src="https://github.com/user-attachments/assets/64a8a07f-df56-493c-a3f5-0f5165fd58e8" /> <img width="1229" alt="Screenshot 2024-12-19 at 10 24 01" src="https://github.com/user-attachments/assets/1e3e8b8d-b11d-4b31-8b46-853d541b01b8" /> <img width="1240" alt="Screenshot 2024-12-19 at 10 24 11" src="https://github.com/user-attachments/assets/b5e79cc7-4e2e-44b2-a94b-891b9226b152" /> <img width="1241" alt="Screenshot 2024-12-19 at 10 24 20" src="https://github.com/user-attachments/assets/2fb5d666-d749-416d-acf8-4a3bc94ba014" /> <img width="1234" alt="Screenshot 2024-12-19 at 10 24 31" src="https://github.com/user-attachments/assets/353eb82c-44d2-44ec-87e2-ace7138132f5" /> <img width="1245" alt="Screenshot 2024-12-19 at 10 24 39" src="https://github.com/user-attachments/assets/4a637a2a-14ad-41a8-b7fd-db186ec74357" /> <img width="1233" alt="Screenshot 2024-12-19 at 10 24 48" src="https://github.com/user-attachments/assets/bec4e2b0-49c3-4289-8f9b-3285db9ec40c" /> Closes: #1143
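To illustrate how a `[cosmos]` section with `enable_telemetry: False` might be read, here is a small sketch using Python's standard `configparser`, which accepts both `:` and `=` as key/value delimiters. Airflow has its own configuration layer, so this is not how Cosmos actually reads the setting; the helper name `telemetry_enabled_from_cfg` and the default of `True` when the option is missing are assumptions made for illustration.

```python
from configparser import ConfigParser

# Sample configuration text mirroring the snippet in the commit message.
SAMPLE_CFG = """
[cosmos]
enable_telemetry: False
"""


def telemetry_enabled_from_cfg(text: str, section: str = "cosmos") -> bool:
    """Parse INI-style text and return the enable_telemetry flag.

    Assumption: telemetry counts as enabled when the section or option is absent.
    """
    parser = ConfigParser()
    parser.read_string(text)
    return parser.getboolean(section, "enable_telemetry", fallback=True)
```

Calling `telemetry_enabled_from_cfg(SAMPLE_CFG)` on the sample above yields `False`, since `getboolean` treats the string `False` as a boolean false value.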
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
May 6, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters. GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
potiuk
pushed a commit
to apache/airflow-site
that referenced
this pull request
May 8, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this adds a simple transparent tracking pixel. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available in Scarf, creds for which are shared to 1password for PMC members, and can be reported periodically in things like Town Hall or newsletters.
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
May 26, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters. GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Sep 21, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters. GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Oct 19, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this embed it to Readme. More detail in https://docs.scarf.sh/web-traffic/ All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters. GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
Labels
area:CLI
area:UI
Related to UI/UX. For Frontend Developers.
area:webserver
Webserver related Issues
kind:documentation
type:new-feature
Changelog: New Features
To take the discussion about adding Telemetry forward, I am creating a draft PR that adds some basic telemetry to send to Scarf.
Voting thread: https://lists.apache.org/thread/h1x2glvnd42rbj2q2rgpfo3pjhmpt307
I have added docs on the data collection as well as a way to opt-out of it.
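Telemetry added at:
- Scheduler startup [Custom telemetry](https://docs.scarf.sh/custom-telemetry/) similar to other popular projects like [Unstructured](Unstructured-IO/unstructured@f0a63e2)
- Webserver via a [tracking pixel](https://docs.scarf.sh/web-traffic/#creating-a-pixel), similar to [Apache Superset](https://github.com/apache/superset/pull/26011/files)

Data collected:
- Airflow version
- Python version
- Platform System info: Linux/Darwin
- Machine type: arm64/aarch64
- Airflow Metadata DB: Postgres/MySQL
- DB: sqlite/postgres
- DB version: 12.6.3
- Number of DAGs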
My proposal is that, to start with, this data will be provided only to PMC members on request. For now, I am using the trial version.
We should also add it to the Airflow website similar to apache/superset#25639.
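As a rough illustration of the kind of data described above, the sketch below gathers the platform fields with Python's standard `platform` module and encodes them as query parameters. It is not the code from this PR: the endpoint URL is a placeholder, and the function names, parameter names, and field keys are hypothetical. Values such as the Airflow version, DB name, DB version, and DAG count are passed in by the caller rather than looked up.

```python
import platform
from urllib.parse import urlencode

# Hypothetical placeholder endpoint, not the URL used by this PR.
TELEMETRY_ENDPOINT = "https://telemetry.example.invalid/scheduler"


def collect_usage_data(airflow_version: str, db_name: str, db_version: str, num_dags: int) -> dict:
    """Gather coarse fields of the kind listed in the PR description."""
    return {
        "airflow_version": airflow_version,
        "python_version": platform.python_version(),
        "platform_system": platform.system(),  # e.g. Linux or Darwin
        "machine": platform.machine(),  # e.g. arm64 or aarch64
        "db_name": db_name,
        "db_version": db_version,
        "num_dags": str(num_dags),
    }


def build_telemetry_url(data: dict) -> str:
    """Encode the collected fields as a query string on the placeholder endpoint."""
    return f"{TELEMETRY_ENDPOINT}?{urlencode(data)}"
```

Keeping collection separate from URL construction makes it straightforward to inspect exactly which fields would be sent before any request is made.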