@kaxil kaxil commented May 9, 2024

To take the discussion about adding Telemetry forward, I am creating a draft PR that adds some basic telemetry to send to Scarf.

Voting thread: https://lists.apache.org/thread/h1x2glvnd42rbj2q2rgpfo3pjhmpt307

I have added docs on the data collection as well as a way to opt out of it.

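Telemetry added at:

  • Scheduler startup, via Custom telemetry (https://docs.scarf.sh/custom-telemetry/), similar to other popular projects like Unstructured
  • Webserver, via a tracking pixel (https://docs.scarf.sh/web-traffic/#creating-a-pixel), similar to Apache Superset (https://github.com/apache/superset/pull/26011/files)
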
Data collected:

  • Airflow version
  • Python version
  • Platform System info: Linux/Darwin
  • Machine type: arm64/aarch64
  • Airflow Metadata DB: Postgres/MySQL
  • DB: sqlite/postgres
  • DB version: 12.6.3
  • Number of DAGs
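
For a sense of how little is involved, the interpreter and host fields in the list above can be read from the Python standard library. This is a minimal sketch, not the code in this PR: the function name `collect_platform_info` and the dictionary keys are hypothetical, and the Airflow version, database details, and DAG count are omitted because they need a running Airflow installation.

```python
import platform


def collect_platform_info() -> dict:
    """Return the interpreter and host fields from the list above.

    Hypothetical helper for illustration only; the key names are assumptions.
    """
    return {
        "python_version": platform.python_version(),  # e.g. "3.11.8"
        "platform_system": platform.system(),  # e.g. "Linux" or "Darwin"
        "machine": platform.machine(),  # e.g. "arm64" or "aarch64"
    }


if __name__ == "__main__":
    print(collect_platform_info())
```

None of these values identifies a user or a deployment; they describe only the runtime environment.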

My proposal is that, to start with, this data is provided only to PMC members on request. For now, I am using the trial version.

We should also add it to the Airflow website similar to apache/superset#25639.



@boring-cyborg boring-cyborg bot added area:CLI area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:documentation labels May 9, 2024
@kaxil kaxil force-pushed the scarf branch 5 times, most recently from 0479db7 to b16caed Compare May 9, 2024 20:28
kaxil added a commit to astronomer/airflow that referenced this pull request May 9, 2024
Similar to apache/superset#25065 and apache#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
@kaxil kaxil mentioned this pull request May 9, 2024
kaxil added a commit that referenced this pull request May 9, 2024
Similar to apache/superset#25065 and #39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
@kaxil kaxil force-pushed the scarf branch 4 times, most recently from 200f210 to 7d8eb2d Compare May 9, 2024 23:57
kaxil added a commit to apache/airflow-site that referenced this pull request May 10, 2024
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this adds a simple transparent tracking pixel. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available in Scarf, credentials for which are shared with PMC members via 1Password, and can be reported periodically in things like Town Hall or newsletters.
@kaxil kaxil merged commit cd0c6a7 into apache:main May 16, 2024
@kaxil kaxil deleted the scarf branch May 16, 2024 20:56
@utkarsharma2 utkarsharma2 added the type:new-feature Changelog: New Features label Jun 3, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
Similar to apache/superset#25065 and apache#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
To take [the discussion about adding Telemetry](https://lists.apache.org/thread/7f6qyr8w2n8w34g63s7ybhzphgt8h43m) forward, I am creating a draft PR that adds some basic telemetry to send to Scarf. 

Voting thread: https://lists.apache.org/thread/h1x2glvnd42rbj2q2rgpfo3pjhmpt307

I have added docs on the data collection as well as a way to opt out of it.

Telemetry added at:
- Scheduler startup [Custom telemetry](https://docs.scarf.sh/custom-telemetry/) similar to other popular projects like [Unstructured](Unstructured-IO/unstructured@f0a63e2)
- Webserver via a [tracking pixel](https://docs.scarf.sh/web-traffic/#creating-a-pixel), similar to [Apache Superset](https://github.com/apache/superset/pull/26011/files)

Data collected:
- Airflow version
- Python version
- Platform System info: Linux/Darwin
- Machine type: arm64/aarch64
- Airflow Metadata DB: Postgres/MySQL
- DB: sqlite/postgres
- DB version: 12.6.3
- Number of DAGs
tatiana added a commit to astronomer/dag-factory that referenced this pull request Oct 17, 2024
Export telemetry related to DAG Factory usage to
[Scarf](https://about.scarf.sh/).

This data assists the project maintainers in better understanding how
DAG Factory is used. Insights from this telemetry are critical for
prioritizing patches, minor releases, and security fixes. Additionally,
this information supports critical decisions related to the development
road map.

Deployments and individual users can opt out of analytics by setting the
configuration:

```
[dag_factory]
enable_telemetry = False
```

As described in the [official
documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also
possible to opt out by setting one of the following environment
variables:

```commandline
AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY=False
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```
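
The opt-out check these variables imply can be sketched in a few lines of Python. This is an illustrative sketch rather than DAG Factory's actual implementation: the function name `telemetry_enabled`, the accepted truthy and falsy spellings, and the default of "enabled unless opted out" are all assumptions.

```python
import os
from typing import Mapping, Optional

# Accepted spellings are an assumption made for this sketch.
_TRUTHY = {"1", "true", "t", "yes", "y"}
_FALSY = {"0", "false", "f", "no", "n"}


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False if any of the three opt-out variables asks for no tracking."""
    env = os.environ if env is None else env

    # Generic opt-outs: a truthy value disables telemetry.
    for name in ("DO_NOT_TRACK", "SCARF_NO_ANALYTICS"):
        if env.get(name, "").strip().lower() in _TRUTHY:
            return False

    # Project-specific switch: a falsy value disables telemetry.
    flag = env.get("AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY", "").strip().lower()
    return flag not in _FALSY


if __name__ == "__main__":
    print(telemetry_enabled({"DO_NOT_TRACK": "True"}))
```

Passing the environment in as a mapping keeps the check easy to test without touching `os.environ`.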

In addition to Scarf's default data collection, DAG Factory collects the
following information:

- DAG Factory version
- Airflow version
- Python version
- Operating system & machine architecture
- Event type
- Number of DAGs
- Number of TaskGroups
- Number of Tasks
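
One way such fields could be packaged is as query parameters on a single request URL. The sketch below is an assumption-laden illustration, not DAG Factory's code: the base URL is a placeholder (the real Scarf endpoint is not given here), and `build_telemetry_url` and the metric key names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real gateway URL is not stated in this message.
TELEMETRY_BASE_URL = "https://telemetry.example.invalid/dag-factory"


def build_telemetry_url(metrics: dict) -> str:
    """Encode the metrics as a query string appended to the placeholder URL."""
    return f"{TELEMETRY_BASE_URL}?{urlencode(metrics)}"


if __name__ == "__main__":
    # Hypothetical key names standing in for the fields listed above.
    print(build_telemetry_url({"event_type": "dag_run", "dags_count": 3}))
```

Only counts and version strings appear in the URL, which matches the intent of the list: no DAG names, task names, or connection details.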

No user-identifiable information (IP included) is stored in Scarf, even
though Scarf infers information from the IP, such as location, and
stores that. The data collection is GDPR compliant.

The data is not currently being emitted for pre-releases except from
integration tests.

The Apache Software Foundation supports this same strategy in many of its
open-source projects, including Airflow
([#39510](apache/airflow#39510)).

Example of visualisation of the data via the Scarf UI:

<img width="1624" alt="Screenshot 2024-10-17 at 01 56 09"
src="https://github.com/user-attachments/assets/d4191834-1e02-4192-811b-125d3fa735fe">

<img width="1624" alt="Screenshot 2024-10-17 at 01 55 59"
src="https://github.com/user-attachments/assets/cd814e11-7f77-45c8-95a0-56e29d9f9f12">

<img width="1624" alt="Screenshot 2024-10-17 at 01 55 47"
src="https://github.com/user-attachments/assets/2950ddbb-ea25-415f-b61c-3fbdcf4fc739">

<img width="1624" alt="Screenshot 2024-10-17 at 01 55 42"
src="https://github.com/user-attachments/assets/a56ecefd-0cd0-486c-9faf-026b1e9a4ceb">

Closes: #214
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 9, 2024
Similar to apache/superset#25065 and apache/airflow#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.

GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
tatiana added a commit to astronomer/astronomer-cosmos that referenced this pull request Dec 20, 2024
Export telemetry related to Cosmos usage to
[Scarf](https://about.scarf.sh/).

This data assists the project maintainers in better understanding how
Cosmos is used. Insights from this telemetry are critical for
prioritizing patches, minor releases, and security fixes. Additionally,
this information supports critical decisions related to the development
road map.

Deployments and individual users can opt out of analytics by setting the
configuration:

```
[cosmos]
enable_telemetry: False
```
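
To show how a setting like this parses, the sketch below reads it with Python's standard `configparser`, which accepts both `:` and `=` as delimiters. It is a stand-in for illustration only: Airflow reads configuration through its own `conf` object, and the helper name `read_enable_telemetry` and its default of `True` are assumptions.

```python
from configparser import ConfigParser

CONFIG_TEXT = """
[cosmos]
enable_telemetry: False
"""


def read_enable_telemetry(text: str, default: bool = True) -> bool:
    """Parse INI text and return the [cosmos] enable_telemetry flag.

    Falls back to `default` when the section or option is missing.
    """
    parser = ConfigParser()
    parser.read_string(text)
    return parser.getboolean("cosmos", "enable_telemetry", fallback=default)


if __name__ == "__main__":
    print(read_enable_telemetry(CONFIG_TEXT))
```

An absent section leaves telemetry on in this sketch, consistent with the opt-out model described above.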

As described in the [official
documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also
possible to opt out by setting one of the following environment
variables:

```commandline
AIRFLOW__COSMOS__ENABLE_TELEMETRY=False
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```

In addition to Scarf's default data collection, Cosmos collects the
following information when running Cosmos-powered DAGs:

- Cosmos version
- Airflow version
- Python version
- Operating system & machine architecture
- Event type
- DAG hash
- Total tasks
- Total Cosmos tasks

No user-identifiable information (IP included) is stored in Scarf, even
though Scarf infers information from the IP, such as location, and
stores that. The data collection is GDPR compliant.

The Apache Software Foundation supports this same strategy in many of its
open-source projects, including Airflow
([#39510](apache/airflow#39510)).

Example of visualisation of the data via the Scarf UI:

<img width="1235" alt="Screenshot 2024-12-19 at 10 22 59"
src="https://github.com/user-attachments/assets/12b9fbd4-2fdd-4e62-9876-defee3c4d8da"
/>

<img width="1231" alt="Screenshot 2024-12-19 at 10 23 13"
src="https://github.com/user-attachments/assets/f98b849c-99be-4764-9e6d-cb7730da3688"
/>

<img width="1227" alt="Screenshot 2024-12-19 at 10 23 21"
src="https://github.com/user-attachments/assets/421b7581-c641-422a-8469-252ba5a2fd33"
/>

<img width="1237" alt="Screenshot 2024-12-19 at 10 23 28"
src="https://github.com/user-attachments/assets/2e5995a2-fe09-4017-a625-4dd4a60028d0"
/>

<img width="1248" alt="Screenshot 2024-12-19 at 10 23 51"
src="https://github.com/user-attachments/assets/64a8a07f-df56-493c-a3f5-0f5165fd58e8"
/>

<img width="1229" alt="Screenshot 2024-12-19 at 10 24 01"
src="https://github.com/user-attachments/assets/1e3e8b8d-b11d-4b31-8b46-853d541b01b8"
/>

<img width="1240" alt="Screenshot 2024-12-19 at 10 24 11"
src="https://github.com/user-attachments/assets/b5e79cc7-4e2e-44b2-a94b-891b9226b152"
/>

<img width="1241" alt="Screenshot 2024-12-19 at 10 24 20"
src="https://github.com/user-attachments/assets/2fb5d666-d749-416d-acf8-4a3bc94ba014"
/>

<img width="1234" alt="Screenshot 2024-12-19 at 10 24 31"
src="https://github.com/user-attachments/assets/353eb82c-44d2-44ec-87e2-ace7138132f5"
/>

<img width="1245" alt="Screenshot 2024-12-19 at 10 24 39"
src="https://github.com/user-attachments/assets/4a637a2a-14ad-41a8-b7fd-db186ec74357"
/>

<img width="1233" alt="Screenshot 2024-12-19 at 10 24 48"
src="https://github.com/user-attachments/assets/bec4e2b0-49c3-4289-8f9b-3285db9ec40c"
/>


Closes: #1143
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 6, 2025
Similar to apache/superset#25065 and apache/airflow#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.

GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
potiuk pushed a commit to apache/airflow-site that referenced this pull request May 8, 2025
Similar to apache/superset#25065 and apache/airflow#39510 but this one doesn't need to wait for the VOTE since this adds a simple transparent tracking pixel. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available in Scarf, credentials for which are shared with PMC members via 1Password, and can be reported periodically in things like Town Hall or newsletters.
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 26, 2025
Similar to apache/superset#25065 and apache/airflow#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.

GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 21, 2025
Similar to apache/superset#25065 and apache/airflow#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.

GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 19, 2025
Similar to apache/superset#25065 and apache/airflow#39510, but this one doesn't need to wait for the VOTE since it only embeds it in the README. More detail in https://docs.scarf.sh/web-traffic/

All of this data will be available to interested PMC members, and reported periodically in things like Town Hall & newsletters.

GitOrigin-RevId: 08a8028faefff2ed1002291b4bf6e522e8e6ed0f