2 changes: 1 addition & 1 deletion .github/boring-cyborg.yml
Original file line number Diff line number Diff line change
@@ -406,7 +406,7 @@ labelPRBasedOnFilePath:
- tests/jobs/test_triggerer_job.py
- tests/models/test_trigger.py
- tests/jobs/test_triggerer_job_logging.py
- providers/standard/tests/provider_tests/standard/triggers/**/*
- providers/standard/tests/unit/standard/triggers/**/*

area:Serialization:
- airflow/serialization/**/*
102 changes: 64 additions & 38 deletions contributing-docs/11_provider_packages.rst
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ Provider packages
Airflow is split into core and providers. They are delivered as separate packages:

* ``apache-airflow`` - core of Apache Airflow (there are a few more sub-packages distributed separately)
* ``apache-airflow-providers-*`` - More than 70 provider packages to communicate with external services
* ``apache-airflow-providers-*`` - More than 90 provider packages to communicate with external services

**The outline for this document in GitHub is available at top-right corner button (with 3-dots and 3 lines).**

@@ -48,17 +48,34 @@ This will synchronize all extras that you need for development and testing of Ai
dependencies including runtime dependencies. See `local virtualenv <../07_local_virtualenv.rst>`_ or the uv project
for more information.

.. note::

We are currently in the process of separating out providers to separate subprojects. This means that
"old" providers related code is split across multiple directories "providers", "docs" and that the
``pyproject.toml`` files for them are dynamically generated when provider is built. The "new" providers
have all the files stored in the same "subfolder" of "providers" folder (for example all "airbyte" related
code is stored in "providers/airbyte" and there is an airbyte "pyproject.toml" stored in that folder and
the project is effectively a separate python project. It will take a while to migrate all the providers
to the new structure, so you might see both structures in the repository for some time.

We have ``provider.yaml`` file in the provider's module of the ``providers``.
Each provider is a separate Python project, with its own ``pyproject.toml`` file and a similar structure:

.. code-block:: text

    PROVIDER
       |- pyproject.toml    # project configuration
       |- provider.yaml     # additional metadata for provider
       |- src
       |   \- airflow.providers.PROVIDER
       |           \         # here are hooks, operators, sensors, transfers
       |- docs               # docs for provider are stored here
       \- tests
           |- unit
           |    \- PROVIDER
           |         \       # here unit test code is present
           |- integration
           |    \- PROVIDER
           |         \       # here integration test code is present
           \- system
                \- PROVIDER
                     \       # here system test code is present

PROVIDER is the name of the provider package. It might be a single directory (google, amazon, smtp) or in some
cases a nested structure one level down (for example ``apache/cassandra``, ``apache/druid``, ``microsoft/winrm``,
``common.io``).
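
The ``src`` layout of every provider maps directly onto the importable ``airflow.providers.*`` namespace,
no matter which provider sub-project the file physically lives in. A minimal sketch, using the ``postgres``
provider as an example (the connection id is illustrative):

.. code-block:: python

    # File on disk (inside the provider sub-project):
    #   providers/postgres/src/airflow/providers/postgres/hooks/postgres.py
    # Import path seen by Airflow core, users and tests:
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    # The hook is used exactly as before the restructuring - the packaging
    # layout does not change the public import path.
    hook = PostgresHook(postgres_conn_id="postgres_default")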

On top of the standard ``pyproject.toml`` file where we keep the project information,
we have a ``provider.yaml`` file in each provider's directory under ``providers``.

This file contains:

@@ -69,23 +86,14 @@ This file contains:
* list of connection types, extra-links, secret backends, auth backends, and logging handlers (useful to both
register them as they are needed by Airflow and to include them in documentation automatically).
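
For illustration, this is the metadata Airflow reads at runtime when it registers providers. A rough
sketch using ``ProvidersManager`` (exact attribute names may differ slightly between Airflow versions):

.. code-block:: python

    from airflow.providers_manager import ProvidersManager

    manager = ProvidersManager()
    # Provider packages discovered from the provider metadata (generated from provider.yaml):
    print(list(manager.providers))
    # Connection types registered by the installed providers:
    print(list(manager.hooks))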

In the old ``provider.yaml`` files we also kept additional information - the list of dependencies, additional extras
and development dependencies for the provider. In the new provider structure, this information is
kept in the standard way in the ``pyproject.toml`` file.

Note that the ``provider.yaml`` file is regenerated automatically when the provider is released so you should
not modify it - except updating dependencies, as your changes will be lost.

In the old providers, you should only update dependencies for the provider in the corresponding
``provider.yaml``, in the new providers you should update "dependencies", optional dependencies and "dev"
dependency group in the ``pyproject.toml`` file.
Eventually we might migrate ``provider.yaml`` fully to the ``pyproject.toml`` file, but it would require a custom
``tool.airflow`` toml section to be added to the ``pyproject.toml`` file.

Eventually we might migrate ``provider.yaml`` fully to ``pyproject.toml`` file but that should be a separate
change after we migrate all the providers to "new" structure.

If you want to add dependencies to the provider, you should add them to the corresponding ``provider.yaml``
and Airflow pre-commits and package generation commands will keep those dependencies (including all comments)
when regenerating the ``pyproject.toml`` file.
If you want to add dependencies to the provider, you should add them to the corresponding ``pyproject.toml``
file.

Providers are not packaged together with the core when you build "apache-airflow" package.

@@ -125,10 +133,18 @@ Developing community managed provider packages
----------------------------------------------

While you can develop your own providers, Apache Airflow has 90+ providers that are managed by the community.
They are part of the same repository as Apache Airflow (we use ``monorepo`` approach where different
They are part of the same repository as Apache Airflow (we use a monorepo approach where different
parts of the system are developed in the same repository but then they are packaged and released separately).
All the community-managed providers are in 'airflow/providers' folder and they are all sub-packages of
'airflow.providers' package. All the providers are available as ``apache-airflow-providers-<PROVIDER_ID>``
All the community-managed providers are in the ``providers`` folder and their code is placed as sub-packages of
the ``airflow.providers`` package.

In order to allow the same packages to be present in different parts of the source tree, we are heavily
utilising `namespace packages <https://packaging.python.org/en/latest/guides/packaging-namespace-packages/>`_.
For now we have a mixture of native namespace packages (no ``__init__.py``) and pkgutil-style
namespace packages (with ``__init__.py`` and path extension), but we are moving
towards using only native namespace packages.
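
The practical difference between the two styles, as a minimal sketch (the ``__init__.py`` line below is
the standard pkgutil idiom, shown for illustration rather than copied from the repository):

.. code-block:: python

    # pkgutil-style namespace package: the shared package directories ship an
    # __init__.py that contains only this path-extension boilerplate, so several
    # distributions can contribute modules to the same "airflow.providers" package:
    __path__ = __import__("pkgutil").extend_path(__path__, __name__)

    # Native namespace package: the same effect is achieved by simply omitting
    # __init__.py from the shared directories - Python then merges every
    # "airflow/providers" directory found on sys.path into one package.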

All the providers are available as ``apache-airflow-providers-<PROVIDER_ID>``
packages when installed by users, but when you contribute to providers you can work on airflow main
and install provider dependencies via ``editable`` extras (using uv workspace) - without
having to manage and install providers separately, you can easily run tests for the providers
@@ -241,12 +257,22 @@ The rules are as follows:
the providers are connected under common umbrella and they are also tightly coupled on the code level.

* Typical structure of provider package:
* example_dags -> example DAGs are stored here (used for documentation and System Tests)
* hooks -> hooks are stored here
* operators -> operators are stored here
* sensors -> sensors are stored here
* secrets -> secret backends are stored here
* transfers -> transfer operators are stored here
  * src

    * airflow.providers.PROVIDER

      * hooks -> hooks are stored here
      * operators -> operators are stored here
      * sensors -> sensors are stored here
      * secrets -> secret backends are stored here
      * transfers -> transfer operators are stored here

  * docs
  * tests

    * unit

      * PROVIDER

    * integration

      * PROVIDER

    * system

      * PROVIDER

        * example_dags -> example DAGs are stored here (used for documentation and System Tests)

* Module names do not contain the words "hooks", "operators" etc. The right type comes from
the package. For example 'hooks.datastore' module contains DataStore hook and 'operators.datastore'
@@ -275,11 +301,11 @@ The rules are as follows:

* Secret Backend name follows the convention: ``<SecretEngine>Backend``.

* Tests are grouped in parallel packages under "tests.providers" top level package. Module name is usually
* Unit tests are grouped in parallel packages under the "tests.providers" top level package. Module name is usually
``test_<object_to_test>.py`` (see the sketch after this list),

* System tests (not yet fully automated but allowing to run e2e testing of particular provider) are
named with _system.py suffix.
named with ``example_*`` prefix.
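
A minimal sketch of how these conventions look on disk under the new layout (the file path and test body
are illustrative; the ``unit`` import style is the one used by the updated provider tests):

.. code-block:: python

    # providers/amazon/tests/unit/amazon/aws/links/test_athena.py
    from __future__ import annotations

    # Shared test helpers come from the "unit" top level test package,
    # not from the old "provider_tests" package:
    from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase

    from airflow.providers.amazon.aws.links.athena import AthenaQueryResultsLink


    class TestAthenaQueryResultsLink(BaseAwsLinksTestCase):
        # Test methods are named test_<behaviour>; system tests live separately
        # under tests/system/PROVIDER and their dag files use the example_* prefix.
        ...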

Documentation for the community managed providers
-------------------------------------------------
@@ -303,8 +329,8 @@ Well documented provider contains those:

You can see, for example, the ``google`` provider, which has very comprehensive documentation:

* `Documentation <../docs/apache-airflow-providers-google>`_
* `System tests/Example DAGs <../tests/system/providers>`_
* `Documentation <../../providers/google/docs>`_
* `System tests/Example DAGs <../providers/google/tests/system/google/>`_

The example dags (placed in the ``tests/system`` folder) are part of the documentation. They are in
``tests/system`` because we use the example dags for various purposes:
20 changes: 10 additions & 10 deletions dev/breeze/tests/test_selective_checks.py
Original file line number Diff line number Diff line change
@@ -311,7 +311,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
pytest.param(
(
"airflow/api/file.py",
"providers/postgres/tests/provider_tests/postgres/file.py",
"providers/postgres/tests/unit/postgres/file.py",
),
{
"selected-providers-list-as-string": "amazon common.compat common.sql fab google openlineage "
@@ -343,7 +343,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
),
(
pytest.param(
("providers/apache/beam/tests/provider_tests/apache/beam/file.py",),
("providers/apache/beam/tests/unit/apache/beam/file.py",),
{
"selected-providers-list-as-string": "apache.beam common.compat google",
"all-python-versions": "['3.9']",
@@ -373,7 +373,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
),
(
pytest.param(
("providers/apache/beam/tests/provider_tests/apache/beam/file.py",),
("providers/apache/beam/tests/unit/apache/beam/file.py",),
{
"selected-providers-list-as-string": "apache.beam common.compat google",
"all-python-versions": "['3.9']",
@@ -406,7 +406,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
pytest.param(
(
"providers/apache/beam/tests/system/apache/beam/file.py",
"providers/apache/beam/tests/provider_tests/apache/beam/file.py",
"providers/apache/beam/tests/unit/apache/beam/file.py",
),
{
"selected-providers-list-as-string": "apache.beam common.compat google",
@@ -440,7 +440,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
pytest.param(
(
"providers/apache/beam/tests/system/apache/beam/file.py",
"providers/apache/beam/tests/provider_tests/apache/beam/file.py",
"providers/apache/beam/tests/unit/apache/beam/file.py",
),
{
"selected-providers-list-as-string": "apache.beam common.compat google",
@@ -535,7 +535,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str):
pytest.param(
(
"chart/aaaa.txt",
"providers/postgres/tests/provider_tests/postgres/file.py",
"providers/postgres/tests/unit/postgres/file.py",
),
{
"selected-providers-list-as-string": "amazon common.sql google openlineage pgvector postgres",
@@ -1384,7 +1384,7 @@ def test_expected_output_full_tests_needed(
pytest.param(
(
"chart/aaaa.txt",
"providers/google/tests/provider_tests/google/file.py",
"providers/google/tests/unit/google/file.py",
),
{
"all-python-versions": "['3.9']",
@@ -1411,7 +1411,7 @@ def test_expected_output_full_tests_needed(
(
"airflow/cli/test.py",
"chart/aaaa.txt",
"providers/google/tests/provider_tests/google/file.py",
"providers/google/tests/unit/google/file.py",
),
{
"all-python-versions": "['3.9']",
@@ -1436,7 +1436,7 @@ def test_expected_output_full_tests_needed(
pytest.param(
(
"airflow/file.py",
"providers/google/tests/provider_tests/google/file.py",
"providers/google/tests/unit/google/file.py",
),
{
"all-python-versions": "['3.9']",
@@ -1614,7 +1614,7 @@ def test_expected_output_push(
(
"airflow/cli/test.py",
"chart/aaaa.txt",
"providers/google/tests/provider_tests/google/file.py",
"providers/google/tests/unit/google/file.py",
),
{
"selected-providers-list-as-string": "amazon apache.beam apache.cassandra "
Original file line number Diff line number Diff line change
@@ -34,7 +34,7 @@
)

from airflow.providers.alibaba.cloud.hooks.analyticdb_spark import AnalyticDBSparkHook
from provider_tests.alibaba.cloud.utils.analyticdb_spark_mock import mock_adb_spark_hook_default_project_id
from unit.alibaba.cloud.utils.analyticdb_spark_mock import mock_adb_spark_hook_default_project_id

ADB_SPARK_STRING = "airflow.providers.alibaba.cloud.hooks.analyticdb_spark.{}"
MOCK_ADB_SPARK_CONN_ID = "mock_id"
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@
from unittest import mock

from airflow.providers.alibaba.cloud.hooks.oss import OSSHook
from provider_tests.alibaba.cloud.utils.oss_mock import mock_oss_hook_default_project_id
from unit.alibaba.cloud.utils.oss_mock import mock_oss_hook_default_project_id

OSS_STRING = "airflow.providers.alibaba.cloud.hooks.oss.{}"
MOCK_OSS_CONN_ID = "mock_id"
Original file line number Diff line number Diff line change
@@ -139,9 +139,7 @@ def mock_conn(request):


class TestSessionFactory:
@conf_vars(
{("aws", "session_factory"): "provider_tests.amazon.aws.hooks.test_base_aws.CustomSessionFactory"}
)
@conf_vars({("aws", "session_factory"): "unit.amazon.aws.hooks.test_base_aws.CustomSessionFactory"})
def test_resolve_session_factory_class(self):
cls = resolve_session_factory()
assert issubclass(cls, CustomSessionFactory)
Original file line number Diff line number Diff line change
@@ -53,7 +53,7 @@
)

from airflow.providers.amazon.aws.hooks.eks import COMMAND, EksHook
from provider_tests.amazon.aws.utils.eks_test_constants import (
from unit.amazon.aws.utils.eks_test_constants import (
DEFAULT_CONN_ID,
DEFAULT_NAMESPACE,
DISK_SIZE,
@@ -82,7 +82,7 @@
RegExTemplates,
ResponseAttributes,
)
from provider_tests.amazon.aws.utils.eks_test_utils import (
from unit.amazon.aws.utils.eks_test_utils import (
attributes_to_test,
generate_clusters,
generate_dict,
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.athena import AthenaQueryResultsLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestAthenaQueryResultsLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@
BatchJobDetailsLink,
BatchJobQueueLink,
)
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestBatchJobDefinitionLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@
ComprehendDocumentClassifierLink,
ComprehendPiiEntitiesDetectionLink,
)
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestComprehendPiiEntitiesDetectionLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.datasync import DataSyncTaskExecutionLink, DataSyncTaskLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase

TASK_ID = "task-0b36221bf94ad2bdd"
EXECUTION_ID = "exec-00000000000000004"
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.ec2 import EC2InstanceDashboardLink, EC2InstanceLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestEC2InstanceLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@
get_log_uri,
get_serverless_dashboard_url,
)
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestEmrClusterLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.glue import GlueJobRunDetailsLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestGlueJobRunDetailsLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.logs import CloudWatchEventsLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestCloudWatchEventsLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from __future__ import annotations

from airflow.providers.amazon.aws.links.sagemaker import SageMakerTransformJobLink
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestSageMakerTransformDetailsLink(BaseAwsLinksTestCase):
Original file line number Diff line number Diff line change
@@ -22,7 +22,7 @@
StateMachineDetailsLink,
StateMachineExecutionsDetailsLink,
)
from provider_tests.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase
from unit.amazon.aws.links.test_base_aws import BaseAwsLinksTestCase


class TestStateMachineDetailsLink(BaseAwsLinksTestCase):