Binary file added airflow-core/docs/img/airflow-2-arch.png
Binary file added airflow-core/docs/img/airflow-3-arch.png
37 changes: 37 additions & 0 deletions airflow-core/docs/installation/upgrading_to_airflow3.rst
@@ -20,6 +20,43 @@ Upgrading to Airflow 3

Apache Airflow 3 is a major release and contains :ref:`breaking changes<breaking-changes>`. This guide walks you through the steps required to upgrade from Airflow 2.x to Airflow 3.0.

Understanding Airflow 3.x Architecture Changes
-----------------------------------------------

Airflow 3.x introduces significant architectural changes that improve security, scalability, and maintainability. Understanding these changes helps you prepare for the upgrade and adapt your workflows accordingly.

Airflow 2.x Architecture
^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: ../img/airflow-2-arch.png
:alt: Airflow 2.x architecture diagram showing scheduler, metadata database, and worker
:align: center

- All components communicate directly with the Airflow metadata database.
- Airflow 2 was designed to run all components within the same network space; task code and the task execution code (the Airflow package code that runs user code) share a single process.
- Workers communicate directly with the Airflow database and execute all user code.
- User code could import database sessions and read or modify the Airflow metadata database directly.
- Each component held its own database connections, so the total number of connections was excessive and created scaling challenges.

Airflow 3.x Architecture
^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: ../img/airflow-3-arch.png
:alt: Airflow 3.x architecture diagram showing the decoupled Execution API Server and worker subprocesses
:align: center

- The API server is now the sole access point to the metadata database for tasks and workers.
- It serves several applications: the public Airflow REST API, an internal API for the Airflow UI (which also hosts the static UI assets), and the Task Execution API that workers use while running task instances.
- Workers communicate with the API server instead of directly with the database.
- The DAG processor and the Triggerer use the same task execution mechanism, for example when they need to resolve variables or connections.

Database Access Restrictions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In Airflow 3, direct metadata database access from task code is now restricted. This is a key security and architectural improvement that affects how DAG authors interact with Airflow resources:

- **No Direct Database Access**: Task code can no longer directly import and use Airflow database sessions or models.
- **API-Based Resource Access**: All runtime interactions (state transitions, heartbeats, XComs, and resource fetching) are handled through a dedicated Task Execution API.
- **Enhanced Security**: This ensures isolation and security by preventing malicious task code from accessing or modifying the Airflow metadata database.
- **Stable Interface**: The Task SDK provides a stable, forward-compatible interface for accessing Airflow resources without direct database dependencies.
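
To make the restriction concrete, here is a minimal, illustrative sketch of the before/after pattern (the Airflow 2.x half shows the kind of direct session access that is no longer permitted; the Airflow 3.x half goes through the Task SDK, which resolves the lookup via the Task Execution API):

.. code-block:: python

    # Airflow 2.x style -- direct ORM access from task code (no longer allowed):
    #
    #   from airflow.settings import Session
    #   session = Session()
    #   rows = session.query(...).all()

    # Airflow 3.x style -- use the Task SDK; lookups are proxied through the
    # Task Execution API rather than a database session:
    from airflow.sdk import Variable, task


    @task
    def use_variable() -> str:
        # Resolved via the API server, never via a direct DB connection.
        return Variable.get("my_key")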

Step 1: Take care of prerequisites
----------------------------------

2 changes: 2 additions & 0 deletions docs/spelling_wordlist.txt
@@ -222,6 +222,7 @@ buildType
burstable
bytestring
cacert
Cadwyn
callables
camelCase
Cancelled
@@ -1550,6 +1551,7 @@ sagemaker
salesforce
samesite
saml
sandboxed
sanitization
sas
Sasl
73 changes: 73 additions & 0 deletions task-sdk/docs/concepts.rst
@@ -0,0 +1,73 @@
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

Concepts
========

This section covers the fundamental concepts that DAG authors need to understand when working with the Task SDK.

.. note::

For information about Airflow 3.x architectural changes and database access restrictions, see the "Upgrading to Airflow 3" guide in the main Airflow documentation.

Terminology
-----------
- **Task**: a Python function (decorated with ``@task``) or Operator invocation representing a unit of work in a DAG.
- **Task Execution**: the runtime machinery that executes user tasks in isolated subprocesses, managed via the Supervisor and Execution API.
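
As a minimal illustration of the first term (a sketch, assuming nothing beyond the stable ``airflow.sdk`` imports):

.. code-block:: python

    from airflow.sdk import task


    @task
    def extract() -> dict:
        # At runtime this function body executes inside an isolated
        # Task Runner subprocess, not inside the scheduler or worker process.
        return {"rows": 42}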

Task Lifecycle
--------------

Understanding the task lifecycle helps DAG authors write more effective tasks and debug issues:

- **Scheduled**: The Airflow scheduler enqueues the task instance. The Executor assigns a workload token used for subsequent API authentication and validation with the Airflow API Server.
- **Queued**: Workers poll the queue to retrieve and reserve queued task instances.
- **Subprocess Launch**: The worker's Supervisor process spawns a dedicated subprocess (Task Runner) for the task instance, isolating its execution.
- **Run API Call**: The Supervisor sends a ``POST /run`` call to the Execution API to mark the task as running; the API server responds with a ``TIRunContext`` containing essential runtime information including:

- ``dag_run``: Complete DAG run information (logical date, data intervals, configuration, etc.)
- ``max_tries``: Maximum number of retry attempts allowed for this task instance
- ``should_retry``: Boolean flag indicating whether the task should enter the retry state or fail immediately on error
- ``task_reschedule_count``: Number of times this task has been rescheduled
- ``variables``: List of Airflow variables accessible to the task instance
- ``connections``: List of Airflow connections accessible to the task instance
- ``upstream_map_indexes``: Mapping of upstream task IDs to their map indexes for dynamic task mapping scenarios
- ``next_method``: Method name to call when resuming from a deferred state (set when the task resumes from a trigger)
- ``next_kwargs``: Arguments to pass to ``next_method`` (can be encrypted for sensitive data)
- ``xcom_keys_to_clear``: List of XCom keys that need to be cleared and purged by the worker
- **Runtime Dependency Fetching**: During execution, if the task code requests Airflow resources (variables, connections, etc.), it writes a request to STDOUT; the Supervisor picks it up, issues the corresponding API call, and writes the response back to the subprocess's STDIN (a simplified sketch of this exchange follows the list).
- **Heartbeats & Token Renewal**: The Task Runner periodically emits ``POST /heartbeat`` calls through the Supervisor. Each call authenticates via JWT; if the token has expired, the API server returns a refreshed token in the ``Refreshed-API-Token`` header.
- **XCom Operations**: Upon successful task completion (or when explicitly invoked during execution), the Supervisor issues API calls to set or clear XCom entries for inter-task data passing.
- **State Patch**: When the task reaches a terminal (success or failed), deferred, or rescheduled state, the Supervisor invokes ``PATCH /state`` with the final task status and metadata.
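
The dependency-fetching exchange described above can be pictured as line-delimited request/response messages over the subprocess pipes. The sketch below is purely illustrative: the message name and JSON schema shown are hypothetical, and the real SDK protocol is richer.

.. code-block:: python

    import json
    import sys


    def fetch_connection(conn_id: str) -> dict:
        # Task Runner side: write a request line to STDOUT for the Supervisor...
        print(json.dumps({"type": "GetConnection", "conn_id": conn_id}), flush=True)
        # ...which issues the Execution API call and writes the response to STDIN.
        return json.loads(sys.stdin.readline())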

Supervisor & Task Runner
------------------------

Within an Airflow worker, a Supervisor process manages the execution of task instances:

- Spawns isolated subprocesses (Task Runners) for each task, following a parent–child model.
- Establishes dedicated STDIN, STDOUT, and log pipes to communicate with each subprocess.
- Proxies Execution API calls: forwards subprocess requests (e.g., variables, connections, XCom operations, state transitions) to the API server and relays responses.
- Monitors subprocess liveness via heartbeats and marks tasks as failed if heartbeats are missed.
- Generates and refreshes JWT tokens on behalf of subprocesses through heartbeat responses to ensure authenticated API calls.

A Task Runner subprocess provides a sandboxed environment where user task code runs:

- Receives startup messages (run parameters) via STDIN from the Supervisor.
- Executes the Python function or operator code in isolation.
- Emits logs through STDOUT and communicates runtime events (heartbeats, XCom messages) via the Supervisor.
- Performs final state transitions by sending authenticated API calls through the Supervisor.
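
A drastically simplified model of that parent–child loop is sketched below. The ``task_runner_cmd`` entry point and the inline response are hypothetical stand-ins; in reality the Supervisor forwards each request to the Execution API.

.. code-block:: python

    import json
    import subprocess


    def supervise(task_runner_cmd: list[str], run_params: dict) -> int:
        # Spawn the isolated Task Runner with dedicated pipes (parent-child model).
        proc = subprocess.Popen(
            task_runner_cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
        # Startup message: the run parameters go to the subprocess via STDIN.
        proc.stdin.write(json.dumps(run_params) + "\n")
        proc.stdin.flush()
        # Proxy loop: relay each subprocess request and its (stubbed) response.
        for line in proc.stdout:
            request = json.loads(line)
            response = {"ok": True, "echo": request}  # stand-in for the API call
            proc.stdin.write(json.dumps(response) + "\n")
            proc.stdin.flush()
        return proc.wait()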
22 changes: 21 additions & 1 deletion task-sdk/docs/examples.rst
@@ -19,13 +19,16 @@ Examples
========

.. note:: For a minimal quick start, see the `Getting Started <../index.rst#getting-started>`_ section.
(It includes the simplest DAG example for first-time users.)

Key Concepts
------------
Defining DAGs
~~~~~~~~~~~~~

Example: Defining a DAG

Use the :func:`airflow.sdk.dag` decorator to convert a Python function into an Airflow DAG. All nested calls to :func:`airflow.sdk.task` within the function will become tasks in the DAG. For full parameters and usage, see the API reference for :func:`airflow.sdk.dag`.

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/example_dag_decorator.py
:language: python
:start-after: [START dag_decorator_usage]
@@ -35,6 +38,13 @@ Defining DAGs
Decorators
~~~~~~~~~~

Example: Using Task SDK decorators

The Task SDK provides decorators to simplify DAG definitions:

- :func:`airflow.sdk.task_group` groups related tasks into logical TaskGroups.
- :func:`airflow.sdk.setup` and :func:`airflow.sdk.teardown` define setup and teardown hooks for DAGs or TaskGroups.

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/example_task_group_decorator.py
:language: python
:start-after: [START howto_task_group_decorator]
@@ -50,6 +60,10 @@ Decorators
Tasks and Operators
~~~~~~~~~~~~~~~~~~~

Example: Defining tasks and using operators

Use the :func:`airflow.sdk.task` decorator to wrap Python callables as tasks and leverage dynamic task mapping with the ``.expand()`` method. Tasks communicate via :class:`airflow.sdk.XComArg`. For traditional operators and sensors, import classes like :class:`airflow.sdk.BaseOperator` or :class:`airflow.sdk.BaseSensorOperator`.

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/example_dynamic_task_mapping.py
:language: python
:start-after: [START example_dynamic_task_mapping]
@@ -65,6 +79,10 @@ Tasks and Operators
Assets
~~~~~~

Example: Defining and aliasing assets

Model data artifacts using the Task SDK's asset API. Decorate functions with :func:`airflow.sdk.asset` and create aliases with :class:`airflow.sdk.AssetAlias`. See the API reference under assets for full guidance.

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/example_assets.py
:language: python
:start-after: [START asset_def]
@@ -86,6 +104,8 @@ see the `core TaskFlow tutorial <../../airflow-core/docs/tutorial/taskflow.rst>`
Step 1: Define the DAG
----------------------

In this step, define your DAG by applying the :func:`airflow.sdk.dag` decorator to a Python function. This registers the DAG with its schedule and default arguments. For more details, see :func:`airflow.sdk.dag`.

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/tutorial_taskflow_api.py
:language: python
:start-after: [START instantiate_dag]
Binary file added task-sdk/docs/img/airflow-2-approach.png
Binary file added task-sdk/docs/img/airflow-2-arch.png
Binary file added task-sdk/docs/img/airflow-3-arch.png
Binary file added task-sdk/docs/img/airflow-3-task-sdk.png
131 changes: 99 additions & 32 deletions task-sdk/docs/index.rst
@@ -17,27 +17,31 @@

Apache Airflow Task SDK
=================================

The Apache Airflow Task SDK provides Python-native interfaces for defining DAGs,
executing tasks in isolated subprocesses, and interacting with Airflow resources
(e.g., Connections, Variables, XComs, Metrics, Logs, and OpenLineage events) at runtime.
It also includes the core execution-time components that manage communication between the worker
and the Airflow scheduler/backend.

The goal of the Task SDK is to decouple DAG authoring from Airflow internals (Scheduler, API Server, etc.), providing a forward-compatible, stable interface for writing and maintaining DAGs across Airflow versions. This approach reduces boilerplate and keeps your DAG definitions concise and readable.

1. Introduction and Getting Started
-----------------------------------

Below is a quick introduction and installation guide to get started with the Task SDK.


Installation
^^^^^^^^^^^^
To install the Task SDK, run:

.. code-block:: bash

    pip install apache-airflow-task-sdk

Getting Started
^^^^^^^^^^^^^^^
Define a basic DAG and task in just a few lines of Python:

.. exampleinclude:: ../../airflow-core/src/airflow/example_dags/example_simplest_dag.py
@@ -46,47 +50,110 @@ Define a basic DAG and task in just a few lines of Python:
:end-before: [END simplest_dag]
:caption: Simplest DAG with :func:`@dag <airflow.sdk.dag>` and :func:`@task <airflow.sdk.task>`

2. Public Interface
-------------------

Direct metadata database access from task code is now restricted. A dedicated Task Execution API handles all runtime interactions (state transitions, heartbeats, XComs, and resource fetching), ensuring isolation and security.

Airflow now supports a service-oriented architecture, enabling tasks to be executed remotely via a new Task Execution API. This API decouples task execution from the scheduler and introduces a stable contract for running tasks outside of Airflow's traditional runtime environment.

To support remote execution, Airflow provides the Task SDK — a lightweight runtime environment for running Airflow tasks in external systems such as containers, edge environments, or other runtimes. This lays the groundwork for language-agnostic task execution and brings improved isolation, portability, and extensibility to Airflow-based workflows.

Airflow 3.0 also introduces a new ``airflow.sdk`` namespace that exposes the core authoring interfaces for defining DAGs and tasks. DAG authors should now import objects like :class:`airflow.sdk.DAG`, :func:`airflow.sdk.dag`, and :func:`airflow.sdk.task` from ``airflow.sdk`` rather than internal modules. This new namespace provides a stable, forward-compatible interface for DAG authoring across future versions of Airflow.
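
For example, a complete DAG can be written against the new namespace alone (a minimal sketch; the DAG and task names are arbitrary):

.. code-block:: python

    from airflow.sdk import dag, task


    @dag(schedule=None)
    def hello_sdk():
        @task
        def greet() -> str:
            return "hello from the Task SDK"

        greet()


    hello_sdk()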

3. DAG Authoring Enhancements
-----------------------------

Writing your DAGs is now more consistent in Airflow 3.0. Use the stable :mod:`airflow.sdk` interface to define your workflows and tasks.

Why use ``airflow.sdk``?
^^^^^^^^^^^^^^^^^^^^^^^^
- Decouple your DAG definitions from Airflow internals (Scheduler, API Server, etc.)
- Enjoy a consistent API that won't break across Airflow upgrades
- Import only the classes and decorators you need, without installing the full Airflow core

**Key imports from airflow.sdk**

**Classes**

- :class:`airflow.sdk.Asset`
- :class:`airflow.sdk.BaseHook`
- :class:`airflow.sdk.BaseNotifier`
- :class:`airflow.sdk.BaseOperator`
- :class:`airflow.sdk.BaseOperatorLink`
- :class:`airflow.sdk.BaseSensorOperator`
- :class:`airflow.sdk.Connection`
- :class:`airflow.sdk.Context`
- :class:`airflow.sdk.DAG`
- :class:`airflow.sdk.EdgeModifier`
- :class:`airflow.sdk.Label`
- :class:`airflow.sdk.ObjectStoragePath`
- :class:`airflow.sdk.Param`
- :class:`airflow.sdk.TaskGroup`
- :class:`airflow.sdk.Variable`

**Decorators and helper functions**

- :func:`airflow.sdk.asset`
- :func:`airflow.sdk.dag`
- :func:`airflow.sdk.setup`
- :func:`airflow.sdk.task`
- :func:`airflow.sdk.task_group`
- :func:`airflow.sdk.teardown`
- :func:`airflow.sdk.chain`
- :func:`airflow.sdk.chain_linear`
- :func:`airflow.sdk.cross_downstream`
- :func:`airflow.sdk.get_current_context`
- :func:`airflow.sdk.get_parsing_context`
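
As one example of the helper functions listed above, ``chain`` expresses sequential dependencies without repeated ``>>`` operators (a minimal sketch with three placeholder tasks):

.. code-block:: python

    from airflow.sdk import chain, dag, task


    @dag(schedule=None)
    def chained():
        @task
        def extract(): ...

        @task
        def transform(): ...

        @task
        def load(): ...

        # Equivalent to: extract() >> transform() >> load()
        chain(extract(), transform(), load())


    chained()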

All DAGs must update their imports to refer to ``airflow.sdk`` instead of using internal Airflow modules directly. Deprecated legacy import paths, such as ``airflow.models.dag.DAG`` and ``airflow.decorators.task``, will be removed in a future version of Airflow. Some utilities and helper functions currently used from ``airflow.utils.*`` and other modules will gradually be migrated to the Task SDK over the next minor releases. These upcoming updates aim to completely separate DAG authoring from internal Airflow services. DAG authors can look forward to continuous improvements to ``airflow.sdk``, with no backwards-incompatible changes to their existing code.

Legacy imports (deprecated):

.. code-block:: python

    # Airflow 2.x
    from airflow.models import DAG
    from airflow.decorators import task

Use instead:

.. code-block:: python

    # Airflow 3.x
    from airflow.sdk import DAG, task

4. Example DAG References
-------------------------

Explore a variety of DAG examples and patterns in the :doc:`examples` page.

5. Concepts
-----------

Discover the fundamental concepts that DAG authors need to understand when working with the Task SDK, including Airflow 2.x vs 3.x architectural differences, database access restrictions, and task lifecycle. For full details, see the :doc:`concepts` page.

Airflow 2.x Architecture
^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: img/airflow-2-approach.png
:alt: Airflow 2.x architecture diagram showing scheduler, metadata DB, and worker interactions
:align: center

Architectural Decoupling: Task Execution Interface (Airflow 3.x)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: img/airflow-3-task-sdk.png
:alt: Airflow 3.x Task Execution API architecture diagram showing Execution API Server and worker subprocesses
:align: center

6. API References
-----------------

For the full public API reference, see the :doc:`api` page.

.. toctree::
:hidden:

examples
dynamic-task-mapping
api
concepts