Description
Apache Airflow version
3.1.6
If "Other Airflow 3 version" selected, which one?
3.1.5
What happened?
Hi.
In Airflow 3.1.6 (and 3.1.5; the issue does not occur on 3.1.3), GitDagBundle with LocalExecutor performs a full git clone into a new directory (new inode) and deletes the old one almost every time a task starts.
This "inode flipping" causes running tasks (e.g. dbt via Cosmos) to lose their file descriptors to the DAG folder, resulting in FileNotFoundError or "directory not found" errors.
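To illustrate the failure mode in isolation, here is a minimal sketch that needs no Airflow or git. It assumes Linux, and it only simulates the sequence described above (build a fresh directory, delete the old one, move the new one into place); the helper `replace_directory` and the file name used are hypothetical and are not taken from the Airflow code base.

```python
import os
import shutil
import tempfile


def replace_directory(path):
    """Simulate a re-clone: create a fresh copy, delete the old dir, move the copy into place."""
    fresh = path + ".new"
    os.mkdir(fresh)
    with open(os.path.join(fresh, "dbt_project.yml"), "w") as f:
        f.write("name: demo\n")
    shutil.rmtree(path)      # the old directory (old inode) is removed
    os.rename(fresh, path)   # the same path now refers to a different inode


root = tempfile.mkdtemp()
bundle = os.path.join(root, "version")
os.mkdir(bundle)
with open(os.path.join(bundle, "dbt_project.yml"), "w") as f:
    f.write("name: demo\n")

old_inode = os.stat(bundle).st_ino
dir_fd = os.open(bundle, os.O_RDONLY)  # stands in for a task holding the directory open

replace_directory(bundle)

new_inode = os.stat(bundle).st_ino
held_inode = os.fstat(dir_fd).st_ino   # the open handle still refers to the old inode

try:
    os.stat("dbt_project.yml", dir_fd=dir_fd)
    visible_via_old_handle = True
except FileNotFoundError:
    visible_via_old_handle = False

visible_via_path = os.path.exists(os.path.join(bundle, "dbt_project.yml"))

os.close(dir_fd)
shutil.rmtree(root)

print(f"inode changed: {old_inode != new_inode}")
print(f"file visible through old handle: {visible_via_old_handle}")
print(f"file visible through path: {visible_via_path}")
```

A process that resolved the directory before the swap keeps pointing at the deleted inode, while a fresh path lookup sees the new copy. Between the delete and the rename there is also a window in which the path does not exist at all, which would match the "does not exist" errors shown further down.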
I ran the following script inside the Airflow scheduler pod to detect ongoing git operations:
python - <<'PY'
import os, time
from datetime import datetime, timezone

# Bundle version directory whose inode is tracked
TARGET = "/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/<provide sha>"
INTERVAL = 0.05  # polling interval in seconds

def now():
    return datetime.now(timezone.utc).strftime("%H:%M:%S.%f")[:-3]

seen_pids = set()
print(f"{now()} START monitoring GIT processes...")
while True:
    try:
        # Scan /proc for processes whose command line mentions git
        pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]
        for pid in pids:
            if pid not in seen_pids:
                try:
                    with open(f"/proc/{pid}/cmdline", "rb") as f:
                        cmd = f.read().replace(b"\x00", b" ").decode("utf-8", "ignore").strip()
                    if "git" in cmd.lower():
                        with open(f"/proc/{pid}/stat", "rb") as f:
                            stat_parts = f.read().split()
                        ppid = stat_parts[3].decode()
                        inode_info = "N/A"
                        if os.path.exists(TARGET):
                            try:
                                inode_info = os.stat(TARGET).st_ino
                            except OSError:
                                pass  # directory vanished between the check and the stat
                        print(f"{now()} GIT DETECTED! PID={pid} PPID={ppid} INODE_BASE={inode_info}")
                        print(f" CMD: {cmd[:150]}")
                        seen_pids.add(pid)
                except (FileNotFoundError, ProcessLookupError):
                    continue  # process exited while being inspected
    except Exception as e:
        print(f"Error: {e}")
    if len(seen_pids) > 1000:
        seen_pids.clear()
    time.sleep(INTERVAL)
PY
And this is what I got on 3.1.6:
...
13:33:06.182 GIT DETECTED! PID=16645 PPID=16640 INODE_BASE=139948383
CMD: /usr/lib/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs
13:33:06.337 GIT DETECTED! PID=16648 PPID=16637 INODE_BASE=406416615
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
13:33:06.337 GIT DETECTED! PID=16649 PPID=16648 INODE_BASE=406416615
CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
13:33:06.337 GIT DETECTED! PID=16650 PPID=16649 INODE_BASE=406416615
CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
13:33:06.491 GIT DETECTED! PID=16652 PPID=16637 INODE_BASE=406416615
CMD: git cat-file --batch-check
13:33:06.491 GIT DETECTED! PID=16653 PPID=16637 INODE_BASE=406416615
CMD: git reset --hard HEAD --
13:33:09.825 GIT DETECTED! PID=16658 PPID=16655 INODE_BASE=406416615
CMD: git fetch -v -- origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
13:33:10.339 GIT DETECTED! PID=16666 PPID=16655 INODE_BASE=473240196
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
13:33:10.339 GIT DETECTED! PID=16667 PPID=16666 INODE_BASE=473240196
CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
13:33:10.339 GIT DETECTED! PID=16668 PPID=16667 INODE_BASE=473240196
CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
13:33:10.441 GIT DETECTED! PID=16669 PPID=16655 INODE_BASE=473240196
CMD: git checkout qa/deployed
...
On 3.1.6, git clone runs repeatedly and INODE_BASE changes between cycles (139948383, then 406416615, then 473240196).
On 3.1.3 I got:
...
13:16:45.591 GIT DETECTED! PID=354362 PPID=354358 INODE_BASE=14307405
CMD: git checkout dev/deployed
13:16:46.580 GIT DETECTED! PID=354373 PPID=354366 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:16:49.630 GIT DETECTED! PID=354397 PPID=354393 INODE_BASE=14307405
CMD: git checkout dev/deployed
13:16:49.990 GIT DETECTED! PID=354405 PPID=354400 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:16:49.990 GIT DETECTED! PID=354406 PPID=354400 INODE_BASE=14307405
CMD: git reset --hard HEAD --
13:17:01.411 GIT DETECTED! PID=354439 PPID=354434 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:17:01.411 GIT DETECTED! PID=354440 PPID=354434 INODE_BASE=14307405
CMD: git reset --hard HEAD --
13:17:04.472 GIT DETECTED! PID=354443 PPID=354441 INODE_BASE=14307405
CMD: git version
...
No git clone runs, and INODE_BASE stays at 14307405.
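For longer captures, comparing the two outputs by eye gets tedious. The sketch below summarizes a capture by listing the INODE_BASE values in order of change and counting `git clone` commands. The line format is taken from the monitor output above; the function name `summarize` and the returned dictionary shape are my own choices, and the sample is a short verbatim excerpt of the 3.1.6 output.

```python
import re

# Header line format as printed by the monitoring script in this report.
HEADER = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) GIT DETECTED! "
    r"PID=(?P<pid>\d+) PPID=(?P<ppid>\d+) INODE_BASE=(?P<inode>\S+)$"
)


def summarize(log_text):
    """Return the sequence of inode values (consecutive repeats collapsed) and the clone count."""
    inodes = []
    clone_count = 0
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = HEADER.match(line)
        if match:
            inode = match.group("inode")
            if not inodes or inodes[-1] != inode:
                inodes.append(inode)
        elif line.startswith("CMD: git clone "):
            clone_count += 1
    return {"inodes": inodes, "clone_count": clone_count}


SAMPLE_3_1_6 = """\
13:33:06.182 GIT DETECTED! PID=16645 PPID=16640 INODE_BASE=139948383
CMD: /usr/lib/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs
13:33:06.337 GIT DETECTED! PID=16648 PPID=16637 INODE_BASE=406416615
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
13:33:09.825 GIT DETECTED! PID=16658 PPID=16655 INODE_BASE=406416615
CMD: git fetch -v -- origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
13:33:10.339 GIT DETECTED! PID=16666 PPID=16655 INODE_BASE=473240196
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
"""

print(summarize(SAMPLE_3_1_6))
```

A healthy capture (such as the 3.1.3 one) should yield a single inode value and a clone count of zero.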
Outcome: Tasks are failing frequently. Example logs of failed tasks:
[2026-02-03 14:50:02] ERROR - Failed to import: /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994d812/dags/pipelines/transform_uk_flood_monitoring.py
CosmosLoadDbtException: Unable to run ['/home/airflow/.local/bin/dbt', 'ls', '--output', 'json', '--output-keys', 'name', 'unique_id', 'resource_type', 'depends_on', 'original_file_path', 'tags', 'config', 'freshness', '--project-dir', '/tmp/tmpv45sxgtr', '--profiles-dir', '/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994d812/dbt', '--profile', 'xc_dh_extdata_model', '--target', 'xcloud', '--select', 'tag:live_flood_monitoring_readings,tag:raw'] due to the error:
stderr: Path '/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994d812/dbt' does not exist.
[2026-02-03 15:09:24] ERROR - Failed to import: /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994d812/dags/pipelines/transform_storm_overflow.py
CosmosLoadDbtException: Unable to run ['/home/airflow/.local/bin/dbt', 'ls', '--output', 'json', '--output-keys', 'name', 'unique_id', 'resource_type', 'depends_on', 'original_file_path', 'tags', 'config', 'freshness', '--project-dir', '/tmp/tmpm092liya', '--profiles-dir', '/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994d812/dbt', '--profile', 'xc_dh_extdata_model', '--target', 'xcloud', '--select', '+tag:storm_overflow,+tag:std'] due to the error:
stderr: [Errno 2] No such file or directory: '/tmp/tmpm092liya/dbt_project.yml'
What you think should happen instead?
No response
How to reproduce
- Set up Airflow 3.1.6 with LocalExecutor.
- Configure a GitDagBundle.
- Run multiple tasks that read files from the bundle directory (like a DbtSelectOperator from Cosmos).
Operating System
Debian GNU/Linux 12 (bookworm)
Versions of Apache Airflow Providers
apache-airflow==3.1.6
apache-airflow-core==3.1.6
apache-airflow-providers-airbyte==5.3.1
apache-airflow-providers-amazon==9.19.0
apache-airflow-providers-common-compat==1.11.0
apache-airflow-providers-common-io==1.7.0
apache-airflow-providers-common-sql==1.30.2
apache-airflow-providers-datadog==3.10.1
apache-airflow-providers-http==5.6.2
apache-airflow-providers-microsoft-azure==12.10.1
apache-airflow-providers-mongo==5.3.1
apache-airflow-providers-smtp==2.4.1
apache-airflow-providers-standard==1.10.2
apache-airflow-task-sdk==1.1.6
Deployment
Official Apache Airflow Helm Chart
Deployment details
- name: AIRFLOW__DAG_PROCESSOR__DEFAULT_BUNDLE_NAME
  value: "dh-pipeline-dags"
- name: AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST
  value: >
    [
      {
        "name": "dh-pipeline-dags",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
          "tracking_ref": "qa/deployed",
          "subdir": "dags",
          "git_conn_id": "git_xc_dh_pipeline",
          "refresh_interval": 300
        }
      }
    ]
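To rule out a malformed bundle configuration when reproducing, the value above can be sanity-checked as JSON before it reaches Airflow. This is only a sketch: the keys it checks (`name`, `classpath`, `kwargs`) are the ones that appear in this report, and the function `check_bundle_config` is hypothetical, not Airflow's own validation.

```python
import json

# Keys present in the bundle entry from this report; an assumption, not Airflow's schema.
EXPECTED_KEYS = ("name", "classpath", "kwargs")


def check_bundle_config(raw):
    """Parse the config value and return a list of problems (empty if none were found)."""
    try:
        bundles = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(bundles, list):
        return ["top-level value is not a list"]
    problems = []
    for index, bundle in enumerate(bundles):
        if not isinstance(bundle, dict):
            problems.append(f"entry {index}: not an object")
            continue
        for key in EXPECTED_KEYS:
            if key not in bundle:
                problems.append(f"entry {index}: missing '{key}'")
    return problems


RAW_CONFIG = """
[
  {
    "name": "dh-pipeline-dags",
    "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
    "kwargs": {
      "tracking_ref": "qa/deployed",
      "subdir": "dags",
      "git_conn_id": "git_xc_dh_pipeline",
      "refresh_interval": 300
    }
  }
]
"""

print(check_bundle_config(RAW_CONFIG))
```

In the pod, the raw string could instead be read from `os.environ["AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST"]`.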
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct