Skip to content

Fix missing dag_id in get_task_instance (#64957)#64968

Merged
jscheffl merged 1 commit into
apache:mainfrom
holmuk:bugfix/ti-model-missing-dag-id
Apr 11, 2026
Merged

Fix missing dag_id in get_task_instance (#64957)#64968
jscheffl merged 1 commit into
apache:mainfrom
holmuk:bugfix/ti-model-missing-dag-id

Conversation

@holmuk
Copy link
Copy Markdown
Contributor

@holmuk holmuk commented Apr 9, 2026

Closes #64957
Closes #65044

TaskInstance uniqueness is defined by the composite key (dag_id, task_id, run_id, map_index) (see the SQLAlchemy model):

UniqueConstraint("dag_id", "task_id", "run_id", "map_index", name="task_instance_composite_key"),
ForeignKeyConstraint(

The problem is that the filter in get_task_instance does not include dag_id, violating this invariant. This PR fixes the issue and adds a regression test (test_get_task_instance_disambiguates_by_dag_id_and_run_id) that checks that TIs with different dag_id or run_id can be fetched without exceptions.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@holmuk holmuk requested review from XD-DENG and ashb as code owners April 9, 2026 16:38
@boring-cyborg
Copy link
Copy Markdown

boring-cyborg Bot commented Apr 9, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@Dev-iL Dev-iL requested review from eladkal, potiuk and vincbeck April 9, 2026 18:57
Copy link
Copy Markdown
Collaborator

@Dev-iL Dev-iL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: Verified. All 4 existing callers of TaskInstance.get_task_instance() already pass dag_id — the method signature is unchanged. The only other similar query on the model (refresh_from_db) already included dag_id, so this was the sole gap.

Test: The tasks_with_different_dags case creates the exact collision scenario and would fail with MultipleResultsFound on pre-fix code.

One suggestion for making the regression test more self-documenting (non-blocking — see inline comment).

(Review comments generated with the help of Claude)

Comment thread airflow-core/tests/unit/models/test_taskinstance.py
Copy link
Copy Markdown
Contributor

@Vamsi-klu Vamsi-klu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct. I traced through the code to confirm the bug is realistic and not just theoretical. The run_id for scheduled runs is generated as "scheduled__" (dagrun.py:822), and the dag_run table's unique constraint is on (dag_id, run_id), not run_id alone. So two different DAGs on the same schedule will produce identical run_id values. If they also share a task name (which is common with standardized patterns like "extract" or "load"), the old query without dag_id would match both rows and scalar_one_or_none raises MultipleResultsFound.

All four callers of this classmethod (edge executor, edge logs route, and two test utilities) pass dag_id and expect it to filter correctly, so this was a real production path.

On the test, the "tasks_with_different_dags" parametrized case is the one that actually exercises the fix. The "tasks_with_different_runs" case tests rows that already differ by run_id, which was in the filter before the fix, so that case would have passed even without the change. Might be worth adding a comment in the test noting that the second case is just a sanity check rather than a regression test for this specific bug, so future readers understand the intent.

The test name test_instance_dag_and_run_id_uniqueness reads like it is testing a database constraint. Something like test_get_task_instance_filters_by_dag_id would make it clearer that the method under test is get_task_instance and the thing being verified is that dag_id is actually used in the query.

Building on what Dev-iL suggested, adding assert found_1.dag_id == dag_id_1 alongside the id comparison would make the assertion more self-documenting. The id check proves the right row came back, but the dag_id check makes it immediately obvious what property is being verified.

@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes TaskInstance.get_task_instance() returning ambiguous results by ensuring the lookup filter includes dag_id, and adds a regression test to cover the composite-uniqueness behavior described in #64957.

Changes:

  • Add dag_id to the SQLAlchemy .filter_by(...) criteria in TaskInstance.get_task_instance().
  • Add a parametrized unit test that creates two TaskInstances differing by dag_id or run_id and verifies they can be fetched uniquely.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
airflow-core/src/airflow/models/taskinstance.py Ensures get_task_instance() filters by dag_id to match the TI composite identity.
airflow-core/tests/unit/models/test_taskinstance.py Adds a regression test validating get_task_instance() correctly disambiguates TaskInstances.

Comment thread airflow-core/tests/unit/models/test_taskinstance.py Outdated
Comment thread airflow-core/tests/unit/models/test_taskinstance.py
@holmuk holmuk force-pushed the bugfix/ti-model-missing-dag-id branch from 4f8ea47 to 1ed5afe Compare April 11, 2026 11:25
Copy link
Copy Markdown
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is also a performance impact because w/o dag_id it can not use the index efficiently! LGTM!

@jscheffl jscheffl added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label Apr 11, 2026
@potiuk potiuk added this to the Airflow 3.2.1 milestone Apr 11, 2026
@jscheffl jscheffl merged commit 4ecbd59 into apache:main Apr 11, 2026
80 checks passed
@boring-cyborg
Copy link
Copy Markdown

boring-cyborg Bot commented Apr 11, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

github-actions Bot pushed a commit that referenced this pull request Apr 11, 2026
(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Backport successfully created: v3-2-test

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

jscheffl pushed a commit that referenced this pull request Apr 11, 2026
…#65067)

(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 15, 2026
…#65067)

(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 15, 2026
…#65067)

(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 15, 2026
…#65067)

(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 15, 2026
…#65067)

(cherry picked from commit 4ecbd59)

Co-authored-by: holmuk <20281580+holmuk@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

6 participants