Fix max_active_runs lost during DAG serialisation when value equals schema default#65310
Conversation
…chema default The serialisation optimisation from apache#55849 strips DAG fields that match their schema.json default. For max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs this is wrong because their runtime defaults come from airflow.cfg, not the schema. When a user explicitly sets max_active_runs=16 and the config has max_active_runs_per_dag=1, the value gets stripped and the dag table ends up with 1. Skip the schema-default exclusion for these three config-driven fields so they always survive serialisation.
|
@seruman Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.
See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
|
Quick follow-up to the triage comment above — one clarification on the "Unresolved review comments" item: Once you believe a thread has been addressed — whether by pushing a fix, or by replying in-thread with an explanation of why the suggestion doesn't apply — please mark the thread as resolved yourself by clicking the "Resolve conversation" button at the bottom of each thread. Reviewers don't auto-close their own threads, so an addressed-but-unresolved thread reads as "still waiting on the author" and keeps the PR from moving forward. The author doing the resolve-click is the expected convention on this project. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
|
@potiuk I think triage tool false flags self-reviews, marked it as resolved. I just wanted reviewers comment on it. |
|
Fix is correct and the bug is real. Before merging, worth weighing against extending The gap this PR papers over: #55849 added DAG-level schema-default exclusion without extending the Concrete sketch
DAG_DEFAULTS = {
"max_active_runs": ("core", "max_active_runs_per_dag"),
"max_active_tasks": ("core", "max_active_tasks_per_dag"),
"max_consecutive_failed_dag_runs": ("core", "max_consecutive_failed_dag_runs_per_dag"),
"catchup": ("scheduler", "catchup_by_default"),
"disable_bundle_versioning": ("dag_processor", "disable_bundle_versioning"),
}
json_dict["client_defaults"] = {
"tasks": OperatorSerialization.generate_client_defaults(),
"dag": DagSerialization.generate_client_defaults(),
}
Verified locally, every scenario round-trips correctly (cfg ∈ {1, 16} × user_set ∈ {None, 1, 16, 42}):
Why this beats the carve-out
Tradeoff: ~30-40 line diff vs the current 13. Larger surface, but lands on the architecture already in place, and the one-time |
|
Walking back my earlier suggestion. The Simplest fix: keep the schema.json changes (drop The PR's original carve-out approach was actually closer to right than this rewrite. Apologies for the detour 🤦🏻♂️. |
|
@kaxil yeah that makes much more sense. I do not think 50B/DAG would be much of an issue -at least in my case- but the abstraction was feeling heavy. Had to add an exclusion list to Python side defaults would diverge from the schema side. No need for an apology, TIL how Airflow serializes DAGs 🎉 |
|
@seruman — Your unresolved review thread(s) from @ephraimbuddy, @kaxil appear to have been addressed (post-review commits and/or in-thread replies on every thread, with the latest commit pushed after the most recent thread). I've added the @ephraimbuddy, @kaxil — could you take another look when you have a chance? If you agree the feedback was addressed, please mark the threads as resolved so the queue signal stays accurate. If a thread still needs work, please reply in-line — @seruman will follow up. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
ephraimbuddy
left a comment
There was a problem hiding this comment.
A few suggestions on the tests — nothing blocking. Two are about widening coverage so the fix doesn't regress for the sibling fields; the rest are tightening / parametrisation / docstring tweaks.
One extra point that I couldn't leave inline because the line sits outside the diff hunk:
improvement: test_dag_schema_defaults_optimization (around the for field in DagSerialization.get_schema_defaults("dag").keys() loop, ~L3831) is now weaker than its surrounding comments suggest. After this PR, that loop iterates only over the fields that still have a schema default (fail_fast, render_template_as_native_obj, callback flags). It no longer asserts anything about catchup, max_active_runs, max_active_tasks, max_consecutive_failed_dag_runs, or disable_bundle_versioning — yet the DAG above is still constructed with catchup=False, max_active_runs=16, max_active_tasks=16, max_consecutive_failed_dag_runs=0, disable_bundle_versioning=False under a comment that reads "These should match schema defaults and be excluded". That comment is now wrong, and assert deserialized_dag.max_active_runs == 16 succeeds for the opposite reason it used to (the value is on the wire now, not restored from a schema default). Minimal fix: update the comment, and add a positive for field in (...): assert field in dag_data block to lock in the new contract right next to the test that locks in the old one.
Drafted-by: Claude Code (Opus 4.7); reviewed by @ephraimbuddy before posting
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Backport successfully created: v3-2-testNote: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
|
…lue equals schema default (#65310) (#67097) * Fix max_active_runs lost during DAG serialisation when value equals schema default The serialisation optimisation from #55849 strips DAG fields that match their schema.json default. For max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs this is wrong because their runtime defaults come from airflow.cfg, not the schema. When a user explicitly sets max_active_runs=16 and the config has max_active_runs_per_dag=1, the value gets stripped and the dag table ends up with 1. Skip the schema-default exclusion for these three config-driven fields so they always survive serialisation. * fix: introduce DAG_DEFAULTS * chore: add test for catchup * fix: revert comment change, that wording was better * fix: preserve DAG fields whose defaults match the schema default * fix: update schema default check script * fix: address review comments on serialisation tests * fix: add disable_bundle_versioning case to parametrised serialisation test * fix: improve docstring for config-driven fields serialisation test * fix: use tuple for pytest.mark.parametrize first argument (cherry picked from commit 76eb2a0) Co-authored-by: Selman <seruman@users.noreply.github.com> Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
…lue equals schema default (#65310) (#67097) * Fix max_active_runs lost during DAG serialisation when value equals schema default The serialisation optimisation from #55849 strips DAG fields that match their schema.json default. For max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs this is wrong because their runtime defaults come from airflow.cfg, not the schema. When a user explicitly sets max_active_runs=16 and the config has max_active_runs_per_dag=1, the value gets stripped and the dag table ends up with 1. Skip the schema-default exclusion for these three config-driven fields so they always survive serialisation. * fix: introduce DAG_DEFAULTS * chore: add test for catchup * fix: revert comment change, that wording was better * fix: preserve DAG fields whose defaults match the schema default * fix: update schema default check script * fix: address review comments on serialisation tests * fix: add disable_bundle_versioning case to parametrised serialisation test * fix: improve docstring for config-driven fields serialisation test * fix: use tuple for pytest.mark.parametrize first argument (cherry picked from commit 76eb2a0) Co-authored-by: Selman <seruman@users.noreply.github.com> Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
…lue equals schema default (#65310) (#67097) * Fix max_active_runs lost during DAG serialisation when value equals schema default The serialisation optimisation from #55849 strips DAG fields that match their schema.json default. For max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs this is wrong because their runtime defaults come from airflow.cfg, not the schema. When a user explicitly sets max_active_runs=16 and the config has max_active_runs_per_dag=1, the value gets stripped and the dag table ends up with 1. Skip the schema-default exclusion for these three config-driven fields so they always survive serialisation. * fix: introduce DAG_DEFAULTS * chore: add test for catchup * fix: revert comment change, that wording was better * fix: preserve DAG fields whose defaults match the schema default * fix: update schema default check script * fix: address review comments on serialisation tests * fix: add disable_bundle_versioning case to parametrised serialisation test * fix: improve docstring for config-driven fields serialisation test * fix: use tuple for pytest.mark.parametrize first argument (cherry picked from commit 76eb2a0) Co-authored-by: Selman <seruman@users.noreply.github.com> Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
…lue equals schema default (#65310) (#67097) * Fix max_active_runs lost during DAG serialisation when value equals schema default The serialisation optimisation from #55849 strips DAG fields that match their schema.json default. For max_active_runs, max_active_tasks, and max_consecutive_failed_dag_runs this is wrong because their runtime defaults come from airflow.cfg, not the schema. When a user explicitly sets max_active_runs=16 and the config has max_active_runs_per_dag=1, the value gets stripped and the dag table ends up with 1. Skip the schema-default exclusion for these three config-driven fields so they always survive serialisation. * fix: introduce DAG_DEFAULTS * chore: add test for catchup * fix: revert comment change, that wording was better * fix: preserve DAG fields whose defaults match the schema default * fix: update schema default check script * fix: address review comments on serialisation tests * fix: add disable_bundle_versioning case to parametrised serialisation test * fix: improve docstring for config-driven fields serialisation test * fix: use tuple for pytest.mark.parametrize first argument (cherry picked from commit 76eb2a0) Co-authored-by: Selman <seruman@users.noreply.github.com> Co-authored-by: Rahul Vats <43964496+vatsrahul1001@users.noreply.github.com>
When a DAG explicitly sets
max_active_runs=16andairflow.cfghasmax_active_runs_per_dag = 1, thedagtable ends up with1. Setting it to17or any other value that isn't16works fine.The serialisation optimisation from #55849 strips DAG fields that match their
schema.jsondefault.This works for static defaults like
catchup=False, butmax_active_runs,max_active_tasks, andmax_consecutive_failed_dag_runsget their defaults fromairflow.cfgat parse time, not from the schema.When the user's explicit value happens to equal the schema default (16), it gets stripped,
LazyDeserializedDAGreturnsNone, andcollection.pyfalls back to whateverairflow.cfgsays.The fix skips the schema-default exclusion for these three config-driven fields so they always survive serialisation.
After deploying this, the first parse cycle will produce a slightly different serialised payload for every DAG (three extra int fields), which means a one-time
dag_hashchange and a newDagVersionfor DAGs that have running task instances.closes: #65307
related: #57604
related: #56646
Was generative AI tooling used to co-author this PR?
Generated-by: pi (Claude Opus 4.6) following the guidelines
Note
Prompted it like when DAGs and config configured like this I observe this and rows in metadata is like this and informed it with my suspicion on hard coded default 16 in the schema to point me to relevant paths. After the proposed fix and unit tests went over the code an tests to make sure it is correct and aligns with the rest of the project and see if there're any alternative solutions. Spawned Airflow with
breeze, tested the same exact scenarios we had in the real deployment to verify the new behaviour along with thedag_haschange I mentioned above.{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.