I converted this into a discussion because it feels more like a discussion than an issue. I think we have a chicken-and-egg problem:
I believe the current cluster policy page attempts to describe the general case of where each policy is executed, but maybe some special cases like this one could be described better? To do that, someone would need to trace all the steps and execution paths: where and how each modification is made, whether it can be (and is) saved in the DB, and whether changing it has any effect at all. In the queue case, for example, once the queue has been used to determine where to run the code, changing it has no effect. That's likely an interesting task for someone to take on, and having someone do it would be great.
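A toy illustration of the point about the queue follows. It is not Airflow's executor code: ToyTaskInstance and ToyExecutor are hypothetical names, and the sketch simply assumes that the queue name is read exactly once, when the task is handed over for dispatch. Changing the attribute afterwards is stored on the object but does not move the already-dispatched task.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToyTaskInstance:
    """Hypothetical stand-in: only the fields needed for routing."""

    task_id: str
    queue: str


class ToyExecutor:
    """Toy model: each task id is appended to a per-queue list on send."""

    def __init__(self) -> None:
        self.queues = defaultdict(list)

    def send(self, ti: ToyTaskInstance) -> None:
        # The queue name is read once, here, to pick the destination.
        self.queues[ti.queue].append(ti.task_id)


if __name__ == "__main__":
    executor = ToyExecutor()
    ti = ToyTaskInstance(task_id="runme_0", queue="default")
    executor.send(ti)
    ti.queue = "gpu"  # changed after routing: the task stays where it was sent
    print(dict(executor.queues))  # {'default': ['runme_0']}
```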
Thanks for the feedback; the chicken-and-egg problem is understood. I also understand that the DB is primarily used to persist the data and share it across the nodes. That almost works, except that in some cases the queue information seems to be overridden. I'd raise my hand, since I raised the ticket initially. I opened it on the assumption that some "guidance" might be needed; I expected some kind of concept, but from the feedback I gather it was simply implemented the way it is today to make it work in general. I'll try to trace where the queue field is used in the code and where it is (potentially by accident) modified later. I propose to file a summary; then we can leave the discussion here and I can take this forward and raise a PR. It might take some time, though, because I assume it is medium to low priority.
Apache Airflow version
main (development)
What happened
I noticed the task_instance_mutation_hook feature and followed the example in https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/cluster-policies.html#task-instance-mutation to see how it can be used to "dynamically adjust the queue" of a task instance. The original reason was that I noticed the queue attribute cannot be added to template_fields, but it seems that task_instance_mutation_hook gives us a bit of flexibility in queuing. So far this "works", but I saw a few inconsistencies:
- With task_instance_mutation_hook applied, the UI in DAGs -> Grid -> Task Details -> More Details always shows the task definition value of queue and not the mutated value, which is actually stored in the task_instance DB table.
What you think should happen instead
When I use the task_instance_mutation_hook and the mutated task instance is actually stored in the DB, the UI should display the correct state as well, and navigating through the UI should not change the data.
How to reproduce
- Set up a task_instance_mutation_hook as described in https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/cluster-policies.html#task-instance-mutation.
- SELECT task_id, queue FROM task_instance WHERE dag_id='example_bash_operator'; - see that the queue is mutated.
- SELECT task_id, queue FROM task_instance WHERE dag_id='example_bash_operator'; - see that the queue value is reset to the task's standard definition.
Operating System
Ubuntu 20.04 / Breeze Dev setup in Py 3.8 Container
Versions of Apache Airflow Providers
No specific setup, execution in Breeze from latest main
Deployment
Other
Deployment details
Development setup via breeze start-airflow...
- Python 3.8
- Postgres
- Celery worker
Anything else
I tried to debug this to understand the root cause of the inconsistency, but I was not sure whether it is "by design", a "flaw", or just "a bug". Before opening a PR I need to understand the intended behavior.
I narrowed the root cause down to airflow/models/taskinstance.py:refresh_from_task(), which is called when the task instance is loaded.
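To make the suspected mechanism concrete, here is a minimal, self-contained sketch that does not import Airflow. FakeTask, FakeTaskInstance and their fields are hypothetical stand-ins, and refresh_from_task() is a simplified model assuming the behavior described above (queue copied unconditionally from the task definition, pool taking an optional pool_override); it is not the real implementation. The hook is modeled on the documented "route retries to another queue" example, and the queue_override parameter is purely hypothetical, illustrating what making queue overridable like pool could look like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTask:
    """Hypothetical stand-in for an operator: only the fields used here."""

    task_id: str
    queue: str = "default"
    pool: str = "default_pool"


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for a task_instance row."""

    task_id: str
    try_number: int = 0
    queue: str = "default"
    pool: str = "default_pool"

    def refresh_from_task(
        self,
        task: FakeTask,
        pool_override: Optional[str] = None,
        queue_override: Optional[str] = None,  # hypothetical, not an Airflow API
    ) -> None:
        # Without an override the queue is copied from the task definition,
        # overwriting any value the mutation hook stored earlier.
        self.queue = queue_override or task.queue
        # The pool already has an override path in this model.
        self.pool = pool_override or task.pool


def task_instance_mutation_hook(task_instance: FakeTaskInstance) -> None:
    # Modeled on the documented example: send retries to another queue.
    if task_instance.try_number >= 1:
        task_instance.queue = "retry_queue"


if __name__ == "__main__":
    task = FakeTask(task_id="runme_0")
    ti = FakeTaskInstance(task_id="runme_0", try_number=1)

    task_instance_mutation_hook(ti)
    print(ti.queue)  # retry_queue (value after the hook ran)

    ti.refresh_from_task(task)
    print(ti.queue)  # default (mutated value lost)

    ti.refresh_from_task(task, queue_override="retry_queue")
    print(ti.queue)  # retry_queue (kept via the hypothetical override)
```

In this model, any code path that calls refresh_from_task() after the hook has run silently restores the task-level queue, which would be consistent with the queue being reset in the DB after navigating the UI.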
Open questions from my side:
- refresh_from_task() already has a pool_override: would it be consistent to make queue officially overridable as well?
- The task_instance table in the DB also stores pool_slots and priority_weight: do we need to apply the same fix to these as well?
Are you willing to submit PR?
Code of Conduct