-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Fix Certain DAG import errors ("role does not exist") don't persist in Airflow #51511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Certain DAG import errors ("role does not exist") don't persist in Airflow #51511
Conversation
jason810496
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! Could you add the corresponding unit test?
Thanks for the feedback. I am wondering if it’s a proper fix but will add unit tests for it. |
The investigation seems reasonable to me, but still need second eye on it to confirm. |
|
Hi @jason810496 , I am wondering if you know how to enable FabAuthManager in the breeze environment. I found the test |
How about configuring auth manager as FABAuthManager ( airflow/dev/breeze/doc/02_customizing.rst Lines 29 to 33 in e2be640
|
Yeah, I’ve tried that but seems like it’s not picked up, I can double check if there is any typo. |
|
answered on slack :), |
5cdbca5 to
c8d4b27
Compare
jason810496
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for the update.
It would be great if the "full tests needed" and "all versions" tags can be added to this PR as a safeguard.
I just added the tags and trigger the CI again by close and reopen the PR.
c8d4b27 to
7c42b56
Compare
jason810496
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for the fix.
Thank you for the time to review! |
7c42b56 to
613017a
Compare
|
@jason810496 is it ready to merge? |
Hi @eladkal , thanks for looking at it. I think it will be great if we could have a second eyes on it, as Jason mentioned in previous discussion. the caveat for this fix is that it runs the permission sync for ALL DAGs regardless those are updated or not. If we don’t check in this way, the current implementation will miss those permission error when dag processor run again, as discussed here #51511 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just want to make sure whether the description still consist with the latest code after a second read, it seems to be 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are concerned about performance issues, should we conduct some experiments to see how it affects performance? If it doesn't affect much, I think we're good to merge
Yes, I think it’s still consistent. |
Right. Wondering if there is anyway in breeze we can profile DAG processor’s performance. I will look into it and probably create a certain number of DAGs to see how the performance is impacted. |
613017a to
83c821b
Compare
83c821b to
b0ecc2e
Compare
Backport failed to create: v3-0-test. View the failure log Run details
You can attempt to backport this manually by running: cherry_picker cdab6d7 v3-0-testThis should apply the commit to the v3-0-test branch and leave the commit in conflict state marking After you have resolved the conflicts, you can continue the backport process by running: cherry_picker --continue |
|
Interested in this back port issue, probably because the current main branch is targeting 3.1.0 🤔 , please let me know if I could help with anything to resolve it. |
just follow #51511 (comment) and try to resolve the conflict. feel free to tag me if you encounter any issue :) |
Thanks! Will do by end of today or tomorrow. I think I should also rebase first before following the instruction? |
|
…'t persist in Airflow (apache#51511) * update _sync_dag_perms to run for all dags * add and update test cases * precommit fix * refactor mark to variable for reusability (cherry picked from commit cdab6d7) Co-authored-by: Kevin Yang <85313829+sjyangkevin@users.noreply.github.com>
…'t persist in Airflow (#51511) (#54432) ### Motivation As described in #49651, when the access control for a DAG is set to an non-exist role, the DAG import error show up in Airflow UI for a while and then disappear. The update is to fix this issue, and let the import error persist in the metadata DB until the DAG is updated with a correct access control setting. Close #49651 ### What is the issue https://github.com/apache/airflow/blob/56fbe90a8d0b56558c02d75f4ac5852e041cb058/airflow-core/src/airflow/dag_processing/collection.py#L177 When the DAG's access control is set to a non-exist role, the following process will raise an Exception "Failed to write serialized DAG dag_id=...". So, how this exception is triggered? 1. `dag_was_updated` will be `True` when the first time `SerializedDagModel.write_dag` write the serialized DAG to the database. 2. when `dag_was_updated` is `True`, `_sync_dag_perms` will be triggered to sync DAG specific permissions. At the moment, it detects that the role doesn't exist, and raise an error, resulting in the exception. 3. This exception will be captured, and being logged as an import error temporary in the DB, and show up in the UI. From my understanding, this sync process will run for every `MIN_SERIALIZED_DAG_UPDATE_INTERVAL`. So, what happen in the second run. 1. `dag_was_updated` will be `False` since the DAG code is not updated. 2. In this case, `_sync_dag_perms` will **NOT BE TRIGGERED** even though in the access control is set incorrectly in the DAG code. 3. Therefore, no exception will be raised, and no import error will be logged. Therefore, the import error is removed from the DB, as well as from the UI. ### What is the fix In the current state, `_sync_dag_perms` runs only when the DAG is updated (i.e., `dag_was_updated` is `True`). This can be more performant because it doesn't run for all the DAGs. However, it cannot properly handle the sync for permissions. Therefore, the current fix is to make `_sync_dag_perms` run for all the DAGs during the DAG sync process. I understand it might not be an ideal fix, but I wasn't able to find a better solution due to my limited understanding on the code. I would really appreciate if anyone could suggest some ideas to further improve it.







Motivation
As described in #49651, when the access control for a DAG is set to an non-exist role, the DAG import error show up in Airflow UI for a while and then disappear. The update is to fix this issue, and let the import error persist in the metadata DB until the DAG is updated with a correct access control setting.
Close #49651
What is the issue
airflow/airflow-core/src/airflow/dag_processing/collection.py
Line 177 in 56fbe90
When the DAG's access control is set to a non-exist role, the following process will raise an Exception "Failed to write serialized DAG dag_id=...". So, how this exception is triggered?
dag_was_updatedwill beTruewhen the first timeSerializedDagModel.write_dagwrite the serialized DAG to the database.dag_was_updatedisTrue,_sync_dag_permswill be triggered to sync DAG specific permissions. At the moment, it detects that the role doesn't exist, and raise an error, resulting in the exception.From my understanding, this sync process will run for every
MIN_SERIALIZED_DAG_UPDATE_INTERVAL. So, what happen in the second run.dag_was_updatedwill beFalsesince the DAG code is not updated._sync_dag_permswill NOT BE TRIGGERED even though in the access control is set incorrectly in the DAG code.What is the fix
In the current state,
_sync_dag_permsruns only when the DAG is updated (i.e.,dag_was_updatedisTrue). This can be more performant because it doesn't run for all the DAGs. However, it cannot properly handle the sync for permissions. Therefore, the current fix is to make_sync_dag_permsrun for all the DAGs during the DAG sync process. I understand it might not be an ideal fix, but I wasn't able to find a better solution due to my limited understanding on the code. I would really appreciate if anyone could suggest some ideas to further improve it.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.