-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Add configs to set bundle file & folder permissions #60270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c940c5a to
f9e7869
Compare
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I would love though others (@ephraimbuddy @amoghrajesh ) to take a look
f9e7869 to
1f7477d
Compare
|
I think the problem is basically that:
Why can we not consider initing the bundle after we know "who" will run the task? Maybe an early pass to detect "run_as_user" and then actually init the bundle with that user? |
|
Since run_as_user is not always global(from config) and can be per dag, will it not be an issue with this approach? |
I think this is a suitable suggestion if you know in advance who the impersonated user will be and it will never change (incompatibly). Otherwise, I'd say that to support changing the impersonation the final git location should be cleaned by the airflow user and let it be recreated by the impersonation. (BTW, this is very similar to one of the approaches suggested by the AI) |
|
Yeah. @ephraimbuddy is very right about it.
I think that was mostly a comment to @amoghrajesh 's suggestion.
The thing is that we can have celery or edge workers wher you have multiple tasks run on the same machine and sharing (supervisor) processes that are long running. And in this case, you can have the same
I thought initially that the "recommended" solution should be the one I suggested in slack initially as a workaround - i.e. umask with writable access for group + all users used for impersonation (and airlfow user) have to share the same group. That might be of course relaxed if you you have a filesystem that allows to implement those permissions in a different way - like NFS - but generallly our recommendation should be "make sure airfow and all the users used for impersonation can have write access to the bare git repo - this can be done for example by ... <umask + group>. However after a bit of thinking - this one leaks a bit from airlfow into the deployment and likely makes things more complex. I think, however, there is another possibility - and maybe we could consider it - instead of running That however I think will require a bit more coupling between the bundle interface and execution api - because this is not exclusively a problem of GitBundle only - other bundles might have very similar problems and we should solve it in a generic way so that there is a generic task -> supervisor ("get bundle version NNN") has to be added to the execution But I see that as the best long-term option. |
How does this sound for now:
### Using DAG Bundles with User Impersonation
When using `run_as_user` with DAG bundles, ensure:
1. All impersonated users and the Airflow user are in the same group
2. Configure appropriate umask settings
3. Set `dag_bundle_new_folder_permissions = 0o775` (default)
4. Set `dag_bundle_new_file_permissions = 0o664` (default)
**Note:** This is a temporary solution. Future versions will handle
multi-user access through supervisor-based bundle operations.
# Fallback needed for backward compatibility with old config files |
1f7477d to
0df3557
Compare
0df3557 to
1345690
Compare
1345690 to
0cb45e2
Compare
|
I'm thinking about changing the default permissions to 755/600 so as not to silently enforce weaker access controls by simply updating airflow. However, I don't know how this might affect various environments. Comments will be very welcome! |
|
Question., If we really think it's a temporary solution only, I think it will add quite some pollution to config space - we will have two new config values that will become obsolete with the new solution - and they do not solve the concurrency overhead problem.. Why not simply make into a documentation for GitBundle impersonation: a) set group-writeable umask (002) That should (?) have the same effect as adding our code to manage the permissions, it would still require deployment level modifications, but at least it would not add new config params that will become useless in next version. Or maybe I missed something ? |
|
Closing because we rather have a permanent solution, and a workaround requiring no code changes was proposed. |
- This PR was created with the help of AIContext
When using DAG bundles (particularly Git bundles) with user impersonation (
run_as_user), tasks can fail due to permission conflicts. The root cause is that bundles are initialized by the main airflow user BEFORE impersonation happens, but the impersonated user later needs access to bundle resources like lock files and tracking directories.Specific issues encountered:
Changes
This PR provides the foundation for fixing impersonation + bundle permission issues:
[logging] file_task_handler_new_file_permissionswhich solves the same problem for log filesRelated Changes (follow-up PRs)
This is part 1 of 3 PRs to fully address the impersonation + bundles issue:
Architecture Note
This PR implements a permission-based solution where the bundle repository
is made group-writable so that multiple users (Airflow user + impersonated
users) can all perform git operations.
Long-term improvement: As discussed in comments, a better solution would
be to add bundle fetching to the Execution API, so only the supervisor needs
write access. This PR provides a working solution in the meantime. See:
#60270 (comment)
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.