Dataset DEL/POST API should also check the access_control at DAG level if defined #42846

Open
1 of 2 tasks
nicolasge opened this issue Oct 9, 2024 · 0 comments
Labels
area:API Airflow's REST/HTTP API area:core area:datasets Issues related to the datasets feature kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet

Comments

@nicolasge

Apache Airflow version

2.10.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Suppose we have a DAG called DAG_A with several tasks, one of which triggers a dataset update.

  • In DAG_A, we added access_control to enforce DAG-level access control, so that only users who belong to Role_A can create DAG runs for this DAG.
  • The downstream DAGs also define DAG-level access control; let's call their roles Role_B and Role_C.

Right now, a user whose role has "can create on Datasets" is permitted to trigger an event for this dataset, even if that user has no role with DAG run permissions on DAG_A or on DAG_A's downstream DAGs.

What you think should happen instead?

To support DAG-level access control, triggering a dataset update event should require, in addition to the "can create on Datasets" permission, that the user also meets one of the following:

  • As an upstream owner (dataset producer): the user should have permission to create DAG runs on the DAGs that generate the dataset event without human intervention, because upstream DAG owners can always re-run their DAG to create a new event.
  • As a downstream owner (dataset consumer): the user should have permission to create DAG runs on ALL of the downstream DAGs.

So in this case, to call the API that creates a dataset event, the user needs a role with the "can create on Datasets" permission and must also be in Role_A (as the upstream owner) or in both Role_B and Role_C (as a downstream owner).
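The rule proposed above can be sketched as a small, framework-free Python function. This is only an illustration of the intended logic, not Airflow's actual authorization code: the function name, the `Dataset_Writer` role, and `DAG_B`/`DAG_C` as the downstream DAG names are made up for the example, and it assumes that run permission on at least one producing DAG is enough for the upstream path.

```python
from typing import Dict, Iterable, Set


def can_create_dataset_event(
    user_roles: Set[str],
    roles_with_dataset_create: Set[str],
    dag_run_creators: Dict[str, Set[str]],
    producer_dags: Iterable[str],
    consumer_dags: Iterable[str],
) -> bool:
    """Return True if the proposed rule would allow creating a dataset event."""
    # Base requirement: some role grants "can create on Datasets".
    if not (user_roles & roles_with_dataset_create):
        return False

    def may_run(dag_id: str) -> bool:
        # The user may create runs of a DAG if any of their roles is listed for it.
        return bool(user_roles & dag_run_creators.get(dag_id, set()))

    producers = list(producer_dags)
    consumers = list(consumer_dags)
    # Upstream path (assumption): run permission on at least one producing DAG.
    is_upstream_owner = any(may_run(dag_id) for dag_id in producers)
    # Downstream path: run permission on ALL consuming DAGs, and there is at least one.
    is_downstream_owner = bool(consumers) and all(may_run(dag_id) for dag_id in consumers)
    return is_upstream_owner or is_downstream_owner


# Hypothetical setup mirroring the scenario: DAG_A produces the dataset,
# DAG_B and DAG_C consume it. "Dataset_Writer" is an invented role name.
DAG_RUN_CREATORS = {"DAG_A": {"Role_A"}, "DAG_B": {"Role_B"}, "DAG_C": {"Role_C"}}
DATASET_CREATORS = {"Dataset_Writer"}


def check(user_roles: Set[str]) -> bool:
    return can_create_dataset_event(
        user_roles, DATASET_CREATORS, DAG_RUN_CREATORS, ["DAG_A"], ["DAG_B", "DAG_C"]
    )
```

With this sketch, `check({"Dataset_Writer", "Role_A"})` and `check({"Dataset_Writer", "Role_B", "Role_C"})` are allowed, while `check({"Dataset_Writer"})` and `check({"Dataset_Writer", "Role_B"})` are rejected.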

How to reproduce

Create 3 users with 3 roles:

  • User C with role_C: "can create on Datasets"
  • User A with role_A: can create dag runs on "DAG_A"
  • User B with role_B: can create dag runs on "DAG_B"

Create 2 DAGs with DAG-level access control defined in each DAG:

  • DAG_A: only Role_A can create DAG runs; the DAG defines a dataset outlet
  • DAG_B: only Role_B can create DAG runs; the DAG is scheduled on the dataset defined in DAG_A

Then, as User C, call the Airflow REST API to create a dataset event.
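For reference, the call made as User C might look roughly like the sketch below. It assumes the stable REST API's `POST /api/v1/datasets/events` endpoint with a `dataset_uri` field in the body and that basic authentication is enabled on the webserver; the host, credentials, and dataset URI are placeholders, not values from this report. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_dataset_event_request(base_url, username, password, dataset_uri):
    """Build (but do not send) a request that creates a dataset event."""
    payload = json.dumps({"dataset_uri": dataset_uri}).encode("utf-8")
    # Basic auth header; assumes the basic_auth API backend is enabled.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets/events",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values: replace with the real webserver URL, User C's
# credentials, and the URI of the dataset that DAG_A declares as an outlet.
request = build_dataset_event_request(
    "http://localhost:8080",
    "user_c",
    "user_c_password",
    "s3://example-bucket/dag_a_output",
)

# To send it against a running webserver:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

In the setup described above, this request is accepted for User C even though User C cannot create DAG runs on DAG_A or DAG_B.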

Operating System

Debian 12

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@nicolasge nicolasge added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Oct 9, 2024
@dosubot dosubot bot added area:API Airflow's REST/HTTP API area:datasets Issues related to the datasets feature labels Oct 9, 2024