Support to delete datasets used in Data Aware Scheduling #36308

Open
1 of 2 tasks
sai3563 opened this issue Dec 19, 2023 · 3 comments
Assignees
Labels
area:API Airflow's REST/HTTP API area:datasets Issues related to the datasets feature area:UI Related to UI/UX. For Frontend Developers. kind:feature Feature Requests

Comments

@sai3563

sai3563 commented Dec 19, 2023

Description

Hi All,

I use Data Aware Scheduling extensively in my projects. One thing I've noticed is that there is no way to delete datasets, either through a button in the UI or through a function in code. It would be great to add a function for deleting datasets, along with a button in the UI that does the same.

Use case/motivation

In my case, I generate DAGs dynamically, and this also creates datasets. The DAGs are built from configs stored in MongoDB, and we often add and remove such configs. When a config is removed, its DAG disappears but its datasets remain. An option to delete datasets would help keep the Datasets page clean for me and, I'm sure, for many others in the future.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@sai3563 sai3563 added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Dec 19, 2023
@RNHTTR RNHTTR added area:UI Related to UI/UX. For Frontend Developers. area:datasets Issues related to the datasets feature and removed needs-triage label for new issues that we didn't triage yet labels Dec 19, 2023
@bbovenzi bbovenzi added the area:API Airflow's REST/HTTP API label Dec 19, 2023
@im-perativa
Contributor

@bbovenzi can I work on this?

@im-perativa
Contributor

Often, we add and remove such configs. On removal, their dags disappear but datasets remain. Deletion option of dataset will help maintain cleanliness in the Datasets page for me and I'm sure many others in the future.

@sai3563 To clarify: do the datasets remain in both the UI and the database, or in the database only? From what I can see, orphaned datasets are not shown in the UI.

Unless force-deleting a non-orphaned dataset is possible and meaningful (it would be recreated on the next DAG code scan anyway, correct me if I'm wrong), I'm not sure we need a delete button in the UI. I do agree with adding a delete function to the API, though, and I'm working on it.
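
To make the trade-off above concrete, here is a minimal sketch of the delete semantics being discussed. It is a toy model in plain Python, not Airflow's dataset model or REST API; `DatasetRegistry`, `delete`, `rescan`, and the example URIs are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Toy stand-in for a dataset table (hypothetical, not Airflow's model)."""

    # Maps dataset URI -> ids of the DAGs that currently reference it.
    references: dict[str, set[str]] = field(default_factory=dict)

    def is_orphaned(self, uri: str) -> bool:
        # A dataset counts as orphaned when no DAG references it any more.
        return not self.references[uri]

    def delete(self, uri: str, force: bool = False) -> bool:
        """Delete a dataset; refuse a referenced one unless force=True."""
        if uri not in self.references:
            raise KeyError(uri)
        if not self.is_orphaned(uri) and not force:
            return False
        del self.references[uri]
        return True

    def rescan(self, dag_datasets: dict[str, set[str]]) -> None:
        """Mimic a DAG code scan: re-register every URI the DAG files mention."""
        for dag_id, uris in dag_datasets.items():
            for uri in uris:
                self.references.setdefault(uri, set()).add(dag_id)


registry = DatasetRegistry()
dag_files = {"dag_a": {"s3://bucket/a"}}
registry.rescan(dag_files)
registry.references["s3://bucket/old"] = set()  # left behind by a removed DAG

deleted_orphan = registry.delete("s3://bucket/old")    # orphan: deleted
refused = registry.delete("s3://bucket/a")             # still referenced: refused
forced = registry.delete("s3://bucket/a", force=True)  # forced: deleted
registry.rescan(dag_files)                             # the next scan recreates it
```

In this model a forced delete of a referenced dataset has no lasting effect, because the next scan registers it again, whereas deleting an orphan is permanent.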

@sai3563
Author

sai3563 commented Jan 12, 2024

@im-perativa I just checked again and found that orphaned datasets continue to show up in the UI.

I'm on Airflow 2.7.3. I'm not sure whether anything has changed since then with regard to datasets, but here is how it works for me.

I use configs in MongoDB to generate DAGs dynamically. Let's say 200 documents in MongoDB = 200 DAGs in Airflow. Datasets are also created, named after the DAGs.

Now I add 1 document, so I have 201 DAGs, and I run the newly added DAG. The new DAG's dataset gets updated. If I then remove that config from MongoDB, the corresponding DAG in Airflow is also deleted, but its dataset remains.
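
The sequence described above can be modelled in a few lines of plain Python. This is only a sketch of the reported behaviour, not Airflow code: the `dataset_uri` naming rule, the `sync` and `orphaned` helpers, and the `dag_<n>` ids are assumptions made for illustration.

```python
from __future__ import annotations


def dataset_uri(dag_id: str) -> str:
    # Hypothetical naming rule: one dataset per DAG, derived from the DAG name.
    return f"dataset://{dag_id}"


def sync(configs: set[str], stored: set[str]) -> tuple[set[str], set[str]]:
    """Return (active DAG ids, stored datasets) after one parse cycle.

    DAGs always mirror the current configs, while datasets are only ever
    added, never removed, which is the behaviour reported in this issue.
    """
    dags = set(configs)
    return dags, stored | {dataset_uri(dag_id) for dag_id in dags}


def orphaned(dags: set[str], stored: set[str]) -> set[str]:
    """Stored datasets that no current DAG would create."""
    return stored - {dataset_uri(dag_id) for dag_id in dags}


configs = {f"dag_{i}" for i in range(200)}  # 200 MongoDB documents
dags, stored = sync(configs, set())

configs.add("dag_new")                      # add 1 document: 201 DAGs
dags, stored = sync(configs, stored)

configs.remove("dag_new")                   # remove that config again
dags, stored = sync(configs, stored)

leftover = orphaned(dags, stored)           # the dataset that stays behind
```

Here `leftover` contains only the dataset of the removed DAG, which is the entry a delete option would let users clean up.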

4 participants