This issue was moved to a discussion. You can continue the conversation there.
API Endpoints for Dataset Creation and Updating #36686
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
I also believe this would be helpful and aligns with the data-aware scheduling feature, as well as the dataset listener feature.
I think this is far too big a feature and should be discussed on the devlist. Currently all the objects (DAGs and Datasets alike) are created by parsing DAG files, NOT by creating DB entities. IMHO it makes very little sense to start creating those datasets via APIs, especially since Datasets are not "standalone" entities and nothing would happen if you create a dataset via the API but no DAG file uses it. I am not sure what the consequences would be, as I know there are other discussions happening about the future of datasets, but if you want to start anything here, opening a devlist discussion and explaining what you want is really the right way to approach it. Converting this into a discussion, as it is definitely not a "feature" scope.
Added a new POST endpoint to the Airflow API for creating datasets. This feature includes the necessary OpenAPI specifications, TypeScript type definitions, and unit tests. It enables users to programmatically create datasets, enhancing the integration and automation capabilities of Airflow. The endpoint handles standard responses for success, unauthorized access, permission denial, and not-found errors. This is one of two PRs related to the following discussion: apache#36723. Resolves: apache#36686 (partially).
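As a hedged sketch of what such a call might look like from a client's side (the `/datasets` path, payload shape, and base URL below are assumptions for illustration, not the specification from the PR):

```python
import json

# Hypothetical values for illustration only; the real endpoint path and
# payload shape would come from the OpenAPI spec proposed in the PR.
AIRFLOW_BASE_URL = "http://localhost:8080/api/v1"

def build_create_dataset_request(uri, extra=None):
    """Return the (endpoint, JSON body) pair for a hypothetical
    dataset-creation POST against the Airflow REST API."""
    body = {"uri": uri, "extra": extra or {}}
    return f"{AIRFLOW_BASE_URL}/datasets", json.dumps(body)

# Example: the request a client might issue to register a dataset.
endpoint, body = build_create_dataset_request("s3://bucket/my-table", {"team": "data"})
print(endpoint)  # http://localhost:8080/api/v1/datasets
print(body)
```

The helper only constructs the request, so the sketch stays free of network side effects; an actual client would pass these values to an HTTP library along with authentication headers.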
Description
I would like to propose the addition of new API endpoints for creating and updating datasets in Airflow. This feature would be a valuable extension to the current dataset capabilities and would align with the direction Airflow is heading, especially considering the dataset listeners introduced in Airflow 2.8.
Proposed Changes:
Use case/motivation
In a multi-instance Airflow architecture, managing dataset dependencies across instances can be challenging, a difficulty we are currently experiencing in our organization.
This feature also aligns with the recent advancements in Airflow 2.8, particularly with the introduction of dataset listeners. These developments have opened the door for improved cross-instance dataset awareness, an area where this proposal would be extremely beneficial.
We believe that with these new endpoints, Airflow would offer a more efficient and streamlined approach to cross-instance dataset-aware scheduling. This enhancement would benefit not only our organization but also the broader Airflow community, as this is likely a common challenge that many face today and more will encounter in the future.
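To make the cross-instance scenario concrete, here is a minimal sketch of how one instance could forward a dataset event to a peer instance, assuming such an endpoint existed there. The peer URL, endpoint path, and payload shape are all hypothetical:

```python
import json
import urllib.request

# Hypothetical peer instance and endpoint path; neither exists in Airflow today.
PEER_API = "http://instance-b.example.com/api/v1"

def forward_dataset_event(dataset_uri, dry_run=True):
    """Build (and optionally send) a cross-instance dataset notification.

    With dry_run=True the request is only constructed, never sent, so the
    sketch stays side-effect free.
    """
    req = urllib.request.Request(
        f"{PEER_API}/datasets/events",  # hypothetical endpoint on the peer
        data=json.dumps({"dataset_uri": dataset_uri}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if not dry_run:
        urllib.request.urlopen(req)
    return req.full_url

# A dataset listener on instance A could invoke this when a dataset changes.
print(forward_dataset_event("s3://bucket/my-table"))
```

In practice the forwarding logic would live in a dataset listener (the listener hooks introduced in Airflow 2.8), which is exactly why the two features align.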
Related issues
This feature complements the discussions and contributions already seen in the community, especially those related to enhancing dataset management and integration in Airflow.
There have been ongoing discussions and contributions on GitHub, e.g. #36308 and #29162, including a previously closed pull request (#29433).
These discussions highlight the community's interest in and need for enhanced dataset management capabilities.
Are you willing to submit a PR?
Code of Conduct