Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-81 Implement POST/Insert Variables in FastAPI #43897

Open
1 of 2 tasks
bugraoz93 opened this issue Nov 11, 2024 · 10 comments
Open
1 of 2 tasks

AIP-81 Implement POST/Insert Variables in FastAPI #43897

bugraoz93 opened this issue Nov 11, 2024 · 10 comments
Assignees
Labels
AIP-81 Enhanced Security in CLI via Integration of API area:API Airflow's REST/HTTP API kind:feature Feature Requests
Milestone

Comments

@bugraoz93
Copy link
Collaborator

Description

Develop an endpoint in FastAPI that enables POST one or multiple Variables from a list. This feature should allow users to import their connections from a set of connections in a single request.

Use case/motivation

AIP-81

Related issues

#42560

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@bugraoz93 bugraoz93 added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet area:API Airflow's REST/HTTP API AIP-81 Enhanced Security in CLI via Integration of API and removed needs-triage label for new issues that we didn't triage yet labels Nov 11, 2024
@bugraoz93 bugraoz93 changed the title AIP-81 Implement POST/Insert Multiple Pools in FastAPI AIP-81 Implement POST/Insert Variables Pools in FastAPI Nov 11, 2024
@bugraoz93 bugraoz93 changed the title AIP-81 Implement POST/Insert Variables Pools in FastAPI AIP-81 Implement POST/Insert Variables in FastAPI Nov 12, 2024
@jason810496
Copy link
Contributor

@bugraoz93 The issue I will take, Thanks !

@jason810496
Copy link
Contributor

While working on this issue, I came up with two approaches and would appreciate some feedback to decide which one to pursue 🙏

Context for Variable.set

For the post_variable implementation, the Variable.set method is used to add new variables.
Ref: variables.py#L155

The Variable.set method involves four steps:

  1. Check if the secret exists in the custom secrets backend
    For example, if using Vault, AWS, GCP Secret Manager, etc., it will call SecretsBackend.get_variable to verify whether the variable is set in the external secret manager.
  2. Delete the old variable in the metastore
  3. Add the variable with the new value
  4. Set the secret in the cache

Approaches

1. Add a get_variables method to the Variables model and SecretsBackend

This approach would require implementing a get_variables method for both core and external secret managers:

  • Core secret managers:
    • Cache
    • Environment variables
    • Local filesystem
    • Metastore
  • External secret managers:
    • AWS
    • GCP
    • Vault
    • Azure
    • Yandex

2. Call Variable.set concurrently using asyncio

Instead of creating a get_variables method, this approach would involve calling Variable.set concurrently, and have to make sure there are no thread-safety issues.

Pros and Cons

First Approach

  • Pros:
    • Fewer API requests to external services.
  • Cons:
    • More work is required to implement the get_variables method across all secret backends.

Second Approach

  • Pros:
    • Easier and faster to implement.
  • Cons:
    • May result in more API requests to external services, which could impact performance.
    • For example, setting up a bunch of variables in first time might call hundreds of API to external providers simultaneously.

My opinion

We can start with the second approach to quickly implement the main requirement (inserting multiple variables in FastAPI).
Afterward, we can gradually refactor to use the first approach for better efficiency.

cc @bugraoz93 @pierrejeambrun

@pierrejeambrun
Copy link
Member

I would say second approach is more straightforward to implement with an asyncio.gather on all the variables. The issue is for that to work we need the underlying methods, (called methods) to also be async and release the loop on blocking I/O.

Which is not the case at the moment. We would need to async await the calls to the secret backend and also to the DB. So I would say that this is too much work and a deeper change.

That leaves us with choice 1 only I believe.

Or backup choice: no bulk endpoint, asyncio on the client side (CLI) to generate concurrent requests on the API to create all the different Variables at the same time using only 1 thread on the client. But I think that was already considered and discarded.

@bugraoz93
Copy link
Collaborator Author

Great findings! Thanks! My main concern was the Variable :)

Which is not the case at the moment. We would need to async await the calls to the secret backend and also to the DB. So I would say that this is too much work and a deeper change.

I agree this would be too much effort. I would rather not touch those code pieces at all.

Or backup choice: no bulk endpoint, asyncio on the client side (CLI) to generate concurrent requests on the API to create all the different Variables at the same time using only 1 thread on the client. But I think that was already considered and discarded.

My idea was to have a single/common approach for all the file inserts. Making these changes requires more effort than just handling this case on the CLI. Let's simplify this and aim to provide the features for v3 rather than putting more effort which can have multiple side effects too. We may need to spend operational (bugfix/maintenance) time on top of it. We can include this as an improvement point to have a common approach for all file inserts as a next step maybe in v3.1 and we would have more time to implement at ease. Let's take a step back and reject this ticket. :)

What do you think?

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 22, 2024

I agree that consistency would be great. If we feel like this is too much work for now and that our bandwidth needs to be focused on other priorities we can mark it for 3.1 👍.

How will you do the bulk insert for variables while we don't have this implemented ? async loop on the client / CLI as suggested above or you have something else in mind ?

@pierrejeambrun pierrejeambrun added this to the Arflow 3.1+ milestone Nov 22, 2024
@jason810496
Copy link
Contributor

If we implement this on the FastAPI side, I would add a wrapper async def set_async around the _set method to allow the use of asyncio.gather. Additionally, I have verified that the SQLAlchemy session is thread-safe for this case, as the same session object will be used across all coroutines. Ref: sqlalchemy/sqlalchemy#5828

If implemented on the CLI side, I agree with @pierrejeambrun approach.
Both works, depends on finial decision.

@pierrejeambrun
Copy link
Member

If we implement this on the FastAPI side, I would add a wrapper async def set_async around the _set method to allow the use of asyncio.gather. Additionally, I have verified that the SQLAlchemy session is thread-safe for this case, as the same session object will be used across all coroutines. Ref: sqlalchemy/sqlalchemy#5828

The Variable.set method is synced, so basically it will never release and block the main loop. The way you can run sync code from async route is with run_in_executor from asyncio or run_in_threadpool from starlette. https://sentry.io/answers/fastapi-difference-between-run-in-executor-and-run-in-threadpool/

Additionally, I have verified that the SQLAlchemy session is thread-safe for this case, as the same session object will be used across all coroutines. Ref: sqlalchemy/sqlalchemy#5828

I don' think so. As mentioned above you will have to use multiple threads otherwise that would just basically be the same a sequential requests (because the main loop is blocked) => might run into problem with one session shared. And there is a note about AsyncSession which are not threadsafe` in the issue you linked.

Worth trying both, I have no preference at this point. (Maybe inside FastAPI is cleaner because from the CLI point of view, everything is handled the same -> through 1 call to the API, but not sure if this will be super easy to do because of the things highlighted above)

@bugraoz93
Copy link
Collaborator Author

I agree that consistency would be great. If we feel like this is too much work for now and that our bandwidth needs to be focused on other priorities we can mark it for 3.1 👍.

Yeah, this is even better than delete/create. I think handling this case separately doesn't seem too much if we compare it with the implementation and test needed for these changes. Thanks for marking this for 3.1! :)

How will you do the bulk insert for variables while we don't have this implemented ? async loop on the client / CLI as suggested above or you have something else in mind ?

I was thinking the same. Async loop on the client / CLI is the right approach until 3.1. The CLI can handle requests to a single endpoint. Since Variables that are posted will always be recreated, allowing users to publish the same file again with Variables won’t cause issues, aside from increasing the number of API calls. This approach would provide a consistent experience for users across other file imports.

@jason810496
Copy link
Contributor

Hi @bugraoz93, should I continue working on this issue? Seem like it should be implemented on CLI side at this stage 👀

@bugraoz93
Copy link
Collaborator Author

Hello @jason810496, great question!

Of course, you can still work on this, but my suggestion would be to prioritize other tasks for now to focus on more urgent implementations in this AIP or other AIPs. :)

This is no longer critical for 3.0 (our target for AIP-81). While I hope we’ll eventually have this API endpoint, we now have plenty of time to discuss and implement it before 3.1. Please check the link below. It outlines the timeline for 3.0. Even after the 3.0 release, we’ll still have time to make this happen.
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153069#Airflow3.0-Timelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-81 Enhanced Security in CLI via Integration of API area:API Airflow's REST/HTTP API kind:feature Feature Requests
Projects
Development

No branches or pull requests

3 participants