feature request: Storage object backed by a REST API #2835
Comments
I know there is a lot of activity happening in … Are the maintainers here open to this idea, and would you consider a pull request for it if I prepared one? Totally understand if the answer is "this deserves some careful thought and we have higher priorities right now".
@jameslamb Apologies, this got a bit lost. I'm all for this idea if you want to prepare a PR!
I don't think it needs to. Multiple flows tbh are only useful for Docker storage and (for example) the most recent file-based GitHub storage is a one-to-one relationship.
Yeah you would do something like this in build. We do something similar in S3 storage where we generate a key and then set it on the storage object so that way when the storage object is serialized it has all of the information needed to retrieve the flow.
I think having some reasonable defaults for …
Use Case
created from a discussion in Prefect Community Slack
I'd like to propose a new type of storage. For the sake of this conversation, I'll refer to it as `Webhook` storage.
With `Webhook` storage, flows are stored and retrieved by HTTP requests. The storage object contains the details needed to construct those requests. I think this could be a lightweight but powerful way to allow users to integrate Prefect with their existing stack.
benefit 1: custom storage with external services
This could be a route to using any type of external service that exposes writing and reading binary files over HTTP.
It would allow users to write their own storage classes for things like:
and would allow the use of other cloud providers' object stores that `prefect` doesn't have first-class support for (like Alibaba Cloud Object Storage Service or IBM Cloud object store)
benefit 2: integration with internal services
In companies I've worked at / with before, I've seen the pattern where public cloud services can only be used directly by infrastructure teams, and data scientists and other application developers are restricted to only using the company's own microservices.
Adding `Webhook` storage would allow users in such a situation to integrate with Prefect Core server (Cloud or one they run themselves) without needing any credentials that allow direct access to cloud providers (which is necessary to use `S3`, `GCS`, or `Azure` storage).
Solution
This might look something like this:
rough sketch implementation
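As a minimal illustration of the idea (one HTTP request to store the flow, another to retrieve it), something like the following could work. This is a hypothetical sketch, not Prefect code: the class name `WebhookStorageSketch`, its fields, and the default `Content-Type` header are all assumptions, and stdlib `pickle` and `urllib` are used only to keep the example self-contained.

```python
import pickle
import urllib.request
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WebhookStorageSketch:
    """Hypothetical sketch: store and retrieve one flow via two HTTP requests."""

    build_url: str  # assumed: where build() uploads the serialized flow
    get_flow_url: str  # assumed: where get_flow() downloads it from
    build_method: str = "POST"
    headers: Dict[str, str] = field(
        default_factory=lambda: {"Content-Type": "application/octet-stream"}
    )
    timeout: float = 30.0

    def build(self, flow: Any) -> "WebhookStorageSketch":
        """Serialize the flow and upload it with a single HTTP request."""
        request = urllib.request.Request(
            self.build_url,
            data=pickle.dumps(flow),
            headers=self.headers,
            method=self.build_method,
        )
        with urllib.request.urlopen(request, timeout=self.timeout):
            pass  # urlopen raises HTTPError on 4xx/5xx responses
        return self

    def get_flow(self) -> Any:
        """Download the serialized flow with a single HTTP request and load it."""
        request = urllib.request.Request(
            self.get_flow_url, headers=self.headers, method="GET"
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            # Only unpickle bytes that come from a service you trust.
            return pickle.loads(response.read())
```

Because the storage object only holds URLs, a method, and headers, it would stay small when serialized, and anything service-specific (authentication headers, for example) would be supplied by the user.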
My basic proposal is that `build()` executes one HTTP request and `get_flow()` executes another.
Open Questions
How could this work with multiple flows?
… `Webhook` storage (but I'm sure it could be figured out)
How could this support services where you have to write a file before you know enough to read it?
… `get_flow()` needs that ID to work.
… `build()`, update details of the storage based on the response, then use `flow.register(build=False)`
Do any details of the HTTP client need to be customizable?
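For the write-before-read question, one possible approach is for `build()` to read the identifier out of the upload response and record it on the storage object before the object is serialized. A rough sketch of that step is below; the helper name `resolve_get_flow_url`, the `{flow_id}` placeholder, and the JSON `id` field are all hypothetical, and a real service may return its identifier in a different shape.

```python
import json


def resolve_get_flow_url(
    url_template: str, build_response_body: bytes, id_field: str = "id"
) -> str:
    """Fill a download URL template using the ID returned by the upload request.

    Assumes the service answers the upload with a JSON object that
    contains the new file's identifier under `id_field`.
    """
    payload = json.loads(build_response_body)
    return url_template.format(flow_id=payload[id_field])


# Example: the upload response says the stored file has ID "abc123".
download_url = resolve_get_flow_url(
    "https://storage.example.invalid/files/{flow_id}",
    b'{"id": "abc123"}',
)
```

The resolved URL would then be set on the storage object so that `get_flow()` has everything it needs, after which the flow could be registered with `flow.register(build=False)`.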
Alternatives
Prefect Cloud Flow Storage
Some of the use cases mentioned above might be solved by introducing a Prefect Cloud storage service, where you just authenticate with Prefect Cloud and it acts as the cloud storage service.
pros
`Content-Type` header and expectations about how the object is named can be hard-coded and hidden from users
cons
Doing Nothing
Maybe this isn't a big enough concern to warrant growing the `prefect` codebase. All new code comes with maintenance costs, and maybe the added maintenance cost of this feature outweighs the benefit to users of an extension like this.
Closing Thoughts
If the maintainers here agree that this feature is worth pursuing, I want to note that I'd be happy to attempt a pull request. You all have been so careful and thoughtful in the design of the boundaries between different components that I feel confident I could come up with a reasonable implementation.
Thanks for your time and consideration!