-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output caching #587
Output caching #587
Conversation
Test failures are caused by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
I have a few impressionistic thoughts -- one of which is that _metadata
feels very "unknowable." It has a very strictly defined schema -- so strict there's the populate_metadata()
method (though I think we could get rid of it?) -- but nonetheless is just a nested dict. Should it graduate to a class (or even a DotDict, which would make the API nicer)?
Relatedly, there's a lot of serialization / deserialization of the resulthandlers themselves from inside the metadata. If the metadata had a known schema, then all resulthandlers could be nested schemas within it, and automatically deserialized when the metadata was loaded.
Just some threads to pull on...
from prefect.serialization.result_handlers import ResultHandlerSchema | ||
|
||
## if a state has a "cached" attribute or a "cached_inputs" attribute, we need to handle it | ||
if getattr(state, "cached_inputs", None) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than checking for the attribute, I suggest checking if state.is_pending() and state.cached_inputs is not None:
, since only Pending
states will qualify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's not entirely correct --> TimedOut
states store inputs as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIR
ealized
Please nitpick away and help me make this design more elegant!
Please describe your work and make sure your PR:
CHANGELOG.md
(if appropriate)docs/outline.toml
for API reference docs (if appropriate)What does this PR change?
This PR introduces new machinery for dealing with result handling (closes #575); namely, results are only handled in two situations: when a task requests output caching, or when a retry-like state requires cached_inputs. Moreover, each individual task can specify how its data should be handled (and this data is passed around in the
state._metadata
attribute).Why is this PR important?
Closes #576 -- this PR reduces our current overhead of serializing every single result and makes the handling of data more nuanced and customizable.