Output caching #587

cicdw · 2019-01-29T19:59:10Z

Please nitpick away and help me make this design more elegant!

Please describe your work and make sure your PR:

- adds new tests (if appropriate)
- updates CHANGELOG.md (if appropriate)
- updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

What does this PR change?

This PR introduces new machinery for dealing with result handling (closes #575); namely, results are only handled in two situations: when a task requests output caching, or when a retry-like state requires cached_inputs. Moreover, each individual task can specify how its data should be handled (and this data is passed around in the state._metadata attribute).
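A minimal sketch of the idea described above, assuming hypothetical names throughout: ResultHandler, JSONResultHandler, State and finalize are illustrative stand-ins, not the classes or the finalize_run method this PR adds. Data is serialized only when the task asks for output caching or when the state carries cached_inputs. A small per-attribute metadata dict, loosely modeled on state._metadata, records what has already been handled. The handler argument stands in for the per-task choice of result handler.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class ResultHandler:
    """Hypothetical interface: convert a result to a storable string and back."""

    def write(self, result: Any) -> str:
        raise NotImplementedError

    def read(self, loc: str) -> Any:
        raise NotImplementedError


class JSONResultHandler(ResultHandler):
    """Hypothetical handler that keeps the JSON payload itself as the 'location'."""

    def write(self, result: Any) -> str:
        return json.dumps(result)

    def read(self, loc: str) -> Any:
        return json.loads(loc)


@dataclass
class State:
    result: Any = None
    cached_inputs: Optional[Dict[str, Any]] = None
    # Per-attribute bookkeeping, loosely modeled on the PR's state._metadata.
    metadata: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def is_raw(self, key: str) -> bool:
        # Anything without a metadata entry is assumed to still be raw Python data.
        return self.metadata.get(key, {}).get("raw", True)


def finalize(state: State, handler: ResultHandler, cache_output: bool) -> State:
    """Serialize only what must leave the process; everything else stays raw."""
    if cache_output and state.is_raw("result"):
        state.result = handler.write(state.result)
        state.metadata["result"] = {"raw": False}
    if state.cached_inputs is not None and state.is_raw("cached_inputs"):
        state.cached_inputs = {
            name: handler.write(value) for name, value in state.cached_inputs.items()
        }
        state.metadata["cached_inputs"] = {"raw": False}
    return state


# No caching requested and no cached inputs: nothing is serialized.
plain = finalize(State(result={"x": 1}), JSONResultHandler(), cache_output=False)
# Output caching requested: only the result goes through the handler.
cached = finalize(State(result={"x": 1}), JSONResultHandler(), cache_output=True)
# Retry-like state with cached inputs: only the inputs go through the handler.
retry = finalize(State(cached_inputs={"a": [1, 2]}), JSONResultHandler(), cache_output=False)
```

The design point is that the common path (no caching, no retry) never touches a handler, which is where the overhead savings described below would come from.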

Why is this PR important?

Closes #576 -- this PR reduces our current overhead of serializing every single result and makes the handling of data more nuanced and customizable.

cicdw · 2019-01-29T20:11:59Z

Test failures are caused by mypy; all other tests pass.

src/prefect/engine/state.py

jlowin

Great!

I have a few impressionistic thoughts -- one of which is that _metadata feels very "unknowable." It has a very strictly defined schema -- so strict there's the populate_metadata() method (though I think we could get rid of it?) -- but nonetheless is just a nested dict. Should it graduate to a class (or even a DotDict, which would make the API nicer)?
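To illustrate the DotDict suggestion, here is a minimal sketch assuming a hand-rolled DotDict and a hypothetical to_dotdict helper (neither is claimed to be Prefect's own implementation). It also shows a pitfall a later commit in this PR alludes to: a JSON round trip returns plain dicts, so the dot-access structure has to be re-applied after deserialization.

```python
import json
from typing import Any


class DotDict(dict):
    """Hypothetical minimal dict with attribute access (not Prefect's own class)."""

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key: str, value: Any) -> None:
        self[key] = value


def to_dotdict(obj: Any) -> Any:
    """Recursively convert nested dicts so every level supports dot access."""
    if isinstance(obj, dict):
        return DotDict({k: to_dotdict(v) for k, v in obj.items()})
    return obj


metadata = to_dotdict({"result": {"raw": True, "result_handler": None}})
metadata.result.raw = False  # same as metadata["result"]["raw"] = False

# A JSON round trip hands back plain dicts, so the structure must be re-applied.
restored = to_dotdict(json.loads(json.dumps(metadata)))
```

One trade-off of attribute access: a key that collides with a dict method name (for example "items") is only reachable via subscription, which is one argument for a dedicated class with declared fields instead.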

Relatedly, there's a lot of serialization / deserialization of the result handlers themselves from inside the metadata. If the metadata had a known schema, then all result handlers could be nested schemas within it and be deserialized automatically when the metadata is loaded.
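The PR already imports a ResultHandlerSchema from prefect.serialization.result_handlers; the sketch below swaps that machinery for a plain stdlib registry so it runs on its own. It assumes hypothetical names (HANDLER_REGISTRY, register, load_metadata, LocalResultHandler) and shows the general idea: when the schema says where handlers live, one loader can rebuild every handler object at once instead of each call site deserializing them by hand.

```python
import json
from typing import Any, Dict, Type


class ResultHandler:
    """Hypothetical base class; serializes itself as a tagged dict."""

    def to_dict(self) -> Dict[str, Any]:
        # The "type" tag tells the loader which class to rebuild.
        return {"type": type(self).__name__, **vars(self)}


# Hypothetical registry mapping the "type" tag back to a handler class.
HANDLER_REGISTRY: Dict[str, Type[ResultHandler]] = {}


def register(cls: Type[ResultHandler]) -> Type[ResultHandler]:
    HANDLER_REGISTRY[cls.__name__] = cls
    return cls


@register
class JSONResultHandler(ResultHandler):
    pass


@register
class LocalResultHandler(ResultHandler):
    def __init__(self, directory: str = "/tmp") -> None:
        self.directory = directory


def load_metadata(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild handler objects at the one place the metadata schema says they live."""
    loaded = {}
    for key, entry in raw.items():
        entry = dict(entry)  # copy so the input is left untouched
        spec = entry.get("result_handler")
        if spec is not None:
            spec = dict(spec)
            cls = HANDLER_REGISTRY[spec.pop("type")]
            entry["result_handler"] = cls(**spec)
        loaded[key] = entry
    return loaded


payload = json.dumps(
    {"result": {"raw": False, "result_handler": LocalResultHandler("/data").to_dict()}}
)
meta = load_metadata(json.loads(payload))
handler = meta["result"]["result_handler"]
```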

Just some threads to pull on...

jlowin · 2019-01-29T20:05:16Z

src/prefect/engine/cloud/task_runner.py

+        from prefect.serialization.result_handlers import ResultHandlerSchema
+
+        ## if a state has a "cached" attribute or a "cached_inputs" attribute, we need to handle it
+        if getattr(state, "cached_inputs", None) is not None:


Rather than checking for the attribute, I suggest checking state.is_pending() and state.cached_inputs is not None, since only Pending states will qualify.

Actually, that's not entirely correct: TimedOut states store inputs as well.
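The exchange above can be made concrete with a small sketch. The state classes here are hypothetical stand-ins, and the hierarchy (Retrying under Pending, TimedOut under Failed) is an assumption for illustration, not a claim about Prefect's actual class tree. It shows why the attribute-based check in the diff was kept: a Pending-only check would skip a TimedOut state that still carries inputs.

```python
from typing import Any, Dict, Optional


class State:
    """Hypothetical stand-in for a task run state; has no cached_inputs by default."""

    def is_pending(self) -> bool:
        return isinstance(self, Pending)


class Pending(State):
    def __init__(self, cached_inputs: Optional[Dict[str, Any]] = None) -> None:
        self.cached_inputs = cached_inputs


class Retrying(Pending):
    """Retry-like state, modeled here as a kind of Pending."""


class Failed(State):
    pass


class TimedOut(Failed):
    """Not Pending, but it still carries the inputs needed to rerun the task."""

    def __init__(self, cached_inputs: Optional[Dict[str, Any]] = None) -> None:
        self.cached_inputs = cached_inputs


class Success(State):
    pass


def has_inputs_by_attribute(state: State) -> bool:
    # The check in the diff: any state carrying cached_inputs qualifies.
    return getattr(state, "cached_inputs", None) is not None


def has_inputs_if_pending(state: State) -> bool:
    # The suggested check: only Pending states qualify (short-circuits otherwise).
    return state.is_pending() and state.cached_inputs is not None
```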

TIRealized


cicdw added 10 commits January 28, 2019 11:25

- Refactor where result handlers live and the hierarchy of result handlers (de06c72)
- Pass result handlers down the line (8e8b352)
- Begin modifying the _metadata attribute of states within the task runner (de0ad58)
- Add new ensure_raw state method for unpacking non-raw results and data (59d5acf)
- Ensure all states coming into task runners are raw (8509cf2)
- Move raw logic to Cloud Runner (ba3a327)
- Implement finalize_run which handles any data that needs handling (f493376)
- Add some basic unit tests for the new state methods (5f68559)
- Resolve merge conflicts with master (b34edae)
- Polish up finalize_run and add tests (1e6409b)

cicdw requested review from jlowin and joshmeek as code owners January 29, 2019 19:59

cicdw commented Jan 29, 2019

src/prefect/engine/state.py

jlowin requested changes Jan 29, 2019

joshmeek reviewed Jan 29, 2019

src/prefect/engine/cloud/task_runner.py

src/prefect/engine/result_handlers/json_result_handler.py

cicdw added 3 commits January 29, 2019 13:31

- Address feedback and give metadata attribute more structure (dc97186)
- Remove unnecessary gets() now that metadata is stricter (d7b0e81)
- Fix mypy failures and update changelog (bc3a0cb)

cicdw changed the title from "[WIP] Output caching" to "Output caching" Jan 29, 2019

cicdw and others added 2 commits January 29, 2019 14:10

- Change names per feedback and ensure metadata stays dotdict after deserialization (cce9ea3)
- Merge branch 'master' into output-caching (9ce92f5)

joshmeek previously approved these changes Jan 29, 2019

jlowin previously approved these changes Jan 29, 2019

Fix failing test caused by overwriting metadata dictionary (01aa3be)

cicdw dismissed stale reviews from jlowin and joshmeek via 01aa3be January 29, 2019 22:19

jlowin approved these changes Jan 29, 2019

joshmeek approved these changes Jan 29, 2019

cicdw merged commit 9b096e1 into master Jan 29, 2019

cicdw deleted the output-caching branch January 29, 2019 22:28
