TaskRunner refactor #260

jlowin · 2018-10-10T16:10:18Z

As discussed offline, the Runners are becoming an opaque block of if-statements and State manipulations.

This refactor splits the TaskRunner into a larger number of small method calls, referred to as "steps". Each step takes a state argument (and any other arguments it requires) and does one of two things:

- returns a State
- raises an ENDRUN(State) error

If it returns a state, that state is passed to the next step in the pipeline. If it raises an ENDRUN(State) error, the pipeline terminates and the wrapped state becomes the final state of the task.
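A minimal sketch of the pipeline idea described above, not the code in this PR. The `State` subclasses, the `run_pipeline` helper, and both step functions are hypothetical stand-ins chosen for illustration; only the name `ENDRUN` and the "return a state or raise ENDRUN(state)" contract come from the description.

```python
class State:
    """Hypothetical stand-in for a task state."""


class Pending(State):
    pass


class Success(State):
    pass


class ENDRUN(Exception):
    """Carries the state that should become the final state of the run."""

    def __init__(self, state):
        super().__init__(state)
        self.state = state


def check_not_finished(state):
    # Hypothetical step: halt the pipeline if the task already succeeded.
    if isinstance(state, Success):
        raise ENDRUN(state)
    return state


def mark_success(state):
    # Hypothetical step: pretend the task ran and succeeded.
    return Success()


def run_pipeline(state, steps):
    # Feed each step's return value into the next step; stop at the first ENDRUN
    # and use the state it wraps as the final state.
    try:
        for step in steps:
            state = step(state)
    except ENDRUN as exc:
        state = exc.state
    return state
```

Calling `run_pipeline(Pending(), [check_not_finished, mark_success])` runs both steps, while passing an already successful state stops at the first step and hands that same state back unchanged.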

For example, there is a step called TaskRunner.check_task_trigger_step(). When called with a state value, it checks the task's trigger function. If the trigger passes, it returns the same state so the next step can use it. If the trigger fails, it raises ENDRUN(TriggerFailed()) (or the actual error raised by the trigger, if appropriate), and all step processing stops.

Another example is a step called TaskRunner.set_task_to_running_step(). It expects a PENDING state and returns a RUNNING state if it receives one; otherwise it raises ENDRUN(state) to indicate that an unexpected state was received and processing should halt.
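As a rough illustration of that contract, here is a standalone sketch of such a step. It is written as a plain function rather than a TaskRunner method, and the `State` classes are hypothetical stand-ins, so treat it as an assumption about the shape of the step rather than the actual implementation.

```python
class State:
    """Hypothetical stand-in for a task state."""


class Pending(State):
    pass


class Running(State):
    pass


class Success(State):
    pass


class ENDRUN(Exception):
    """Carries the state that should become the final state of the run."""

    def __init__(self, state):
        super().__init__(state)
        self.state = state


def set_task_to_running(state):
    # Only a Pending state may move to Running; anything else is unexpected,
    # so the run ends with the state that was received.
    if not isinstance(state, Pending):
        raise ENDRUN(state)
    return Running()
```

Because the unexpected state is wrapped in the exception, the caller can recover it from the `state` attribute and report it as the final state of the task.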

This has already helped identify some subtle edge cases -- for example, the any_failed and any_successful triggers would fail if the task had no upstream tasks. In practice, this never mattered because tasks with no upstream tasks are always start_tasks and therefore ignore their triggers anyway -- but it's still bad practice. Now, tasks with no upstream tasks always pass the trigger check.
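The edge case can be sketched as follows. The `any_failed` function here is a simplified, hypothetical version that returns a bool (the real trigger may behave differently), and `check_task_trigger` is a standalone stand-in for the step, with a guard that lets tasks with no upstream states pass.

```python
from typing import Callable, Set


class State:
    """Hypothetical stand-in for a task state."""


class Pending(State):
    pass


class Success(State):
    pass


class Failed(State):
    pass


class TriggerFailed(Failed):
    pass


class ENDRUN(Exception):
    """Carries the state that should become the final state of the run."""

    def __init__(self, state):
        super().__init__(state)
        self.state = state


def any_failed(upstream_states: Set[State]) -> bool:
    # Simplified trigger: passes only if at least one upstream state failed.
    # With an empty set this is False, which is the edge case in question.
    return any(isinstance(s, Failed) for s in upstream_states)


def check_task_trigger(
    state: State,
    upstream_states: Set[State],
    trigger: Callable[[Set[State]], bool],
) -> State:
    # Tasks with no upstream tasks always pass the trigger check.
    if not upstream_states:
        return state
    if not trigger(upstream_states):
        raise ENDRUN(TriggerFailed())
    return state
```

Without the empty-set guard, `any_failed(set())` would report a trigger failure for a task that has nothing upstream to fail.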

Important Note

This new ENDRUN exception is very similar to the existing DONTRUN signal, so you might ask why we need it.

DONTRUN is meant to be used exclusively by Prefect's engine to indicate that some processing should stop. However, it lives in the signals module, which we intend users to use. This creates the need to trap DONTRUNs in a special way, since sometimes the TaskRunner/FlowRunner is using them for control flow, and sometimes a user might be using one (without realizing it has special significance).

ENDRUN is intended to fully replace DONTRUN in terms of functionality, and is not even in signals.py (so users should not be trying to use it!). If this PR is accepted, I will submit another one (probably much simpler) to refactor the FlowRunner, at which time I will remove DONTRUN completely. But until that time, it's still needed by the existing FlowRunner.

jlowin · 2018-10-10T17:59:39Z

@cicdw making one more small change here to split out caching from running.

jlowin · 2018-10-10T19:29:19Z

@cicdw ready for review!

cicdw · 2018-10-10T21:01:57Z

I haven't finished a full detailed review but two incredibly superficial / trivial things stood out to me:

- in other places we simply call the input argument upstream_states and here you're calling it upstream_states_set; I think I prefer just upstream_states because technically the logic is sound even if you pass a list / iterable
- the pattern self.do_a_thing_step reads awkwardly to me

jlowin · 2018-10-10T21:05:12Z

I agree with both points.

upstream_states_set was actually because I got tired of copying that huge Dict[Edge, Union...] type everywhere... and just didn't replace it later. The truth is I think that the set is what we want for most of these (the rest of the information is superfluous; often we just want a set of states). Let me look at cleanly replacing it.

These could definitely lose the word step -- it's a holdover from when they were classes (and just a way to set them apart). It doesn't serve any purpose other than to identify them as having a specific signature. Should I just remove the _step suffix?

jlowin · 2018-10-11T00:44:51Z

On further thought, the upstream_states_set is a holdover from the previous implementation -- I think it was handled worse there, where the call was get_run_state(upstream_states=upstream_states_set) -- so I'm trying to unify the kwarg and the variable name.

upstream_states has type Dict[Edge, Union[State, List[State]]]
upstream_states_set has type Set[State], so it's much simpler for the steps to work with (and they don't need any other detail)
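To make the relationship between the two types concrete, here is a small sketch of how the dictionary could be flattened into a set. The helper name `flatten_upstream_states` is hypothetical, and plain hashable keys stand in for `Edge` objects; the PR may build the set differently.

```python
from typing import Dict, Hashable, List, Set, Union


class State:
    """Hypothetical stand-in for a task state."""


def flatten_upstream_states(
    upstream_states: Dict[Hashable, Union[State, List[State]]]
) -> Set[State]:
    # Each value is either a single State or a list of States; collect them
    # all into one set and drop the edge information the steps do not need.
    flat: Set[State] = set()
    for value in upstream_states.values():
        if isinstance(value, State):
            flat.add(value)
        else:
            flat.update(value)
    return flat
```

For a dictionary such as `{"edge_1": a, "edge_2": [b, c]}` the helper returns the set containing `a`, `b`, and `c`.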

cicdw

Some incredibly minor superficial changes requested; this is a really nice refactor and I really like how this makes the tests very targeted and easy to understand / read. 💯

I also think it's a great sign that no FlowRunner or Flow tests (other than the one) had to be updated to handle this significant refactor.

cicdw · 2018-10-10T21:02:17Z

CHANGELOG.md

@@ -3,7 +3,7 @@
 ## 0.4.0 <Badge text="alpha" type="warn">
 ### Major Features

- None
+- Refactor `TaskRunner` into a moduler pipeline - [#260](https://github.com/PrefectHQ/prefect/pull/260)


modular (sp)

cicdw · 2018-10-11T01:49:34Z

src/prefect/engine/task_runner.py

-        # check that upstream tasks are finished
-        # ---------------------------------------------------------
+        Raises:
+            - signals.ENDRUN if upstream tasks are not finished.


FYI: if you write the Raises: section like:

        Raises:
            - signals.ENDRUN: if upstream tasks are not finished

(note the added colon) the signals.ENDRUN will be code-formatted in the docs, otherwise it'll just be formatted as plain text.

cicdw · 2018-10-11T01:54:05Z

src/prefect/engine/task_runner.py

+
+        Raises:
+            - signals.PAUSE if the task raises PAUSE
+            - signals.ENDRUN if the task is not ready to run


Similar for my other Raises comment.

cicdw · 2018-10-11T01:57:43Z

tests/engine/test_task_runner.py

+            cached_inputs={"a": 1},
+            cached_result=2,
+            # cached_result_expiration=datetime.datetime.utcnow()
+            # + datetime.timedelta(minutes=1),


Should these comments just be removed?


cicdw

LGTM!

jlowin added 2 commits October 10, 2018 11:53

Pin pytest because of test that depends on monkeypatch.context

8ca48e6

Refactor TaskRunner into pipeline of methods

2b5ce3c

jlowin added the class: Task Runner label Oct 10, 2018

jlowin requested a review from cicdw as a code owner October 10, 2018 16:10

Update Changelog

d36022e

Add step for caching success state

5faf7b6

jlowin added 4 commits October 10, 2018 18:09

Add state_handlers

d8483d5

Update CHANGELOG.md

1a44437

Formatting updates

7516fe2

Remove step suffix

25d93f7

jlowin force-pushed the taskrunner-refactor branch from f5283e8 to 25d93f7 Compare October 11, 2018 00:41

jlowin added 3 commits October 10, 2018 21:18

Merge branch 'taskrunner-refactor' into state-callbacks

5b7db96

Clean up handler tests

b8a6cce

⚫

f636276

jlowin mentioned this pull request Oct 11, 2018

Add state_handlers for implementing callbacks when states change #264

Merged

cicdw requested changes Oct 11, 2018


jlowin added 8 commits October 11, 2018 12:55

Formatting changes

4524a99

Merge branch 'taskrunner-refactor' into state-callbacks

d0f1471

Update changelog

e3b2896

Update docs

f6b5014

Fix bug in tests

f0d182f

Remove superfluous method

748da71

Clean up following review

3b7fb6d

Merge pull request #264 from PrefectHQ/state-callbacks

b3cc501

Add state_handlers for implementing callbacks when states change

cicdw approved these changes Oct 11, 2018


jlowin merged commit 4abd6e3 into master Oct 11, 2018

jlowin deleted the taskrunner-refactor branch October 11, 2018 21:01

jlowin mentioned this pull request Oct 11, 2018

Flowrunner refactor #267

Merged
