Improve pipeline stop logic to ensure join is called exactly once for all stages #1479

efajardo-nv · 2024-01-25T20:41:41Z

Description

Removes the _is_built, _is_started and _is_stopped flags and replaces them with a single member that holds the pipeline state as an enum: INITIALIZED, BUILT, STARTED, STOPPED, COMPLETED.
Changes the meaning of stop() and join() for stages:
1. stop() is called 0 or 1 times. It is only called if pipeline.stop() was called, indicating that the pipeline should try to shut down gracefully.
  1. Users should only implement this method if they have a source stage (or sources in their stage).
2. join() is called exactly 1 time, and only when the pipeline is complete and all stages are shut down. This is where users should implement any cleanup code.
Adds tests for handling all of these scenarios with the pipeline.
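
To illustrate the single-state idea, here is a minimal sketch, not the code in this PR. The enum member names come from the description above; their numeric order, the `StateHolder` class, and its `advance()` method are hypothetical names chosen for illustration, and the lock is an assumption based on the review feedback about guarding state changes.

```python
import enum
import threading


class PipelineState(enum.IntEnum):
    """Lifecycle states named in the description; the numeric order is an assumption."""
    INITIALIZED = 0
    BUILT = 1
    STARTED = 2
    STOPPED = 3
    COMPLETED = 4


class StateHolder:
    """Hypothetical helper: holds one state value and only lets it move forward."""

    def __init__(self) -> None:
        self._state = PipelineState.INITIALIZED
        self._lock = threading.Lock()

    @property
    def state(self) -> PipelineState:
        with self._lock:
            return self._state

    def advance(self, new_state: PipelineState) -> None:
        # The lock keeps two threads from changing the state at the same time.
        with self._lock:
            if new_state <= self._state:
                raise RuntimeError(
                    f"Cannot move from {self._state.name} to {new_state.name}")
            self._state = new_state
```

In this sketch a pipeline that finishes on its own can go straight from STARTED to COMPLETED, while any attempt to move backwards raises an error.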

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

morpheus/pipeline/pipeline.py

mdemoret-nv

I think your PR makes the fixes specified in the issue, but looking at the code in here, I think we can do more to improve how the pipeline state is handled. I have the following recommendations:

Convert the _is_built, _is_started and _is_stopped flags into a single member which holds onto the state enum.
1. An enum works well for the state because the pipeline state should only progress forwards.
2. The enum values should look something like: initialized, built, running, stopping, completed.
3. Changes to the state value should be guarded by a mutex to prevent changing the state from multiple threads at once.
We should move all of the logic in the join() method into _start(), after the call to self._mrc_executor.start().
1. This is necessary because there are functions in join() which need to be called 100% of the time. However, the user is not required to call pipeline.join(), so with the current API it's possible that stages will never have their stop() and join() methods called.
2. To fix this, we just need to introduce an asyncio.Event into the pipeline.
3. If you follow the example in the docs, we pretty much want to use the same pattern.
  1. After self._mrc_executor.start() is called, create a new task which immediately calls self._mrc_executor.join_async(). This will block the task until the pipeline is complete.
  2. After join_async(), we should have all of the same code which is currently in Pipeline.join() to loop over all stages calling join().
  3. Finally, the task should set the pipeline state to Complete and call set() on the event object.
  4. All that should remain in Pipeline.join() is calling await self._completion_event.wait(), which will block that method from returning until the pipeline finishes.
We should change the meaning of stop() and the meaning of join() for stages
1. stop() should only get called 0 or 1 times. The only way it should get called is if pipeline.stop() was called indicating the pipeline should try to shut down gracefully.
  1. Users should only implement this method if they have a source stage (or sources in their stage)
2. join() should get called exactly 1 time. It should get called when the pipeline is complete and all stages are shut down. This is where users should implement any cleanup code
We should store the order that stages were built into a list and use this list when iterating over all the stages in stop() and join()
1. This is a small change but will guarantee that stages are stopped and joined in the same order they are built.
We should add edge-case tests for handling all of these scenarios with the pipeline.
1. We have a few checks, but more robust tests would be very powerful here.
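
The recommendations above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Morpheus implementation: `FakeExecutor`, `RecordingStage`, `SketchPipeline` and `demo` are hypothetical stand-ins, the fake executor only models `start()` and `join_async()` (its `stop()` is invented for the demo), and plain strings stand in for the state enum.

```python
import asyncio


class FakeExecutor:
    """Hypothetical stand-in for the real executor."""

    def __init__(self) -> None:
        self._done = asyncio.Event()

    def start(self) -> None:
        pass

    def stop(self) -> None:
        # Pretend the pipeline winds down as soon as a stop is requested.
        self._done.set()

    async def join_async(self) -> None:
        await self._done.wait()


class RecordingStage:
    """Hypothetical stage that counts how often stop() and join() are called."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stop_calls = 0
        self.join_calls = 0

    def stop(self) -> None:
        self.stop_calls += 1

    async def join(self) -> None:
        self.join_calls += 1


class SketchPipeline:

    def __init__(self, stages) -> None:
        self._stages = list(stages)  # kept in build order
        self._executor = FakeExecutor()
        self._completion_event = asyncio.Event()
        self._post_start_task = None
        self.state = "initialized"

    async def start(self) -> None:
        self._executor.start()
        self.state = "running"
        # Keep a reference so the background task is not garbage collected.
        self._post_start_task = asyncio.create_task(self._post_start())

    async def _post_start(self) -> None:
        # Blocks this task (not the caller) until the executor finishes.
        await self._executor.join_async()
        for stage in self._stages:
            await stage.join()  # runs once per stage, even if nobody calls join()
        self.state = "completed"
        self._completion_event.set()

    def stop(self) -> None:
        self.state = "stopping"
        for stage in self._stages:
            stage.stop()
        self._executor.stop()

    async def join(self) -> None:
        # Only waits for completion; the cleanup itself lives in _post_start().
        await self._completion_event.wait()


async def demo():
    stages = [RecordingStage("source"), RecordingStage("sink")]
    pipe = SketchPipeline(stages)
    await pipe.start()
    pipe.stop()
    await pipe.join()
    await pipe.join()  # a second join returns at once; stages are not joined again
    return pipe, stages
```

Because the stage `join()` calls live in the background task, they run exactly once no matter how many times the pipeline's `join()` is awaited. One limitation of a bare event is that an exception raised in the background task is not re-raised to callers of `join()`; a later commit in this PR is titled "replace asyncio event with future to propagate exceptions", which suggests the merged code moved away from the plain event for that reason.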

mdemoret-nv

Just missing tests on the join() method. Do the same tests for both normal and out-of-order uses.

mdemoret-nv · 2024-02-15T20:11:36Z

/merge

update pipeline stop logic

2da0f39

efajardo-nv added the bug and non-breaking labels Jan 25, 2024

efajardo-nv self-assigned this Jan 25, 2024

drobison00 reviewed Jan 26, 2024

mdemoret-nv requested changes Jan 26, 2024

efajardo-nv added 14 commits February 2, 2024 10:49

pr feedback updates

3d4cdfe

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

f3c72b2

…into pipeline-stop-fix

add post_start task

6c3d59e

fix by adding pipeline.join to run_async

b454dcc

update http server unit test

ba8f21f

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

1512d0b

…into pipeline-stop-fix

replace asyncio event with future to propagate exceptions

83dc809

update monitor stage test

a05552d

pipeline state unit tests

16093ac

remove commented lines

c402fe3

remove commented lines

a6b719f

fix copyright

fee8124

add remaining pipeline state tests

a17f8d5

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

4ae1c34

…into pipeline-stop-fix

efajardo-nv marked this pull request as ready for review February 9, 2024 18:59

efajardo-nv requested a review from a team as a code owner February 9, 2024 18:59

efajardo-nv added 5 commits February 12, 2024 11:05

Merge branch 'branch-24.03' into pipeline-stop-fix

e05da81

add pipeline build tests

fb3727c

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

fd49cd1

…into pipeline-stop-fix

Merge branch 'pipeline-stop-fix' of https://github.com/efajardo-nv/Mo…

4b9ffc6

…rpheus into pipeline-stop-fix

test pipeline build test names

64bb13f

mdemoret-nv reviewed Feb 12, 2024

efajardo-nv added 2 commits February 12, 2024 15:41

pipeline.join error handling

8acc4d0

pipeline join tests

88ba3bd

efajardo-nv added 3 commits February 12, 2024 15:43

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

33883b8

…into pipeline-stop-fix

add stage methods called tests for pipeline joins

c50ea80

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

aad652d

…into pipeline-stop-fix

mdemoret-nv approved these changes Feb 14, 2024

efajardo-nv added 2 commits February 14, 2024 16:26

remove error for join after join

41df4c5

Merge branch 'branch-24.03' of https://github.com/nv-morpheus/Morpheus …

cfb46a7

…into pipeline-stop-fix

mdemoret-nv changed the title ~~Update pipeline stop logic~~ Improve pipeline stop logic to ensure join is called exactly once for all stages Feb 15, 2024

rapids-bot bot merged commit 5fd661b into nv-morpheus:branch-24.03 Feb 15, 2024
10 checks passed

efajardo-nv deleted the pipeline-stop-fix branch July 29, 2024 21:10
