[Feature] Checkpointing with Workflows #17006

Merged
merged 69 commits into from
Nov 26, 2024

nerdai
Contributor

@nerdai nerdai commented Nov 19, 2024

Description

This PR enables a user to load and run a Workflow from a checkpoint stored during one of its past executions.

Round 4

Introduce a WorkflowCheckpointer class that wraps a Workflow and creates and manages its checkpoints.

  • After some iterations, the best approach appears to be a dedicated class that handles checkpointing (i.e., outside of both Workflow and Context)
  • If in Context: the UX gets awkward because you need to pass the context of previous runs along to maintain the checkpoint history
  • If in Workflow: checkpointing becomes too attached to the instance -- you wouldn't be able to create/load checkpoints from different instances of the same Workflow
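To make the trade-off concrete, here is a minimal, synchronous sketch of the "dedicated class" idea. It does not use llama_index at all: ToyCheckpoint, ToyCheckpointer, and the two toy steps are hypothetical names invented for illustration, and the real WorkflowCheckpointer is async and almost certainly works differently. The only point is that the checkpoint history lives in the wrapper, keyed by run, rather than in the workflow instance or its context.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Step = Callable[[State], State]


@dataclass
class ToyCheckpoint:
    """Hypothetical snapshot taken right after a step completes."""
    last_completed_step: str
    state: State


@dataclass
class ToyCheckpointer:
    """Hypothetical wrapper that owns the checkpoint history (not the workflow)."""
    steps: Dict[str, Step]
    checkpoints: Dict[str, List[ToyCheckpoint]] = field(default_factory=dict)

    def run(self, state: Optional[State] = None,
            start_after: Optional[str] = None) -> State:
        run_id = str(uuid.uuid4())  # every run gets its own checkpoint list
        self.checkpoints[run_id] = []
        current = dict(state or {})
        names = list(self.steps)
        if start_after is not None:
            # skip every step up to and including the checkpointed one
            names = names[names.index(start_after) + 1:]
        for name in names:
            current = self.steps[name](current)
            self.checkpoints[run_id].append(ToyCheckpoint(name, dict(current)))
        return current

    def run_from(self, checkpoint: ToyCheckpoint) -> State:
        return self.run(state=checkpoint.state,
                        start_after=checkpoint.last_completed_step)


# Toy two-step "workflow": load some data, then total it.
steps: Dict[str, Step] = {
    "load": lambda s: {**s, "data": [1, 2, 3]},
    "total": lambda s: {**s, "total": sum(s["data"])},
}
ckptr = ToyCheckpointer(steps=steps)
result = ckptr.run()
first_run = next(iter(ckptr.checkpoints.values()))
resumed = ckptr.run_from(first_run[0])  # resume after "load"
```

Because the history is held by the wrapper, the resumed run does not need the earlier run's context passed in by hand, and nothing ties a checkpoint to one particular workflow object.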

Sample Usage

from llama_index.core.workflow import StartEvent, WorkflowCheckpointer

# wrap an existing Workflow instance (elided here)
wflow_ckptr = WorkflowCheckpointer(workflow=...)
handler = wflow_ckptr.run()
await handler

# inspect the dict of all stored checkpoints
print(wflow_ckptr.checkpoints)

# resume from a given checkpoint, here the first one whose output event is a StartEvent
ckpt = wflow_ckptr.filter_checkpoints(output_event=StartEvent)[0]
handler = wflow_ckptr.run_from(checkpoint=ckpt)
await handler

Enabling/Disabling Steps For Checkpointing

from llama_index.core.workflow import WorkflowCheckpointer

# by default all steps are enabled for checkpointing
wflow_ckptr = WorkflowCheckpointer(workflow=...)

# disable checkpointing for a step
step_name = ...
wflow_ckptr.disable_checkpoint(step=step_name)

# enable checkpointing for a step
step_name = ...
wflow_ckptr.enable_checkpoint(step=step_name)

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change

@nerdai
Contributor Author

nerdai commented Nov 20, 2024

After discussing with @masci, we landed on the following:

  • If we mostly want to run from a checkpoint, then we can (and probably should) just do this in Workflow itself
  • have _broker_log maintain the Checkpoint -- of course, the Checkpoint should contain everything needed to execute from it (events, Context state, etc.)
  • have a way to navigate/filter Checkpoints
  • have a way to start from a Checkpoint

Will rework to this shape.
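
As a rough illustration of the shape listed above (a self-contained record per checkpoint plus a way to filter the log), here is a small stdlib-only sketch. Every name in it (SketchEvent, SketchStartEvent, SketchStopEvent, SketchCheckpoint, filter_log, broker_log) is a hypothetical stand-in and none of it is the llama_index implementation; the fields are assumptions based only on the "events, Context state" wording of the comment.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Type


class SketchEvent:
    """Toy stand-in for a workflow event."""


class SketchStartEvent(SketchEvent):
    """Toy stand-in for the event that starts a run."""


class SketchStopEvent(SketchEvent):
    """Toy stand-in for the event that ends a run."""


@dataclass
class SketchCheckpoint:
    """Carries everything assumed necessary to resume execution from this point."""
    last_completed_step: Optional[str]
    input_event: Optional[SketchEvent]
    output_event: SketchEvent
    ctx_state: Dict[str, Any] = field(default_factory=dict)


def filter_log(
    log: List[SketchCheckpoint],
    last_completed_step: Optional[str] = None,
    output_event_type: Optional[Type[SketchEvent]] = None,
) -> List[SketchCheckpoint]:
    """Return checkpoints matching every criterion that is not None."""
    return [
        c for c in log
        if (last_completed_step is None
            or c.last_completed_step == last_completed_step)
        and (output_event_type is None
             or isinstance(c.output_event, output_event_type))
    ]


# A toy log with one checkpoint per completed step.
broker_log = [
    SketchCheckpoint(None, None, SketchStartEvent()),
    SketchCheckpoint("step_one", SketchStartEvent(), SketchEvent(), {"counter": 1}),
    SketchCheckpoint("step_two", SketchEvent(), SketchStopEvent(), {"counter": 2}),
]
start_points = filter_log(broker_log, output_event_type=SketchStartEvent)
after_step_one = filter_log(broker_log, last_completed_step="step_one")
```

Filtering by event type uses isinstance, so asking for the base SketchEvent type would match every entry; whether the real filter should match subclasses is a design choice this sketch does not settle.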

@nerdai nerdai changed the title [WIP] WorkflowProfiler [WIP] Checkpointing with Workflows Nov 20, 2024
@nerdai nerdai changed the title [WIP] Checkpointing with Workflows [Feature] Checkpointing with Workflows Nov 21, 2024
@nerdai nerdai marked this pull request as ready for review November 21, 2024 04:10
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Nov 21, 2024
@nerdai nerdai force-pushed the nerdai/workflow-profiler branch from 359d07a to eaae6fa Compare November 21, 2024 04:19
Member

@masci masci left a comment

LGTM overall, left a couple of comments!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 21, 2024
@nerdai nerdai force-pushed the nerdai/workflow-profiler branch 2 times, most recently from 792c415 to 16f0b83 Compare November 23, 2024 19:31
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Nov 25, 2024
@@ -49,6 +49,7 @@ def __init__(
        )
        self._accepted_events: List[Tuple[str, str]] = []
        self._retval: Any = None
        self._in_progress: Dict[str, List[Event]] = defaultdict(list)
Contributor Author

We need to keep track of steps that are in progress at the creation time of a checkpoint. Otherwise, since their input events have already been popped from their respective queues, we will lose these events if we attempt to load from a checkpoint.
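
The problem described here can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the Context code from this PR: ToyContext, its method names, and the restore strategy (re-queueing in-flight events ahead of the still-queued ones) are all hypothetical. It only shows why a snapshot of the queues alone drops an event that a step has already popped.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List


class ToyContext:
    """Hypothetical context: per-step queues plus an in-progress ledger."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Any]] = defaultdict(deque)
        self.in_progress: Dict[str, List[Any]] = defaultdict(list)

    def send(self, step: str, event: Any) -> None:
        self.queues[step].append(event)

    def start(self, step: str) -> Any:
        # The event leaves the queue here, so the queue alone forgets it.
        event = self.queues[step].popleft()
        self.in_progress[step].append(event)
        return event

    def finish(self, step: str, event: Any) -> None:
        self.in_progress[step].remove(event)

    def snapshot(self) -> Dict[str, Dict[str, List[Any]]]:
        return {
            "queues": {k: list(v) for k, v in self.queues.items()},
            "in_progress": {k: list(v) for k, v in self.in_progress.items()},
        }

    @classmethod
    def restore(cls, snap: Dict[str, Dict[str, List[Any]]]) -> "ToyContext":
        ctx = cls()
        # Assumed strategy: put in-flight events back first so they rerun.
        for step, events in snap["in_progress"].items():
            ctx.queues[step].extend(events)
        for step, events in snap["queues"].items():
            ctx.queues[step].extend(events)
        return ctx


ctx = ToyContext()
ctx.send("step_a", "ev1")
ctx.send("step_a", "ev2")
ctx.start("step_a")      # "ev1" is now in flight, no longer queued
snap = ctx.snapshot()    # checkpoint taken while step_a is still running
restored = ToyContext.restore(snap)
```

In this toy run, snap["queues"] holds only "ev2"; it is the in_progress entry that lets the restored context see "ev1" again.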

@nerdai nerdai force-pushed the nerdai/workflow-profiler branch from fbcf4b5 to baf0ae8 Compare November 26, 2024 05:18
@nerdai nerdai merged commit c779a08 into main Nov 26, 2024
11 checks passed
@nerdai nerdai deleted the nerdai/workflow-profiler branch November 26, 2024 07:02
Labels
lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.
3 participants