[Feature] Export to Python Workflow Definition#882
Conversation
📝 WalkthroughWalkthroughAdds an optional Changes
Sequence DiagramsequenceDiagram
participant User
participant Executor as Executor<br/>(Flux/Slurm/Single)
participant Scheduler as DependencyTask<br/>Scheduler
participant Export as export_dependency<br/>_graph_function
participant FS as File System
User->>Executor: init(export_workflow_filename)
Executor->>Scheduler: __init__(..., export_workflow_filename=...)
Scheduler->>Scheduler: store _export_workflow_filename
Note over Executor,Scheduler: Workflow runs, tasks scheduled/executed
Executor->>Scheduler: __exit__()
alt _export_workflow_filename provided
Scheduler->>Export: export_dependency_graph_function(nodes, edges, filename)
Export->>Export: build JSON structure (nodes, edges, ports)
Export->>FS: write JSON file (filename)
else no export filename
Scheduler->>Scheduler: call plot_dependency_graph_function(plot_filename)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (1)
263-271: Non-JSON-serializable values may causeTypeErrorat runtime.The
elsebranch serializesn["value"]directly. If the value is a non-JSON-serializable object (e.g., a custom class instance, datetime, or other complex types),json.dumpwill raise aTypeError. Consider adding a fallback to convert such values to strings.🔎 Example defensive approach
else: + try: + # Test if value is JSON serializable + json.dumps(n["value"]) + value = n["value"] + except (TypeError, ValueError): + value = str(n["value"]) pwd_nodes_lst.append( { "id": n["id"], "type": n["type"], - "value": n["value"], + "value": value, "name": n["name"], } )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/executorlib/executor/flux.pysrc/executorlib/executor/single.pysrc/executorlib/executor/slurm.pysrc/executorlib/task_scheduler/interactive/dependency.pysrc/executorlib/task_scheduler/interactive/dependency_plot.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
export_dependency_graph_function(237-307)plot_dependency_graph_function(206-234)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: unittest_openmpi (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.13)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.12)
- GitHub Check: unittest_mpich (ubuntu-latest, 3.11)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.13)
- GitHub Check: unittest_mpich (ubuntu-22.04-arm, 3.13)
- GitHub Check: unittest_mpich (ubuntu-24.04-arm, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-latest, 3.12)
- GitHub Check: unittest_openmpi (macos-latest, 3.13)
- GitHub Check: unittest_openmpi (ubuntu-22.04-arm, 3.13)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: unittest_win
- GitHub Check: unittest_old
- GitHub Check: unittest_flux_openmpi
- GitHub Check: minimal
🔇 Additional comments (10)
src/executorlib/executor/slurm.py (2)
105-106: LGTM!The
export_workflow_filenameparameter is correctly added to the constructor signature and properly propagated toDependencyTaskScheduler. The documentation is appropriately updated.Also applies to: 231-231
320-321: LGTM!The
export_workflow_filenameparameter is correctly added and propagated inSlurmJobExecutor, consistent with the pattern inSlurmClusterExecutor.Also applies to: 404-404
src/executorlib/executor/flux.py (2)
109-110: LGTM!The
export_workflow_filenameparameter is correctly integrated intoFluxJobExecutorwith proper documentation and propagation to the underlyingDependencyTaskScheduler.Also applies to: 195-195
301-302: LGTM!The
export_workflow_filenameparameter is correctly integrated intoFluxClusterExecutor, following the same consistent pattern as other executor classes.Also applies to: 430-430
src/executorlib/executor/single.py (2)
98-99: LGTM!The
export_workflow_filenameparameter is correctly integrated intoSingleNodeExecutorwith proper documentation and propagation.Also applies to: 177-177
270-271: LGTM!The
export_workflow_filenameparameter is correctly integrated intoTestClusterExecutor, maintaining consistency with the other executor implementations.Also applies to: 368-368
src/executorlib/task_scheduler/interactive/dependency.py (3)
68-74: Verify the conditional logic for_generate_dependency_graph.The logic sets
_generate_dependency_graph = Truewhenplot_dependency_graph_filename is not NoneOR whenexport_workflow_filename is None. This means if neither filename is provided, the graph is still generated (but not saved anywhere useful sinceplot_dependency_graph_functionwithfilename=Nonedisplays inline in Jupyter).Was the intention to only generate the graph when at least one filename is provided, or is the inline Jupyter display the expected fallback behavior?
219-230: LGTM!The
__exit__method correctly dispatches to eitherexport_dependency_graph_functionorplot_dependency_graph_functionbased on the provided filename. The export path correctly uses the new JSON export function whenexport_workflow_filenameis specified.
15-20: LGTM!The import of
export_dependency_graph_functionis correctly added alongside the existing imports from the same module.src/executorlib/task_scheduler/interactive/dependency_plot.py (1)
2-8: Addnumpyto the project dependencies.The code imports
numpyat line 8 for handlingnp.ndarrayserialization in the newexport_dependency_graph_functionfunction. However,numpyis not declared as a dependency inpyproject.toml. Since this function is called fromsrc/executorlib/task_scheduler/interactive/dependency.pyand numpy is not transitively available through the graph optional dependencies, addnumpyto either the main dependencies or the graph optional dependency group.
| pwd_edges_lst.append( | ||
| { | ||
| "target": final_node["id"], | ||
| "targetPort": None, | ||
| "source": max([e["target"] for e in pwd_edges_lst]), | ||
| "sourcePort": None, | ||
| } | ||
| ) |
There was a problem hiding this comment.
Handle empty edge list to prevent ValueError on max().
If edge_lst is empty, pwd_edges_lst will be empty, and max([e["target"] for e in pwd_edges_lst]) will raise ValueError: max() arg is an empty sequence. This can occur with single-node workflows or graphs without edges.
🔎 Proposed fix
- pwd_edges_lst.append(
- {
- "target": final_node["id"],
- "targetPort": None,
- "source": max([e["target"] for e in pwd_edges_lst]),
- "sourcePort": None,
- }
- )
+ if pwd_edges_lst:
+ pwd_edges_lst.append(
+ {
+ "target": final_node["id"],
+ "targetPort": None,
+ "source": max(e["target"] for e in pwd_edges_lst),
+ "sourcePort": None,
+ }
+ )🤖 Prompt for AI Agents
In src/executorlib/task_scheduler/interactive/dependency_plot.py around lines
293-300, the code calls max() on pwd_edges_lst which raises ValueError when
pwd_edges_lst is empty; update the logic to handle an empty list by computing
source only if pwd_edges_lst has elements (e.g., source = max(e["target"] for e
in pwd_edges_lst)), otherwise set source to a safe fallback (for example
final_node["id"] or None) or skip adding the edge entirely; implement a short
conditional that selects the fallback and then append the dict using that source
to avoid the max() on an empty sequence.
for more information, see https://pre-commit.ci
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #882 +/- ##
==========================================
+ Coverage 93.34% 93.41% +0.07%
==========================================
Files 38 38
Lines 1817 1837 +20
==========================================
+ Hits 1696 1716 +20
Misses 121 121 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/executorlib/task_scheduler/interactive/dependency.py (1)
37-38: Docstring contains duplicate attribute name with incorrect type.Line 37 describes
_generate_dependency_graph (bool)and line 38 reuses the same attribute name with type(str). The second entry should be_plot_dependency_graph_filename (str)based on the actual attribute.🔎 Proposed fix
Attributes: _future_hash_dict (Dict[str, Future]): A dictionary mapping task hash to future object. _task_hash_dict (Dict[str, Dict]): A dictionary mapping task hash to task dictionary. _generate_dependency_graph (bool): Whether to generate the dependency graph. - _generate_dependency_graph (str): Name of the file to store the plotted graph in. + _plot_dependency_graph_filename (str): Name of the file to store the plotted graph in. + _export_workflow_filename (str): Name of the file to store the exported workflow graph in.
🧹 Nitpick comments (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)
216-227: Export and plot are mutually exclusive when both filenames are provided.When both
export_workflow_filenameandplot_dependency_graph_filenameare specified, only the export is performed due to the if/else structure. If this is intentional, consider documenting this precedence in the class docstring. Otherwise, consider supporting both operations when both filenames are provided.🔎 Proposed fix to support both operations
if self._generate_dependency_graph: node_lst, edge_lst = generate_nodes_and_edges_for_plotting( task_hash_dict=self._task_hash_dict, future_hash_inverse_dict={ v: k for k, v in self._future_hash_dict.items() }, ) if self._export_workflow_filename is not None: - return export_dependency_graph_function( + export_dependency_graph_function( node_lst=node_lst, edge_lst=edge_lst, file_name=self._export_workflow_filename, ) - else: - return plot_dependency_graph_function( + if self._plot_dependency_graph_filename is not None: + plot_dependency_graph_function( node_lst=node_lst, edge_lst=edge_lst, filename=self._plot_dependency_graph_filename, ) - else: - return None + return Nonetests/test_singlenodeexecutor_pwd.py (1)
8-15: Minor: Trailing whitespace on line 10.Line 10 has trailing whitespace after
return x + y. Pre-commit hooks should catch this, but worth noting.🔎 Proposed fix
def get_sum(x, y): - return x + y - + return x + y + def get_prod_and_div(x, y):
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/executorlib/task_scheduler/interactive/dependency.pytests/test_singlenodeexecutor_pwd.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/executorlib/task_scheduler/interactive/dependency.py (1)
src/executorlib/task_scheduler/interactive/dependency_plot.py (2)
export_dependency_graph_function(237-307)plot_dependency_graph_function(206-234)
tests/test_singlenodeexecutor_pwd.py (2)
src/executorlib/executor/single.py (1)
SingleNodeExecutor(20-194)src/executorlib/standalone/select.py (1)
get_item_from_future(42-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-mpich.yml)
- GitHub Check: benchmark (ubuntu-latest, 3.13, .ci_support/environment-openmpi.yml)
- GitHub Check: notebooks_integration
🔇 Additional comments (5)
src/executorlib/task_scheduler/interactive/dependency.py (2)
16-16: LGTM!Import addition aligns with the new export functionality.
49-49: LGTM!The new
export_workflow_filenameparameter is correctly added and the initialization logic properly enables dependency graph generation when either filename is provided.Also applies to: 67-71
tests/test_singlenodeexecutor_pwd.py (3)
18-21: LGTM!Proper tearDown implementation to clean up the generated workflow.json file after each test.
23-36: LGTM!The test effectively validates the arithmetic workflow export with chained futures using
get_item_from_future. The assertion thatfuture_result.result()isNonecorrectly reflects the graph generation mode behavior where tasks are recorded but not executed.
38-47: LGTM!Good coverage for NumPy array handling in workflow export. The test verifies that numpy arrays are properly serialized in the workflow graph (converted to lists per
export_dependency_graph_functionimplementation).
As demonstrated in https://github.com/pyiron-dev/executorlib-export-python-workflow-definition/
Example:
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.