Fix working directory #481

jan-janssen · 2024-11-08T21:35:41Z

Summary by CodeRabbit

New Features
- Introduced a new optional parameter, cache_directory, across multiple functions to specify the directory for storing HDF5 files, enhancing flexibility in file management.
Bug Fixes
- Updated logic for setting working directories to ensure they default to the specified cache_directory when necessary.
Documentation
- Updated function docstrings to include descriptions for the new cache_directory parameter.

coderabbitai · 2024-11-08T21:35:47Z

Walkthrough

The pull request introduces a new optional parameter, cache_directory, to multiple functions within the executorlib module, specifically in executorlib/cache/shared.py, executorlib/standalone/cache/queue.py, and executorlib/standalone/cache/spawner.py. This parameter allows users to specify a directory for storing HDF5 files, enhancing the flexibility of file management. The logic for constructing file paths and handling resource dictionaries has been updated accordingly, while maintaining existing error handling behaviors.

Changes

File Path	Change Summary
executorlib/cache/shared.py	Added `cache_directory` parameter to `execute_tasks_h5` function; updated file path logic and command generation.
executorlib/standalone/cache/queue.py	Added `cache_directory` parameter to `execute_with_pysqa` function; modified `resource_dict` handling.
executorlib/standalone/cache/spawner.py	Added `cache_directory` parameter to `execute_in_subprocess` function; updated `resource_dict` initialization.

Possibly related PRs

Cache: Add working directory parameter #446: Introduces a new optional parameter cwd to the execute_tasks_h5 function, enhancing flexibility related to the cache_directory parameter.
Cache: Use explicit arguments for serialize_funct_h5() #448: Modifies the execute_tasks_h5 function to use explicit arguments for serialize_funct_h5, relevant to the updates in the main PR.
Add option to disable_dependencies for cache #477: Adds the disable_dependencies parameter in both the execute_tasks_h5 function and the FileExecutor class, relating to the overall functionality of task execution.
Cache: create method #459: Includes changes to the FileExecutor class that may interact with the execute_tasks_h5 function, particularly in how tasks are executed and managed.

🐰 In the cache where files reside,
A new path opens wide,
With cache_directory we play,
HDF5 finds its way,
Flexibility's the name of the game,
In our coding hall of fame! 🎉

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)

executorlib/standalone/cache/spawner.py (2)
12-12: Enhance the cache_directory parameter documentation.

The docstring could be more descriptive about:

Whether the directory must exist or will be created

What happens if the directory is None (default behavior)

The expected permissions and requirements for the directory

Also applies to: 26-26

42-45: Simplify resource dictionary initialization.

The two conditions can be combined for better readability.
-    if resource_dict is None:
-        resource_dict = {"cwd": cache_directory}
-    elif len(resource_dict) == 0:
-        resource_dict = {"cwd": cache_directory}
+    if resource_dict is None or len(resource_dict) == 0:
+        resource_dict = {"cwd": cache_directory}
🧰 Tools

🪛 Ruff

42-45: Combine if branches using logical or operator

Combine if branches

(SIM114)
executorlib/standalone/cache/queue.py (1)
28-28: Enhance parameter documentation

Consider expanding the documentation for cache_directory to include:

Its relationship with resource_dict["cwd"]

Whether the directory must exist

The purpose of storing HDF5 files there
-        cache_directory (str): The directory to store the HDF5 files.
+        cache_directory (str, optional): Directory to store HDF5 files. If provided, this value
+            will be used as the working directory when resource_dict is None, empty, or has
+            cwd=None. The directory should exist before execution.
executorlib/cache/shared.py (1)
Line range hint 52-67: Add validation for cache_directory parameter

The function relies heavily on the cache_directory for file operations but doesn't validate if the directory exists or is writable. This could lead to runtime errors if an invalid directory is provided.

Consider adding validation at the start of the function:
def execute_tasks_h5(
    future_queue: queue.Queue,
    cache_directory: str,
    execute_function: callable,
    resource_dict: dict,
    terminate_function: Optional[callable] = None,
    pysqa_config_directory: Optional[str] = None,
    backend: Optional[str] = None,
    disable_dependencies: bool = False,
) -> None:
+    if not os.path.isdir(cache_directory):
+        raise ValueError(f"Cache directory does not exist: {cache_directory}")
+    if not os.access(cache_directory, os.W_OK):
+        raise ValueError(f"Cache directory is not writable: {cache_directory}")

    memory_dict, process_dict, file_name_dict = {}, {}, {}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2132028 and 9afcb56.

📒 Files selected for processing (3)

executorlib/cache/shared.py (1 hunks)
executorlib/standalone/cache/queue.py (2 hunks)
executorlib/standalone/cache/spawner.py (3 hunks)

🧰 Additional context used

🪛 Ruff

executorlib/standalone/cache/queue.py

33-36: Combine if branches using logical or operator

Combine if branches

(SIM114)

executorlib/standalone/cache/spawner.py

42-45: Combine if branches using logical or operator

Combine if branches

(SIM114)

🔇 Additional comments (5)

executorlib/standalone/cache/spawner.py (1)

Line range hint 12-45: Add validation for cache_directory parameter.

The cache_directory parameter is used directly without validation, which could lead to security issues:

Directory existence is not verified
No protection against path traversal attacks
Directory permissions are not checked

Consider adding validation before using the directory as the working directory for subprocesses.

executorlib/standalone/cache/queue.py (2)

14-14: LGTM: Clean signature update

The new cache_directory parameter is properly typed and follows Python conventions.

14-14: Verify consistent cache_directory handling across files

Since this change affects working directory handling, we should verify that other files mentioned in the PR (executorlib/cache/shared.py and executorlib/standalone/cache/spawner.py) handle cache_directory consistently.

Also applies to: 34-38

✅ Verification successful

Based on the search results, I can now provide a final response:

Cache directory handling is consistent across the codebase

The cache_directory parameter is handled consistently across all relevant files:

In executorlib/standalone/cache/queue.py and executorlib/standalone/cache/spawner.py, it's used to set the working directory (cwd) in the resource dictionary
In executorlib/cache/shared.py and executorlib/cache/executor.py, it's used for storing HDF5 files
All implementations treat it as an optional parameter with similar documentation patterns
The parameter is properly propagated through the execution chain

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for cache_directory parameter usage in related files
rg -A 10 "def.*cache_directory" "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"

Length of output: 101

Script:

#!/bin/bash
# Let's try a broader search for cache_directory usage
rg -A 5 "cache_directory" "executorlib/"

# Also search for any function definitions in these specific files
rg -A 10 "def " "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"

# And check for any class definitions that might be relevant
rg -A 10 "class.*Cache" "executorlib/"

Length of output: 24911

executorlib/cache/shared.py (2)

134-134: LGTM: Correctly propagates cache_directory parameter

The addition of the cache_directory parameter to the execute_function call is consistent with the PR's objective and ensures proper working directory handling.

Line range hint 52-67: Verify consistent cache_directory usage across the codebase

Let's verify that the cache_directory parameter is consistently handled across related files mentioned in the AI summary.

✅ Verification successful

Based on the comprehensive search results, I can now generate the final response:

Cache directory parameter is consistently used across the codebase

The verification shows that the cache_directory parameter is consistently handled:

In executorlib/cache/shared.py, it's properly documented and used for HDF5 file storage
In executorlib/cache/executor.py, it's consistently passed through to the execution functions
In executorlib/interactive/shared.py, it's properly handled for task execution caching
All test files demonstrate consistent usage with proper directory creation and cleanup

The parameter is used consistently for:

Storing HDF5 input/output files
Setting working directories for task execution
Maintaining cache across different executor implementations

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent cache_directory parameter usage across related files

# Check for cache_directory parameter in related files
echo "Checking cache_directory parameter usage in related files:"
rg -A 5 "def execute_" "executorlib/standalone/cache/"

# Check for potential file operations that might need the cache_directory
echo -e "\nChecking file operations that might need cache_directory:"
rg "os\.path\.(join|exists|isdir)" "executorlib/standalone/cache/"

Length of output: 1238

Script:

#!/bin/bash
# Let's expand our search to include more variations and the entire codebase

# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory" --type python

# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)" --type python

# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)" --type python

Length of output: 552

Script:

#!/bin/bash
# Let's try without specifying file type

# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory"

# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)"

# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)"

Length of output: 53682

coderabbitai · 2024-11-08T21:44:06Z

executorlib/standalone/cache/queue.py

+        resource_dict = {"cwd": cache_directory}
+    elif len(resource_dict) == 0:
+        resource_dict = {"cwd": cache_directory}
+    elif "cwd" in resource_dict and resource_dict["cwd"] is None:
+        resource_dict["cwd"] = cache_directory


💡 Codebase verification

Refactoring suggestion needs revision due to cache_directory being optional

The refactoring suggestion needs to be reconsidered since cache_directory is defined as Optional[str] in the function signature, but there's no validation of its value before use. The current code could potentially set None as the cwd value, which might cause issues downstream.

Instead of simplifying, the code should handle the None case:

if cache_directory is not None: if resource_dict is None or len(resource_dict) == 0: resource_dict = {"cwd": cache_directory} elif "cwd" in resource_dict and resource_dict["cwd"] is None: resource_dict["cwd"] = cache_directory

The function signature shows cache_directory: Optional[str] = None

The docstring indicates it as cache_directory (str) without mentioning optional

No validation exists for cache_directory being None before using it

🔗 Analysis chain

Simplify conditional logic and verify None handling

The current implementation can be simplified by combining the conditions.

- if resource_dict is None: - resource_dict = {"cwd": cache_directory} - elif len(resource_dict) == 0: - resource_dict = {"cwd": cache_directory} - elif "cwd" in resource_dict and resource_dict["cwd"] is None: - resource_dict["cwd"] = cache_directory + if resource_dict is None or len(resource_dict) == 0: + resource_dict = {"cwd": cache_directory} + elif "cwd" in resource_dict and resource_dict["cwd"] is None: + resource_dict["cwd"] = cache_directory

Additionally, we should verify what happens when both cache_directory and resource_dict["cwd"] are None.

Let's analyze how cache_directory is handled in the queue.py file to understand its validation and potential None cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for error handling or validation of cache_directory rg -A 5 "cache_directory.*None" .

Length of output: 101

Script:

#!/bin/bash # Search for cache_directory parameter and its usage rg -A 10 'def.*cache_directory' executorlib/standalone/cache/queue.py # Search for cache_directory validation or initialization rg -B 5 'cache_directory' executorlib/standalone/cache/queue.py # Search for any None checks or error handling related to cache_directory ast-grep --pattern 'if $_ is None: $$$' executorlib/standalone/cache/queue.py

Length of output: 1628

…directory

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

executorlib/standalone/cache/spawner.py (2)
12-12: Improve docstring for better clarity.

The docstring should be enhanced to better describe the cache_directory parameter:
-        cache_directory (str): The directory to store the HDF5 files.
+        cache_directory (str, optional): Path to the directory where HDF5 files will be stored. 
+            If provided and no working directory is specified in resource_dict, 
+            this will be used as the subprocess working directory.
Also applies to: 26-26

43-48: Simplify working directory assignment logic.

The current if-else block can be simplified using Python's dict.get() method.
-    if resource_dict is None:
-        resource_dict = {}
-    if "cwd" in resource_dict:
-        cwd = resource_dict["cwd"]
-    else:
-        cwd = cache_directory
+    resource_dict = resource_dict or {}
+    cwd = resource_dict.get("cwd", cache_directory)
🧰 Tools

🪛 Ruff

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)

44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9afcb56 and 26305a8.

📒 Files selected for processing (2)

executorlib/standalone/cache/queue.py (3 hunks)
executorlib/standalone/cache/spawner.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

executorlib/standalone/cache/queue.py

🧰 Additional context used

🪛 Ruff

executorlib/standalone/cache/spawner.py

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)

44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

🔇 Additional comments (1)

executorlib/standalone/cache/spawner.py (1)

Line range hint 12-48: Overall implementation looks good!

The changes effectively address the working directory configuration by introducing the cache_directory parameter while maintaining backward compatibility through the existing resource_dict mechanism.

🧰 Tools

🪛 Ruff

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)

44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

coderabbitai · 2024-11-08T22:18:20Z

executorlib/standalone/cache/spawner.py

-    elif len(resource_dict) == 0:
-        resource_dict = {"cwd": None}
-    return subprocess.Popen(command, universal_newlines=True, cwd=resource_dict["cwd"])
+        resource_dict = {}


💡 Codebase verification

Based on the search results, I can generate the final response as we have found clear evidence of code that relies on the "cwd" key in resource_dict:

The change to empty resource_dict initialization will cause runtime errors

The empty dictionary initialization will break existing code in multiple places:

executorlib/standalone/cache/spawner.py directly accesses resource_dict["cwd"] without checking existence

executorlib/standalone/cache/queue.py has logic that depends on "cwd" key being present, including:

Checking if "cwd" exists and is not None

Accessing the "cwd" value

Explicitly deleting the "cwd" key

The original initialization with {"cwd": None} was intentional and should be preserved to maintain compatibility with the existing code.

🔗 Analysis chain

Verify the impact of resource_dict initialization change.

The change from initializing with {"cwd": None} to an empty dictionary might affect code that expects the "cwd" key to always exist.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for code that might rely on "cwd" key existence rg -A 3 'resource_dict.*cwd' --type py

Length of output: 2523

Fix working directory

9c0fc89

[pre-commit.ci] auto fixes from pre-commit.com hooks

9afcb56

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Nov 8, 2024

View reviewed changes

jan-janssen added 5 commits November 8, 2024 22:44

shorter fix

267a8a6

Merge remote-tracking branch 'origin/working_directory' into working_…

9da1fed

…directory

even shorted

317397f

Merge remote-tracking branch 'refs/remotes/origin/main' into working_…

efe3302

…directory

another fix

2c34f22

jan-janssen force-pushed the working_directory branch from b8de525 to 2c34f22 Compare November 8, 2024 21:55

windows fix

26305a8

coderabbitai bot reviewed Nov 8, 2024

View reviewed changes

jan-janssen merged commit d31907f into main Nov 8, 2024
25 of 26 checks passed

jan-janssen deleted the working_directory branch November 8, 2024 22:19

This was referenced Nov 10, 2024

Empty cache #491

Closed

Write cache first #492

Merged

Queuing system submission: check if the job is already waiting in the queue or currently running. #499

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix working directory #481

Fix working directory #481

Uh oh!

jan-janssen commented Nov 8, 2024 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 8, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 8, 2024

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 8, 2024

Uh oh!

Uh oh!

Uh oh!

Fix working directory #481

Fix working directory #481

Uh oh!

Conversation

jan-janssen commented Nov 8, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 8, 2024

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 8, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jan-janssen commented Nov 8, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 8, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)