Cache: Add working directory parameter #446

jan-janssen · 2024-10-27T17:19:15Z

Summary by CodeRabbit

New Features
- Introduced an optional parameter to specify the current working directory for executing tasks, enhancing flexibility in task execution.
- Added a new test to verify the functionality of the FileExecutor with a specified working directory.
Bug Fixes
- Improved subprocess execution by allowing the specification of a working directory, which may resolve issues related to file paths and dependencies.
Documentation
- Updated docstrings to clarify the purpose and optional nature of new parameters.

for more information, see https://pre-commit.ci

coderabbitai · 2024-10-27T17:19:21Z

Walkthrough

The pull request introduces a new optional parameter cwd to the FileExecutor class's __init__ method and related subprocess functions, allowing users to specify the current working directory for executing tasks. This change is reflected in the executor.py and shared.py files, where the signatures of several functions are updated to include cwd. Additionally, a new test function is added to verify the functionality of the FileExecutor with a specified working directory in the test suite.

Changes

File	Change Summary
executorlib/cache/executor.py	Added optional parameter `cwd` to `__init__` method of `FileExecutor` class; updated docstring and passed `cwd` to `RaisingThread` in `_set_process` method.
executorlib/cache/shared.py	Updated `execute_in_subprocess` and `execute_tasks_h5` functions to include optional `cwd` parameter; modified docstrings and added import for `Optional`.
tests/test_cache_executor_serial.py	Added function `list_files_in_working_directory` and test method `test_executor_working_directory` to verify `FileExecutor` functionality with a specified working directory.

Possibly related PRs

Split shared cache in backend and frontend #443: The changes in executorlib/shared/cache.py regarding the removal of functions related to task execution and file handling may indirectly relate to the changes in executorlib/cache/shared.py, where the cwd parameter is introduced for subprocess execution. Both PRs involve modifications to how tasks are executed, although they focus on different aspects of the execution process.

🐇 In a world of tasks and threads,
A new path opens, where the rabbit treads.
With cwd to guide each little hop,
In directories new, we’ll never stop!
So let’s celebrate with a joyful cheer,
For flexible paths, we hold so dear! 🎉

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)

executorlib/cache/executor.py (1)
24-24: Enhance parameter documentation

While the documentation is functional, it could be more precise and consistent with the type hints.

Consider updating the documentation to:
-            cwd (str/None): current working directory where the parallel python task is executed
+            cwd (Optional[str]): The working directory where parallel Python tasks are executed. If None, uses the current working directory.
tests/test_cache_executor_serial.py (1)
22-24: Add docstring to helper function.

Consider adding a docstring to clarify the function's purpose and return value.
 def list_files_in_working_directory():
+    """Return a list of files in the current working directory.
+
+    Returns:
+        list[str]: Names of files and directories in the current working directory.
+    """
     return os.listdir(os.getcwd())
executorlib/cache/shared.py (1)
79-79: Consider making cwd parameter optional for consistency

The cwd parameter is optional in execute_in_subprocess but required in execute_tasks_h5. For API consistency, consider making it optional here as well.
def execute_tasks_h5(
    future_queue: queue.Queue,
    cache_directory: str,
    cores_per_worker: int,
    execute_function: callable,
-   cwd: Optional[str],
+   cwd: Optional[str] = None,
) -> None:
Also applies to: 89-89

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 451cc92 and 6f9b59e.

📒 Files selected for processing (3)

executorlib/cache/executor.py (4 hunks)
executorlib/cache/shared.py (5 hunks)
tests/test_cache_executor_serial.py (2 hunks)

🧰 Additional context used

🪛 Ruff

executorlib/cache/shared.py

52-52: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (6)

executorlib/cache/executor.py (2)
2-2: LGTM: Clean import addition

The Optional type import is appropriately added for the new cwd parameter type hint.

15-15: LGTM: Well-designed parameter addition

The new cwd parameter is correctly typed and maintains backward compatibility with existing code through its optional nature.

Let's verify that existing callers won't be affected:

✅ Verification successful

Verified: Parameter addition is safe and already in use

The codebase scan shows that all existing FileExecutor instantiations remain compatible:

Most usages don't specify cwd, utilizing the default None value

One test case already uses the new cwd parameter correctly in test_cache_executor_serial.py
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for existing FileExecutor instantiations to ensure they remain compatible
rg -l "FileExecutor\(" | xargs rg -A 5 "FileExecutor\("
Length of output: 2239
tests/test_cache_executor_serial.py (1)
45-50: 🛠️ Refactor suggestion

Enhance test robustness and documentation.

The test would benefit from:

A docstring explaining the test's purpose

Verification that the test directory exists

Additional test cases for edge cases
     def test_executor_working_directory(self):
+        """Test FileExecutor respects the specified working directory.
+
+        Verifies that tasks executed through FileExecutor run in the specified
+        working directory instead of the default directory.
+        """
         cwd = os.path.join(os.path.dirname(__file__), "executables")
+        if not os.path.exists(cwd):
+            self.skipTest(f"Test directory {cwd!r} does not exist")
         with FileExecutor(cwd=cwd) as exe:
             fs1 = exe.submit(list_files_in_working_directory)
             self.assertEqual(fs1.result(), os.listdir(cwd))
Let's verify if the test directory exists:

Consider adding these test cases:

Test with non-existent directory

Test with relative path

Test with normalized vs non-normalized paths

Example:
def test_executor_working_directory_not_exists(self):
    """Test FileExecutor raises appropriate error for non-existent directory."""
    with self.assertRaises(FileNotFoundError):
        with FileExecutor(cwd="/non/existent/path"):
            pass
executorlib/cache/shared.py (3)

7-7: LGTM: Clean import addition

The Optional type import is appropriately added for the new cwd parameter type hints.

71-71: LGTM: Correct subprocess working directory implementation

The cwd parameter is correctly passed to subprocess.Popen, maintaining compatibility with the standard Python subprocess behavior.

131-131: LGTM: Correct parameter forwarding

The cwd parameter is correctly passed through to the execute_function call.

coderabbitai · 2024-10-27T17:24:53Z

executorlib/cache/executor.py

@@ -31,6 +34,7 @@ def __init__(
                    "execute_function": execute_function,
                    "cache_directory": cache_directory_path,
                    "cores_per_worker": cores_per_worker,
+                    "cwd": cwd,


🛠️ Refactor suggestion

Consider adding working directory validation

While the implementation is correct, it might be helpful to validate the working directory exists before passing it to the execution context to fail fast with a clear error message.

Consider adding validation in __init__:

if cwd is not None and not os.path.isdir(cwd): raise ValueError(f"Working directory does not exist: {cwd}")

coderabbitai · 2024-10-27T17:24:53Z

executorlib/cache/shared.py

+    command: list,
+    task_dependent_lst: list = [],
+    cwd: Optional[str] = None,


⚠️ Potential issue

Fix mutable default argument

Using mutable default arguments in Python can lead to unexpected behavior as the list is created once when the function is defined, not each time it's called.

Replace with:

def execute_in_subprocess( command: list, - task_dependent_lst: list = [], + task_dependent_lst: list = None, cwd: Optional[str] = None, ) -> subprocess.Popen:

And add at the beginning of the function:

if task_dependent_lst is None: task_dependent_lst = []

🧰 Tools

🪛 Ruff

52-52: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

…che_working_directory

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)

tests/test_cache_executor_serial.py (2)
22-24: Add error handling for directory listing.

While the function works for the happy path, it should handle potential OS errors that could occur during directory listing.

Consider adding error handling:
def list_files_in_working_directory():
+    try:
        return os.listdir(os.getcwd())
+    except OSError as e:
+        raise RuntimeError(f"Failed to list working directory: {e}")
45-50: Enhance test robustness and cleanup.

The test verifies the basic functionality but could be more robust.

Consider these improvements:
def test_executor_working_directory(self):
    cwd = os.path.join(os.path.dirname(__file__), "executables")
+    if not os.path.exists(cwd):
+        self.skipTest("Executables directory not found")
    with FileExecutor(cwd=cwd) as exe:
        fs1 = exe.submit(list_files_in_working_directory)
        self.assertEqual(fs1.result(), os.listdir(cwd))

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6f9b59e and 75f9c18.

📒 Files selected for processing (1)

tests/test_cache_executor_serial.py (5 hunks)

🔇 Additional comments (1)

tests/test_cache_executor_serial.py (1)

64-64: LGTM! Consistent handling of the new cwd parameter.

The addition of cwd=None is consistent across all test methods and aligns with the new parameter's default behavior.

Also applies to: 89-89, 114-114

coderabbitai · 2024-10-27T17:33:15Z

tests/test_cache_executor_serial.py

+    def test_executor_working_directory(self):
+        cwd = os.path.join(os.path.dirname(__file__), "executables")
+        with FileExecutor(cwd=cwd) as exe:
+            fs1 = exe.submit(list_files_in_working_directory)
+            self.assertEqual(fs1.result(), os.listdir(cwd))
+


💡 Codebase verification

🛠️ Refactor suggestion

Add test cases for error scenarios with cwd parameter

The review comment is valid. The implementation accepts a cwd parameter without any validation, and there's only one positive test case. Adding test cases for error scenarios is important since:

The parameter directly affects subprocess execution environment

No validation exists in the implementation

Current test coverage only includes the happy path

Working directory issues can lead to runtime failures

The suggested test cases in the review comment appropriately cover:

Non-existent directories

Invalid paths

Permission issues

🔗 Analysis chain

Add test cases for error scenarios.

The current test only covers the happy path. Consider adding test cases for:

Invalid directory paths

Non-existent directories

Directories without read permissions

Here's a suggested additional test:

def test_executor_working_directory_errors(self): # Test non-existent directory with self.assertRaises(ValueError): with FileExecutor(cwd="/nonexistent/path"): pass # Test invalid directory path with self.assertRaises(ValueError): with FileExecutor(cwd=""): pass # Test directory without permissions (Unix-like systems only) if os.name != 'nt': no_access_dir = os.path.join(self.test_dir, "no_access") os.makedirs(no_access_dir, mode=0o000) try: with self.assertRaises(PermissionError): with FileExecutor(cwd=no_access_dir): pass finally: os.chmod(no_access_dir, 0o755) # Restore permissions for cleanup os.rmdir(no_access_dir)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check if there are any existing error handling tests for cwd parameter rg -l "test.*working.*directory.*error" tests/ rg -l "test.*invalid.*directory" tests/

Length of output: 90

Script:

#!/bin/bash # Let's check for any existing error handling tests for FileExecutor initialization rg -A 10 "FileExecutor.*cwd" tests/

Length of output: 891

Script:

#!/bin/bash # Let's check the FileExecutor implementation to understand error handling rg -A 10 "class FileExecutor" .

Length of output: 678

Script:

#!/bin/bash # Let's check the implementation details of FileExecutor to see if there's any validation rg -B 5 -A 15 "def __init__" ./executorlib/cache/executor.py

Length of output: 929

jan-janssen and others added 2 commits October 27, 2024 18:18

Cache: Add working directory parameter

0f59f57

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f9b59e

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Oct 27, 2024

View reviewed changes

jan-janssen added 2 commits October 27, 2024 18:29

fix tests

e19d4b3

Merge remote-tracking branch 'origin/cache_working_directory' into ca…

75f9c18

…che_working_directory

coderabbitai bot reviewed Oct 27, 2024

View reviewed changes

jan-janssen merged commit df966af into main Oct 27, 2024
24 checks passed

jan-janssen deleted the cache_working_directory branch October 27, 2024 17:45

This was referenced Oct 27, 2024

Cache: Terminate processes when closing executor #447

Merged

Cache: Implement resource dict for submit() #451

Merged

Executor user interface #458

Closed

Cache: create method #459

Merged

Cache: Use absolute path for working directory #462

Merged

coderabbitai bot mentioned this pull request Nov 8, 2024

Fix working directory #481

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cache: Add working directory parameter #446

Cache: Add working directory parameter #446

Uh oh!

jan-janssen commented Oct 27, 2024 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 27, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 27, 2024

Uh oh!

coderabbitai bot Oct 27, 2024

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 27, 2024

Uh oh!

Uh oh!

Uh oh!

Cache: Add working directory parameter #446

Cache: Add working directory parameter #446

Uh oh!

Conversation

jan-janssen commented Oct 27, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jan-janssen commented Oct 27, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 27, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)