Skip to content

Cache: Add working directory parameter #446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 27, 2024
Merged

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Oct 27, 2024

Summary by CodeRabbit

  • New Features

    • Introduced an optional parameter to specify the current working directory for executing tasks, enhancing flexibility in task execution.
    • Added a new test to verify the functionality of the FileExecutor with a specified working directory.
  • Bug Fixes

    • Improved subprocess execution by allowing the specification of a working directory, which may resolve issues related to file paths and dependencies.
  • Documentation

    • Updated docstrings to clarify the purpose and optional nature of new parameters.

Copy link
Contributor

coderabbitai bot commented Oct 27, 2024

Walkthrough

The pull request introduces a new optional parameter cwd to the FileExecutor class's __init__ method and related subprocess functions, allowing users to specify the current working directory for executing tasks. This change is reflected in the executor.py and shared.py files, where the signatures of several functions are updated to include cwd. Additionally, a new test function is added to verify the functionality of the FileExecutor with a specified working directory in the test suite.

Changes

File Change Summary
executorlib/cache/executor.py Added optional parameter cwd to __init__ method of FileExecutor class; updated docstring and passed cwd to RaisingThread in _set_process method.
executorlib/cache/shared.py Updated execute_in_subprocess and execute_tasks_h5 functions to include optional cwd parameter; modified docstrings and added import for Optional.
tests/test_cache_executor_serial.py Added function list_files_in_working_directory and test method test_executor_working_directory to verify FileExecutor functionality with a specified working directory.

Possibly related PRs

  • Split shared cache in backend and frontend #443: The changes in executorlib/shared/cache.py regarding the removal of functions related to task execution and file handling may indirectly relate to the changes in executorlib/cache/shared.py, where the cwd parameter is introduced for subprocess execution. Both PRs involve modifications to how tasks are executed, although they focus on different aspects of the execution process.

🐇 In a world of tasks and threads,
A new path opens, where the rabbit treads.
With cwd to guide each little hop,
In directories new, we’ll never stop!
So let’s celebrate with a joyful cheer,
For flexible paths, we hold so dear! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
executorlib/cache/executor.py (1)

24-24: Enhance parameter documentation

While the documentation is functional, it could be more precise and consistent with the type hints.

Consider updating the documentation to:

-            cwd (str/None): current working directory where the parallel python task is executed
+            cwd (Optional[str]): The working directory where parallel Python tasks are executed. If None, uses the current working directory.
tests/test_cache_executor_serial.py (1)

22-24: Add docstring to helper function.

Consider adding a docstring to clarify the function's purpose and return value.

 def list_files_in_working_directory():
+    """Return a list of files in the current working directory.
+
+    Returns:
+        list[str]: Names of files and directories in the current working directory.
+    """
     return os.listdir(os.getcwd())
executorlib/cache/shared.py (1)

79-79: Consider making cwd parameter optional for consistency

The cwd parameter is optional in execute_in_subprocess but required in execute_tasks_h5. For API consistency, consider making it optional here as well.

def execute_tasks_h5(
    future_queue: queue.Queue,
    cache_directory: str,
    cores_per_worker: int,
    execute_function: callable,
-   cwd: Optional[str],
+   cwd: Optional[str] = None,
) -> None:

Also applies to: 89-89

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 451cc92 and 6f9b59e.

📒 Files selected for processing (3)
  • executorlib/cache/executor.py (4 hunks)
  • executorlib/cache/shared.py (5 hunks)
  • tests/test_cache_executor_serial.py (2 hunks)
🧰 Additional context used
🪛 Ruff
executorlib/cache/shared.py

52-52: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (6)
executorlib/cache/executor.py (2)

2-2: LGTM: Clean import addition

The Optional type import is appropriately added for the new cwd parameter type hint.


15-15: LGTM: Well-designed parameter addition

The new cwd parameter is correctly typed and maintains backward compatibility with existing code through its optional nature.

Let's verify that existing callers won't be affected:

✅ Verification successful

Verified: Parameter addition is safe and already in use

The codebase scan shows that all existing FileExecutor instantiations remain compatible:

  • Most usages don't specify cwd, utilizing the default None value
  • One test case already uses the new cwd parameter correctly in test_cache_executor_serial.py
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing FileExecutor instantiations to ensure they remain compatible
rg -l "FileExecutor\(" | xargs rg -A 5 "FileExecutor\("

Length of output: 2239

tests/test_cache_executor_serial.py (1)

45-50: 🛠️ Refactor suggestion

Enhance test robustness and documentation.

The test would benefit from:

  1. A docstring explaining the test's purpose
  2. Verification that the test directory exists
  3. Additional test cases for edge cases
     def test_executor_working_directory(self):
+        """Test FileExecutor respects the specified working directory.
+
+        Verifies that tasks executed through FileExecutor run in the specified
+        working directory instead of the default directory.
+        """
         cwd = os.path.join(os.path.dirname(__file__), "executables")
+        if not os.path.exists(cwd):
+            self.skipTest(f"Test directory {cwd!r} does not exist")
         with FileExecutor(cwd=cwd) as exe:
             fs1 = exe.submit(list_files_in_working_directory)
             self.assertEqual(fs1.result(), os.listdir(cwd))

Let's verify if the test directory exists:

Consider adding these test cases:

  1. Test with non-existent directory
  2. Test with relative path
  3. Test with normalized vs non-normalized paths

Example:

def test_executor_working_directory_not_exists(self):
    """Test FileExecutor raises appropriate error for non-existent directory."""
    with self.assertRaises(FileNotFoundError):
        with FileExecutor(cwd="/non/existent/path"):
            pass
executorlib/cache/shared.py (3)

7-7: LGTM: Clean import addition

The Optional type import is appropriately added for the new cwd parameter type hints.


71-71: LGTM: Correct subprocess working directory implementation

The cwd parameter is correctly passed to subprocess.Popen, maintaining compatibility with the standard Python subprocess behavior.


131-131: LGTM: Correct parameter forwarding

The cwd parameter is correctly passed through to the execute_function call.

@@ -31,6 +34,7 @@ def __init__(
"execute_function": execute_function,
"cache_directory": cache_directory_path,
"cores_per_worker": cores_per_worker,
"cwd": cwd,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding working directory validation

While the implementation is correct, it might be helpful to validate the working directory exists before passing it to the execution context to fail fast with a clear error message.

Consider adding validation in __init__:

if cwd is not None and not os.path.isdir(cwd):
    raise ValueError(f"Working directory does not exist: {cwd}")

Comment on lines +51 to +53
command: list,
task_dependent_lst: list = [],
cwd: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix mutable default argument

Using mutable default arguments in Python can lead to unexpected behavior as the list is created once when the function is defined, not each time it's called.

Replace with:

def execute_in_subprocess(
    command: list,
-   task_dependent_lst: list = [],
+   task_dependent_lst: list = None,
    cwd: Optional[str] = None,
) -> subprocess.Popen:

And add at the beginning of the function:

    if task_dependent_lst is None:
        task_dependent_lst = []
🧰 Tools
🪛 Ruff

52-52: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
tests/test_cache_executor_serial.py (2)

22-24: Add error handling for directory listing.

While the function works for the happy path, it should handle potential OS errors that could occur during directory listing.

Consider adding error handling:

def list_files_in_working_directory():
+    try:
        return os.listdir(os.getcwd())
+    except OSError as e:
+        raise RuntimeError(f"Failed to list working directory: {e}")

45-50: Enhance test robustness and cleanup.

The test verifies the basic functionality but could be more robust.

Consider these improvements:

def test_executor_working_directory(self):
    cwd = os.path.join(os.path.dirname(__file__), "executables")
+    if not os.path.exists(cwd):
+        self.skipTest("Executables directory not found")
    with FileExecutor(cwd=cwd) as exe:
        fs1 = exe.submit(list_files_in_working_directory)
        self.assertEqual(fs1.result(), os.listdir(cwd))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6f9b59e and 75f9c18.

📒 Files selected for processing (1)
  • tests/test_cache_executor_serial.py (5 hunks)
🔇 Additional comments (1)
tests/test_cache_executor_serial.py (1)

64-64: LGTM! Consistent handling of the new cwd parameter.

The addition of cwd=None is consistent across all test methods and aligns with the new parameter's default behavior.

Also applies to: 89-89, 114-114

Comment on lines +45 to +50
def test_executor_working_directory(self):
cwd = os.path.join(os.path.dirname(__file__), "executables")
with FileExecutor(cwd=cwd) as exe:
fs1 = exe.submit(list_files_in_working_directory)
self.assertEqual(fs1.result(), os.listdir(cwd))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

🛠️ Refactor suggestion

Add test cases for error scenarios with cwd parameter

The review comment is valid. The implementation accepts a cwd parameter without any validation, and there's only one positive test case. Adding test cases for error scenarios is important since:

  1. The parameter directly affects subprocess execution environment
  2. No validation exists in the implementation
  3. Current test coverage only includes the happy path
  4. Working directory issues can lead to runtime failures

The suggested test cases in the review comment appropriately cover:

  • Non-existent directories
  • Invalid paths
  • Permission issues
🔗 Analysis chain

Add test cases for error scenarios.

The current test only covers the happy path. Consider adding test cases for:

  1. Invalid directory paths
  2. Non-existent directories
  3. Directories without read permissions

Here's a suggested additional test:

def test_executor_working_directory_errors(self):
    # Test non-existent directory
    with self.assertRaises(ValueError):
        with FileExecutor(cwd="/nonexistent/path"):
            pass

    # Test invalid directory path
    with self.assertRaises(ValueError):
        with FileExecutor(cwd=""):
            pass

    # Test directory without permissions (Unix-like systems only)
    if os.name != 'nt':
        no_access_dir = os.path.join(self.test_dir, "no_access")
        os.makedirs(no_access_dir, mode=0o000)
        try:
            with self.assertRaises(PermissionError):
                with FileExecutor(cwd=no_access_dir):
                    pass
        finally:
            os.chmod(no_access_dir, 0o755)  # Restore permissions for cleanup
            os.rmdir(no_access_dir)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any existing error handling tests for cwd parameter
rg -l "test.*working.*directory.*error" tests/
rg -l "test.*invalid.*directory" tests/

Length of output: 90


Script:

#!/bin/bash
# Let's check for any existing error handling tests for FileExecutor initialization
rg -A 10 "FileExecutor.*cwd" tests/

Length of output: 891


Script:

#!/bin/bash
# Let's check the FileExecutor implementation to understand error handling
rg -A 10 "class FileExecutor" .

Length of output: 678


Script:

#!/bin/bash
# Let's check the implementation details of FileExecutor to see if there's any validation
rg -B 5 -A 15 "def __init__" ./executorlib/cache/executor.py

Length of output: 929

@jan-janssen jan-janssen merged commit df966af into main Oct 27, 2024
24 checks passed
@jan-janssen jan-janssen deleted the cache_working_directory branch October 27, 2024 17:45
@coderabbitai coderabbitai bot mentioned this pull request Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant