-
Notifications
You must be signed in to change notification settings - Fork 4
Cache: Add working directory parameter #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces a new optional parameter Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
executorlib/cache/executor.py (1)
24-24
: Enhance parameter documentationWhile the documentation is functional, it could be more precise and consistent with the type hints.
Consider updating the documentation to:
- cwd (str/None): current working directory where the parallel python task is executed + cwd (Optional[str]): The working directory where parallel Python tasks are executed. If None, uses the current working directory.tests/test_cache_executor_serial.py (1)
22-24
: Add docstring to helper function.Consider adding a docstring to clarify the function's purpose and return value.
def list_files_in_working_directory(): + """Return a list of files in the current working directory. + + Returns: + list[str]: Names of files and directories in the current working directory. + """ return os.listdir(os.getcwd())executorlib/cache/shared.py (1)
79-79
: Consider making cwd parameter optional for consistencyThe cwd parameter is optional in
execute_in_subprocess
but required inexecute_tasks_h5
. For API consistency, consider making it optional here as well.def execute_tasks_h5( future_queue: queue.Queue, cache_directory: str, cores_per_worker: int, execute_function: callable, - cwd: Optional[str], + cwd: Optional[str] = None, ) -> None:Also applies to: 89-89
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- executorlib/cache/executor.py (4 hunks)
- executorlib/cache/shared.py (5 hunks)
- tests/test_cache_executor_serial.py (2 hunks)
🧰 Additional context used
🪛 Ruff
executorlib/cache/shared.py
52-52: Do not use mutable data structures for argument defaults
Replace with
None
; initialize within function(B006)
🔇 Additional comments (6)
executorlib/cache/executor.py (2)
2-2
: LGTM: Clean import additionThe Optional type import is appropriately added for the new cwd parameter type hint.
15-15
: LGTM: Well-designed parameter additionThe new
cwd
parameter is correctly typed and maintains backward compatibility with existing code through its optional nature.Let's verify that existing callers won't be affected:
✅ Verification successful
Verified: Parameter addition is safe and already in use
The codebase scan shows that all existing
FileExecutor
instantiations remain compatible:
- Most usages don't specify
cwd
, utilizing the defaultNone
value- One test case already uses the new
cwd
parameter correctly intest_cache_executor_serial.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for existing FileExecutor instantiations to ensure they remain compatible rg -l "FileExecutor\(" | xargs rg -A 5 "FileExecutor\("Length of output: 2239
tests/test_cache_executor_serial.py (1)
45-50
: 🛠️ Refactor suggestionEnhance test robustness and documentation.
The test would benefit from:
- A docstring explaining the test's purpose
- Verification that the test directory exists
- Additional test cases for edge cases
def test_executor_working_directory(self): + """Test FileExecutor respects the specified working directory. + + Verifies that tasks executed through FileExecutor run in the specified + working directory instead of the default directory. + """ cwd = os.path.join(os.path.dirname(__file__), "executables") + if not os.path.exists(cwd): + self.skipTest(f"Test directory {cwd!r} does not exist") with FileExecutor(cwd=cwd) as exe: fs1 = exe.submit(list_files_in_working_directory) self.assertEqual(fs1.result(), os.listdir(cwd))Let's verify if the test directory exists:
Consider adding these test cases:
- Test with non-existent directory
- Test with relative path
- Test with normalized vs non-normalized paths
Example:
def test_executor_working_directory_not_exists(self): """Test FileExecutor raises appropriate error for non-existent directory.""" with self.assertRaises(FileNotFoundError): with FileExecutor(cwd="/non/existent/path"): passexecutorlib/cache/shared.py (3)
7-7
: LGTM: Clean import additionThe Optional type import is appropriately added for the new cwd parameter type hints.
71-71
: LGTM: Correct subprocess working directory implementationThe cwd parameter is correctly passed to subprocess.Popen, maintaining compatibility with the standard Python subprocess behavior.
131-131
: LGTM: Correct parameter forwardingThe cwd parameter is correctly passed through to the execute_function call.
@@ -31,6 +34,7 @@ def __init__( | |||
"execute_function": execute_function, | |||
"cache_directory": cache_directory_path, | |||
"cores_per_worker": cores_per_worker, | |||
"cwd": cwd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding working directory validation
While the implementation is correct, it might be helpful to validate the working directory exists before passing it to the execution context to fail fast with a clear error message.
Consider adding validation in __init__
:
if cwd is not None and not os.path.isdir(cwd):
raise ValueError(f"Working directory does not exist: {cwd}")
command: list, | ||
task_dependent_lst: list = [], | ||
cwd: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix mutable default argument
Using mutable default arguments in Python can lead to unexpected behavior as the list is created once when the function is defined, not each time it's called.
Replace with:
def execute_in_subprocess(
command: list,
- task_dependent_lst: list = [],
+ task_dependent_lst: list = None,
cwd: Optional[str] = None,
) -> subprocess.Popen:
And add at the beginning of the function:
if task_dependent_lst is None:
task_dependent_lst = []
🧰 Tools
🪛 Ruff
52-52: Do not use mutable data structures for argument defaults
Replace with
None
; initialize within function(B006)
…che_working_directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
tests/test_cache_executor_serial.py (2)
22-24
: Add error handling for directory listing.While the function works for the happy path, it should handle potential OS errors that could occur during directory listing.
Consider adding error handling:
def list_files_in_working_directory(): + try: return os.listdir(os.getcwd()) + except OSError as e: + raise RuntimeError(f"Failed to list working directory: {e}")
45-50
: Enhance test robustness and cleanup.The test verifies the basic functionality but could be more robust.
Consider these improvements:
def test_executor_working_directory(self): cwd = os.path.join(os.path.dirname(__file__), "executables") + if not os.path.exists(cwd): + self.skipTest("Executables directory not found") with FileExecutor(cwd=cwd) as exe: fs1 = exe.submit(list_files_in_working_directory) self.assertEqual(fs1.result(), os.listdir(cwd))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- tests/test_cache_executor_serial.py (5 hunks)
🔇 Additional comments (1)
tests/test_cache_executor_serial.py (1)
64-64
: LGTM! Consistent handling of the new cwd parameter.The addition of
cwd=None
is consistent across all test methods and aligns with the new parameter's default behavior.Also applies to: 89-89, 114-114
def test_executor_working_directory(self): | ||
cwd = os.path.join(os.path.dirname(__file__), "executables") | ||
with FileExecutor(cwd=cwd) as exe: | ||
fs1 = exe.submit(list_files_in_working_directory) | ||
self.assertEqual(fs1.result(), os.listdir(cwd)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
🛠️ Refactor suggestion
Add test cases for error scenarios with cwd parameter
The review comment is valid. The implementation accepts a cwd
parameter without any validation, and there's only one positive test case. Adding test cases for error scenarios is important since:
- The parameter directly affects subprocess execution environment
- No validation exists in the implementation
- Current test coverage only includes the happy path
- Working directory issues can lead to runtime failures
The suggested test cases in the review comment appropriately cover:
- Non-existent directories
- Invalid paths
- Permission issues
🔗 Analysis chain
Add test cases for error scenarios.
The current test only covers the happy path. Consider adding test cases for:
- Invalid directory paths
- Non-existent directories
- Directories without read permissions
Here's a suggested additional test:
def test_executor_working_directory_errors(self):
# Test non-existent directory
with self.assertRaises(ValueError):
with FileExecutor(cwd="/nonexistent/path"):
pass
# Test invalid directory path
with self.assertRaises(ValueError):
with FileExecutor(cwd=""):
pass
# Test directory without permissions (Unix-like systems only)
if os.name != 'nt':
no_access_dir = os.path.join(self.test_dir, "no_access")
os.makedirs(no_access_dir, mode=0o000)
try:
with self.assertRaises(PermissionError):
with FileExecutor(cwd=no_access_dir):
pass
finally:
os.chmod(no_access_dir, 0o755) # Restore permissions for cleanup
os.rmdir(no_access_dir)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if there are any existing error handling tests for cwd parameter
rg -l "test.*working.*directory.*error" tests/
rg -l "test.*invalid.*directory" tests/
Length of output: 90
Script:
#!/bin/bash
# Let's check for any existing error handling tests for FileExecutor initialization
rg -A 10 "FileExecutor.*cwd" tests/
Length of output: 891
Script:
#!/bin/bash
# Let's check the FileExecutor implementation to understand error handling
rg -A 10 "class FileExecutor" .
Length of output: 678
Script:
#!/bin/bash
# Let's check the implementation details of FileExecutor to see if there's any validation
rg -B 5 -A 15 "def __init__" ./executorlib/cache/executor.py
Length of output: 929
Summary by CodeRabbit
New Features
FileExecutor
with a specified working directory.Bug Fixes
Documentation