-
Notifications
You must be signed in to change notification settings - Fork 4
Fix working directory #481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces a new optional parameter, Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
executorlib/standalone/cache/spawner.py (2)
12-12
: Enhance the cache_directory parameter documentation.The docstring could be more descriptive about:
- Whether the directory must exist or will be created
- What happens if the directory is None (default behavior)
- The expected permissions and requirements for the directory
Also applies to: 26-26
42-45
: Simplify resource dictionary initialization.The two conditions can be combined for better readability.
- if resource_dict is None: - resource_dict = {"cwd": cache_directory} - elif len(resource_dict) == 0: - resource_dict = {"cwd": cache_directory} + if resource_dict is None or len(resource_dict) == 0: + resource_dict = {"cwd": cache_directory}🧰 Tools
🪛 Ruff
42-45: Combine
if
branches using logicalor
operatorCombine
if
branches(SIM114)
executorlib/standalone/cache/queue.py (1)
28-28
: Enhance parameter documentationConsider expanding the documentation for
cache_directory
to include:
- Its relationship with
resource_dict["cwd"]
- Whether the directory must exist
- The purpose of storing HDF5 files there
- cache_directory (str): The directory to store the HDF5 files. + cache_directory (str, optional): Directory to store HDF5 files. If provided, this value + will be used as the working directory when resource_dict is None, empty, or has + cwd=None. The directory should exist before execution.executorlib/cache/shared.py (1)
Line range hint
52-67
: Add validation for cache_directory parameterThe function relies heavily on the cache_directory for file operations but doesn't validate if the directory exists or is writable. This could lead to runtime errors if an invalid directory is provided.
Consider adding validation at the start of the function:
def execute_tasks_h5( future_queue: queue.Queue, cache_directory: str, execute_function: callable, resource_dict: dict, terminate_function: Optional[callable] = None, pysqa_config_directory: Optional[str] = None, backend: Optional[str] = None, disable_dependencies: bool = False, ) -> None: + if not os.path.isdir(cache_directory): + raise ValueError(f"Cache directory does not exist: {cache_directory}") + if not os.access(cache_directory, os.W_OK): + raise ValueError(f"Cache directory is not writable: {cache_directory}") memory_dict, process_dict, file_name_dict = {}, {}, {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
executorlib/cache/shared.py
(1 hunks)executorlib/standalone/cache/queue.py
(2 hunks)executorlib/standalone/cache/spawner.py
(3 hunks)
🧰 Additional context used
🪛 Ruff
executorlib/standalone/cache/queue.py
33-36: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
executorlib/standalone/cache/spawner.py
42-45: Combine if
branches using logical or
operator
Combine if
branches
(SIM114)
🔇 Additional comments (5)
executorlib/standalone/cache/spawner.py (1)
Line range hint 12-45
: Add validation for cache_directory parameter.
The cache_directory parameter is used directly without validation, which could lead to security issues:
- Directory existence is not verified
- No protection against path traversal attacks
- Directory permissions are not checked
Consider adding validation before using the directory as the working directory for subprocesses.
executorlib/standalone/cache/queue.py (2)
14-14
: LGTM: Clean signature update
The new cache_directory
parameter is properly typed and follows Python conventions.
14-14
: Verify consistent cache_directory handling across files
Since this change affects working directory handling, we should verify that other files mentioned in the PR (executorlib/cache/shared.py
and executorlib/standalone/cache/spawner.py
) handle cache_directory
consistently.
Also applies to: 34-38
✅ Verification successful
Based on the search results, I can now provide a final response:
Cache directory handling is consistent across the codebase
The cache_directory
parameter is handled consistently across all relevant files:
- In
executorlib/standalone/cache/queue.py
andexecutorlib/standalone/cache/spawner.py
, it's used to set the working directory (cwd
) in the resource dictionary - In
executorlib/cache/shared.py
andexecutorlib/cache/executor.py
, it's used for storing HDF5 files - All implementations treat it as an optional parameter with similar documentation patterns
- The parameter is properly propagated through the execution chain
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for cache_directory parameter usage in related files
rg -A 10 "def.*cache_directory" "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"
Length of output: 101
Script:
#!/bin/bash
# Let's try a broader search for cache_directory usage
rg -A 5 "cache_directory" "executorlib/"
# Also search for any function definitions in these specific files
rg -A 10 "def " "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"
# And check for any class definitions that might be relevant
rg -A 10 "class.*Cache" "executorlib/"
Length of output: 24911
executorlib/cache/shared.py (2)
134-134
: LGTM: Correctly propagates cache_directory parameter
The addition of the cache_directory parameter to the execute_function call is consistent with the PR's objective and ensures proper working directory handling.
Line range hint 52-67
: Verify consistent cache_directory usage across the codebase
Let's verify that the cache_directory parameter is consistently handled across related files mentioned in the AI summary.
✅ Verification successful
Based on the comprehensive search results, I can now generate the final response:
Cache directory parameter is consistently used across the codebase
The verification shows that the cache_directory
parameter is consistently handled:
- In
executorlib/cache/shared.py
, it's properly documented and used for HDF5 file storage - In
executorlib/cache/executor.py
, it's consistently passed through to the execution functions - In
executorlib/interactive/shared.py
, it's properly handled for task execution caching - All test files demonstrate consistent usage with proper directory creation and cleanup
The parameter is used consistently for:
- Storing HDF5 input/output files
- Setting working directories for task execution
- Maintaining cache across different executor implementations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent cache_directory parameter usage across related files
# Check for cache_directory parameter in related files
echo "Checking cache_directory parameter usage in related files:"
rg -A 5 "def execute_" "executorlib/standalone/cache/"
# Check for potential file operations that might need the cache_directory
echo -e "\nChecking file operations that might need cache_directory:"
rg "os\.path\.(join|exists|isdir)" "executorlib/standalone/cache/"
Length of output: 1238
Script:
#!/bin/bash
# Let's expand our search to include more variations and the entire codebase
# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory" --type python
# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)" --type python
# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)" --type python
Length of output: 552
Script:
#!/bin/bash
# Let's try without specifying file type
# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory"
# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)"
# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)"
Length of output: 53682
resource_dict = {"cwd": cache_directory} | ||
elif len(resource_dict) == 0: | ||
resource_dict = {"cwd": cache_directory} | ||
elif "cwd" in resource_dict and resource_dict["cwd"] is None: | ||
resource_dict["cwd"] = cache_directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Refactoring suggestion needs revision due to cache_directory being optional
The refactoring suggestion needs to be reconsidered since cache_directory
is defined as Optional[str]
in the function signature, but there's no validation of its value before use. The current code could potentially set None
as the cwd
value, which might cause issues downstream.
Instead of simplifying, the code should handle the None
case:
if cache_directory is not None:
if resource_dict is None or len(resource_dict) == 0:
resource_dict = {"cwd": cache_directory}
elif "cwd" in resource_dict and resource_dict["cwd"] is None:
resource_dict["cwd"] = cache_directory
- The function signature shows
cache_directory: Optional[str] = None
- The docstring indicates it as
cache_directory (str)
without mentioning optional - No validation exists for
cache_directory
being None before using it
🔗 Analysis chain
Simplify conditional logic and verify None handling
The current implementation can be simplified by combining the conditions.
- if resource_dict is None:
- resource_dict = {"cwd": cache_directory}
- elif len(resource_dict) == 0:
- resource_dict = {"cwd": cache_directory}
- elif "cwd" in resource_dict and resource_dict["cwd"] is None:
- resource_dict["cwd"] = cache_directory
+ if resource_dict is None or len(resource_dict) == 0:
+ resource_dict = {"cwd": cache_directory}
+ elif "cwd" in resource_dict and resource_dict["cwd"] is None:
+ resource_dict["cwd"] = cache_directory
Additionally, we should verify what happens when both cache_directory
and resource_dict["cwd"]
are None.
Let's analyze how cache_directory is handled in the queue.py file to understand its validation and potential None cases.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for error handling or validation of cache_directory
rg -A 5 "cache_directory.*None" .
Length of output: 101
Script:
#!/bin/bash
# Search for cache_directory parameter and its usage
rg -A 10 'def.*cache_directory' executorlib/standalone/cache/queue.py
# Search for cache_directory validation or initialization
rg -B 5 'cache_directory' executorlib/standalone/cache/queue.py
# Search for any None checks or error handling related to cache_directory
ast-grep --pattern 'if $_ is None:
$$$' executorlib/standalone/cache/queue.py
Length of output: 1628
b8de525
to
2c34f22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
executorlib/standalone/cache/spawner.py (2)
12-12
: Improve docstring for better clarity.The docstring should be enhanced to better describe the
cache_directory
parameter:- cache_directory (str): The directory to store the HDF5 files. + cache_directory (str, optional): Path to the directory where HDF5 files will be stored. + If provided and no working directory is specified in resource_dict, + this will be used as the subprocess working directory.Also applies to: 26-26
43-48
: Simplify working directory assignment logic.The current if-else block can be simplified using Python's dict.get() method.
- if resource_dict is None: - resource_dict = {} - if "cwd" in resource_dict: - cwd = resource_dict["cwd"] - else: - cwd = cache_directory + resource_dict = resource_dict or {} + cwd = resource_dict.get("cwd", cache_directory)🧰 Tools
🪛 Ruff
44-47: Use ternary operator
cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
instead ofif
-else
-blockReplace
if
-else
-block withcwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
(SIM108)
44-47: Use
cwd = resource_dict.get("cwd", cache_directory)
instead of anif
blockReplace with
cwd = resource_dict.get("cwd", cache_directory)
(SIM401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
executorlib/standalone/cache/queue.py
(3 hunks)executorlib/standalone/cache/spawner.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- executorlib/standalone/cache/queue.py
🧰 Additional context used
🪛 Ruff
executorlib/standalone/cache/spawner.py
44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
instead of if
-else
-block
Replace if
-else
-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
(SIM108)
44-47: Use cwd = resource_dict.get("cwd", cache_directory)
instead of an if
block
Replace with cwd = resource_dict.get("cwd", cache_directory)
(SIM401)
🔇 Additional comments (1)
executorlib/standalone/cache/spawner.py (1)
Line range hint 12-48
: Overall implementation looks good!
The changes effectively address the working directory configuration by introducing the cache_directory
parameter while maintaining backward compatibility through the existing resource_dict
mechanism.
🧰 Tools
🪛 Ruff
44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
instead of if
-else
-block
Replace if
-else
-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory
(SIM108)
44-47: Use cwd = resource_dict.get("cwd", cache_directory)
instead of an if
block
Replace with cwd = resource_dict.get("cwd", cache_directory)
(SIM401)
elif len(resource_dict) == 0: | ||
resource_dict = {"cwd": None} | ||
return subprocess.Popen(command, universal_newlines=True, cwd=resource_dict["cwd"]) | ||
resource_dict = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the search results, I can generate the final response as we have found clear evidence of code that relies on the "cwd" key in resource_dict:
The change to empty resource_dict initialization will cause runtime errors
The empty dictionary initialization will break existing code in multiple places:
executorlib/standalone/cache/spawner.py
directly accessesresource_dict["cwd"]
without checking existenceexecutorlib/standalone/cache/queue.py
has logic that depends on "cwd" key being present, including:- Checking if "cwd" exists and is not None
- Accessing the "cwd" value
- Explicitly deleting the "cwd" key
The original initialization with {"cwd": None}
was intentional and should be preserved to maintain compatibility with the existing code.
🔗 Analysis chain
Verify the impact of resource_dict initialization change.
The change from initializing with {"cwd": None}
to an empty dictionary might affect code that expects the "cwd" key to always exist.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for code that might rely on "cwd" key existence
rg -A 3 'resource_dict.*cwd' --type py
Length of output: 2523
Summary by CodeRabbit
cache_directory
, across multiple functions to specify the directory for storing HDF5 files, enhancing flexibility in file management.cache_directory
when necessary.cache_directory
parameter.