Skip to content

Fix working directory #481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 8, 2024
Merged

Fix working directory #481

merged 8 commits into from
Nov 8, 2024

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Nov 8, 2024

Summary by CodeRabbit

  • New Features
    • Introduced a new optional parameter, cache_directory, across multiple functions to specify the directory for storing HDF5 files, enhancing flexibility in file management.
  • Bug Fixes
    • Updated logic for setting working directories to ensure they default to the specified cache_directory when necessary.
  • Documentation
    • Updated function docstrings to include descriptions for the new cache_directory parameter.

Copy link
Contributor

coderabbitai bot commented Nov 8, 2024

Walkthrough

The pull request introduces a new optional parameter, cache_directory, to multiple functions within the executorlib module, specifically in executorlib/cache/shared.py, executorlib/standalone/cache/queue.py, and executorlib/standalone/cache/spawner.py. This parameter allows users to specify a directory for storing HDF5 files, enhancing the flexibility of file management. The logic for constructing file paths and handling resource dictionaries has been updated accordingly, while maintaining existing error handling behaviors.

Changes

File Path Change Summary
executorlib/cache/shared.py Added cache_directory parameter to execute_tasks_h5 function; updated file path logic and command generation.
executorlib/standalone/cache/queue.py Added cache_directory parameter to execute_with_pysqa function; modified resource_dict handling.
executorlib/standalone/cache/spawner.py Added cache_directory parameter to execute_in_subprocess function; updated resource_dict initialization.

Possibly related PRs

🐰 In the cache where files reside,
A new path opens wide,
With cache_directory we play,
HDF5 finds its way,
Flexibility's the name of the game,
In our coding hall of fame! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
executorlib/standalone/cache/spawner.py (2)

12-12: Enhance the cache_directory parameter documentation.

The docstring could be more descriptive about:

  • Whether the directory must exist or will be created
  • What happens if the directory is None (default behavior)
  • The expected permissions and requirements for the directory

Also applies to: 26-26


42-45: Simplify resource dictionary initialization.

The two conditions can be combined for better readability.

-    if resource_dict is None:
-        resource_dict = {"cwd": cache_directory}
-    elif len(resource_dict) == 0:
-        resource_dict = {"cwd": cache_directory}
+    if resource_dict is None or len(resource_dict) == 0:
+        resource_dict = {"cwd": cache_directory}
🧰 Tools
🪛 Ruff

42-45: Combine if branches using logical or operator

Combine if branches

(SIM114)

executorlib/standalone/cache/queue.py (1)

28-28: Enhance parameter documentation

Consider expanding the documentation for cache_directory to include:

  • Its relationship with resource_dict["cwd"]
  • Whether the directory must exist
  • The purpose of storing HDF5 files there
-        cache_directory (str): The directory to store the HDF5 files.
+        cache_directory (str, optional): Directory to store HDF5 files. If provided, this value
+            will be used as the working directory when resource_dict is None, empty, or has
+            cwd=None. The directory should exist before execution.
executorlib/cache/shared.py (1)

Line range hint 52-67: Add validation for cache_directory parameter

The function relies heavily on the cache_directory for file operations but doesn't validate if the directory exists or is writable. This could lead to runtime errors if an invalid directory is provided.

Consider adding validation at the start of the function:

def execute_tasks_h5(
    future_queue: queue.Queue,
    cache_directory: str,
    execute_function: callable,
    resource_dict: dict,
    terminate_function: Optional[callable] = None,
    pysqa_config_directory: Optional[str] = None,
    backend: Optional[str] = None,
    disable_dependencies: bool = False,
) -> None:
+    if not os.path.isdir(cache_directory):
+        raise ValueError(f"Cache directory does not exist: {cache_directory}")
+    if not os.access(cache_directory, os.W_OK):
+        raise ValueError(f"Cache directory is not writable: {cache_directory}")

    memory_dict, process_dict, file_name_dict = {}, {}, {}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2132028 and 9afcb56.

📒 Files selected for processing (3)
  • executorlib/cache/shared.py (1 hunks)
  • executorlib/standalone/cache/queue.py (2 hunks)
  • executorlib/standalone/cache/spawner.py (3 hunks)
🧰 Additional context used
🪛 Ruff
executorlib/standalone/cache/queue.py

33-36: Combine if branches using logical or operator

Combine if branches

(SIM114)

executorlib/standalone/cache/spawner.py

42-45: Combine if branches using logical or operator

Combine if branches

(SIM114)

🔇 Additional comments (5)
executorlib/standalone/cache/spawner.py (1)

Line range hint 12-45: Add validation for cache_directory parameter.

The cache_directory parameter is used directly without validation, which could lead to security issues:

  1. Directory existence is not verified
  2. No protection against path traversal attacks
  3. Directory permissions are not checked

Consider adding validation before using the directory as the working directory for subprocesses.

executorlib/standalone/cache/queue.py (2)

14-14: LGTM: Clean signature update

The new cache_directory parameter is properly typed and follows Python conventions.


14-14: Verify consistent cache_directory handling across files

Since this change affects working directory handling, we should verify that other files mentioned in the PR (executorlib/cache/shared.py and executorlib/standalone/cache/spawner.py) handle cache_directory consistently.

Also applies to: 34-38

✅ Verification successful

Based on the search results, I can now provide a final response:

Cache directory handling is consistent across the codebase

The cache_directory parameter is handled consistently across all relevant files:

  • In executorlib/standalone/cache/queue.py and executorlib/standalone/cache/spawner.py, it's used to set the working directory (cwd) in the resource dictionary
  • In executorlib/cache/shared.py and executorlib/cache/executor.py, it's used for storing HDF5 files
  • All implementations treat it as an optional parameter with similar documentation patterns
  • The parameter is properly propagated through the execution chain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for cache_directory parameter usage in related files
rg -A 10 "def.*cache_directory" "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"

Length of output: 101


Script:

#!/bin/bash
# Let's try a broader search for cache_directory usage
rg -A 5 "cache_directory" "executorlib/"

# Also search for any function definitions in these specific files
rg -A 10 "def " "executorlib/cache/shared.py" "executorlib/standalone/cache/spawner.py"

# And check for any class definitions that might be relevant
rg -A 10 "class.*Cache" "executorlib/"

Length of output: 24911

executorlib/cache/shared.py (2)

134-134: LGTM: Correctly propagates cache_directory parameter

The addition of the cache_directory parameter to the execute_function call is consistent with the PR's objective and ensures proper working directory handling.


Line range hint 52-67: Verify consistent cache_directory usage across the codebase

Let's verify that the cache_directory parameter is consistently handled across related files mentioned in the AI summary.

✅ Verification successful

Based on the comprehensive search results, I can now generate the final response:

Cache directory parameter is consistently used across the codebase

The verification shows that the cache_directory parameter is consistently handled:

  • In executorlib/cache/shared.py, it's properly documented and used for HDF5 file storage
  • In executorlib/cache/executor.py, it's consistently passed through to the execution functions
  • In executorlib/interactive/shared.py, it's properly handled for task execution caching
  • All test files demonstrate consistent usage with proper directory creation and cleanup

The parameter is used consistently for:

  • Storing HDF5 input/output files
  • Setting working directories for task execution
  • Maintaining cache across different executor implementations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent cache_directory parameter usage across related files

# Check for cache_directory parameter in related files
echo "Checking cache_directory parameter usage in related files:"
rg -A 5 "def execute_" "executorlib/standalone/cache/"

# Check for potential file operations that might need the cache_directory
echo -e "\nChecking file operations that might need cache_directory:"
rg "os\.path\.(join|exists|isdir)" "executorlib/standalone/cache/"

Length of output: 1238


Script:

#!/bin/bash
# Let's expand our search to include more variations and the entire codebase

# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory" --type python

# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)" --type python

# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)" --type python

Length of output: 552


Script:

#!/bin/bash
# Let's try without specifying file type

# Check for cache_directory parameter usage in the entire codebase
echo "Checking cache_directory parameter usage across the codebase:"
rg -A 5 "cache_directory"

# Check for HDF5 file operations that might use cache_directory
echo -e "\nChecking HDF5 related operations:"
rg "h5py\.(File|Group)"

# Check for any queue execution related functions
echo -e "\nChecking queue execution related functions:"
rg -A 5 "(def.*execute.*queue|future_queue)"

Length of output: 53682

Comment on lines 34 to 38
resource_dict = {"cwd": cache_directory}
elif len(resource_dict) == 0:
resource_dict = {"cwd": cache_directory}
elif "cwd" in resource_dict and resource_dict["cwd"] is None:
resource_dict["cwd"] = cache_directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Refactoring suggestion needs revision due to cache_directory being optional

The refactoring suggestion needs to be reconsidered since cache_directory is defined as Optional[str] in the function signature, but there's no validation of its value before use. The current code could potentially set None as the cwd value, which might cause issues downstream.

Instead of simplifying, the code should handle the None case:

if cache_directory is not None:
    if resource_dict is None or len(resource_dict) == 0:
        resource_dict = {"cwd": cache_directory}
    elif "cwd" in resource_dict and resource_dict["cwd"] is None:
        resource_dict["cwd"] = cache_directory
  • The function signature shows cache_directory: Optional[str] = None
  • The docstring indicates it as cache_directory (str) without mentioning optional
  • No validation exists for cache_directory being None before using it
🔗 Analysis chain

Simplify conditional logic and verify None handling

The current implementation can be simplified by combining the conditions.

-    if resource_dict is None:
-        resource_dict = {"cwd": cache_directory}
-    elif len(resource_dict) == 0:
-        resource_dict = {"cwd": cache_directory}
-    elif "cwd" in resource_dict and resource_dict["cwd"] is None:
-        resource_dict["cwd"] = cache_directory
+    if resource_dict is None or len(resource_dict) == 0:
+        resource_dict = {"cwd": cache_directory}
+    elif "cwd" in resource_dict and resource_dict["cwd"] is None:
+        resource_dict["cwd"] = cache_directory

Additionally, we should verify what happens when both cache_directory and resource_dict["cwd"] are None.


Let's analyze how cache_directory is handled in the queue.py file to understand its validation and potential None cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for error handling or validation of cache_directory
rg -A 5 "cache_directory.*None" .

Length of output: 101


Script:

#!/bin/bash
# Search for cache_directory parameter and its usage
rg -A 10 'def.*cache_directory' executorlib/standalone/cache/queue.py

# Search for cache_directory validation or initialization
rg -B 5 'cache_directory' executorlib/standalone/cache/queue.py

# Search for any None checks or error handling related to cache_directory
ast-grep --pattern 'if $_ is None:
  $$$'  executorlib/standalone/cache/queue.py

Length of output: 1628

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
executorlib/standalone/cache/spawner.py (2)

12-12: Improve docstring for better clarity.

The docstring should be enhanced to better describe the cache_directory parameter:

-        cache_directory (str): The directory to store the HDF5 files.
+        cache_directory (str, optional): Path to the directory where HDF5 files will be stored. 
+            If provided and no working directory is specified in resource_dict, 
+            this will be used as the subprocess working directory.

Also applies to: 26-26


43-48: Simplify working directory assignment logic.

The current if-else block can be simplified using Python's dict.get() method.

-    if resource_dict is None:
-        resource_dict = {}
-    if "cwd" in resource_dict:
-        cwd = resource_dict["cwd"]
-    else:
-        cwd = cache_directory
+    resource_dict = resource_dict or {}
+    cwd = resource_dict.get("cwd", cache_directory)
🧰 Tools
🪛 Ruff

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)


44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9afcb56 and 26305a8.

📒 Files selected for processing (2)
  • executorlib/standalone/cache/queue.py (3 hunks)
  • executorlib/standalone/cache/spawner.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • executorlib/standalone/cache/queue.py
🧰 Additional context used
🪛 Ruff
executorlib/standalone/cache/spawner.py

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)


44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

🔇 Additional comments (1)
executorlib/standalone/cache/spawner.py (1)

Line range hint 12-48: Overall implementation looks good!

The changes effectively address the working directory configuration by introducing the cache_directory parameter while maintaining backward compatibility through the existing resource_dict mechanism.

🧰 Tools
🪛 Ruff

44-47: Use ternary operator cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory instead of if-else-block

Replace if-else-block with cwd = resource_dict["cwd"] if "cwd" in resource_dict else cache_directory

(SIM108)


44-47: Use cwd = resource_dict.get("cwd", cache_directory) instead of an if block

Replace with cwd = resource_dict.get("cwd", cache_directory)

(SIM401)

elif len(resource_dict) == 0:
resource_dict = {"cwd": None}
return subprocess.Popen(command, universal_newlines=True, cwd=resource_dict["cwd"])
resource_dict = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the search results, I can generate the final response as we have found clear evidence of code that relies on the "cwd" key in resource_dict:

The change to empty resource_dict initialization will cause runtime errors

The empty dictionary initialization will break existing code in multiple places:

  • executorlib/standalone/cache/spawner.py directly accesses resource_dict["cwd"] without checking existence
  • executorlib/standalone/cache/queue.py has logic that depends on "cwd" key being present, including:
    • Checking if "cwd" exists and is not None
    • Accessing the "cwd" value
    • Explicitly deleting the "cwd" key

The original initialization with {"cwd": None} was intentional and should be preserved to maintain compatibility with the existing code.

🔗 Analysis chain

Verify the impact of resource_dict initialization change.

The change from initializing with {"cwd": None} to an empty dictionary might affect code that expects the "cwd" key to always exist.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for code that might rely on "cwd" key existence
rg -A 3 'resource_dict.*cwd' --type py

Length of output: 2523

@jan-janssen jan-janssen merged commit d31907f into main Nov 8, 2024
25 of 26 checks passed
@jan-janssen jan-janssen deleted the working_directory branch November 8, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant