Measure time for execution and store it in the HDF5 files #524
Conversation
Walkthrough

The pull request introduces runtime tracking functionality across multiple files in the executorlib package. By measuring the execution time of each submitted function and storing it alongside the output, the HDF5 cache files now record how long every task took to run.
Actionable comments posted: 0
🧹 Nitpick comments (3)
executorlib/backend/cache_parallel.py (1)
36-36: Consider using a context manager for time measurement.

While the current implementation is functional, using a context manager would make the code more maintainable and less prone to errors.
```diff
-        time_start = time.time()
+        from contextlib import contextmanager
+
+        @contextmanager
+        def measure_time():
+            start = time.time()
+            yield
+            end = time.time()
+            return end - start
+
+        with measure_time() as runtime:
         if mpi_rank_zero:
             apply_dict = backend_load_file(file_name=file_name)
         else:
             apply_dict = None
         apply_dict = MPI.COMM_WORLD.bcast(apply_dict, root=0)
         output = apply_dict["fn"].__call__(*apply_dict["args"], **apply_dict["kwargs"])
         if mpi_size_larger_one:
             result = MPI.COMM_WORLD.gather(output, root=0)
         else:
             result = output
         if mpi_rank_zero:
             backend_write_file(
                 file_name=file_name,
                 output=result,
-                runtime=time.time() - time_start,
+                runtime=runtime,
             )
```

Also applies to: 51-51
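A caveat on the suggested pattern (it applies equally to the analogous suggestion for executorlib/interactive/shared.py further down): a generator-based context manager cannot hand its elapsed time back through `with measure_time() as runtime`, so the diff above would bind runtime to None and the return inside the generator would be discarded. A minimal sketch of one way to expose the measurement, not tied to executorlib's internals:

```python
import time
from contextlib import contextmanager


class Timer:
    """Holds the elapsed wall-clock time once the with-block has finished."""

    elapsed: float = 0.0


@contextmanager
def measure_time():
    timer = Timer()
    start = time.time()
    try:
        yield timer
    finally:
        timer.elapsed = time.time() - start


# Usage sketch with a placeholder workload:
with measure_time() as timer:
    total = sum(range(1_000_000))
print(timer.elapsed)  # elapsed seconds for the block
```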
executorlib/cache/backend.py (1)
47-50: Consider using a more descriptive key name for runtime.

The key "time" in the data dictionary might be ambiguous. Consider using "execution_time" or "runtime" for better clarity.

```diff
 dump(
     file_name=file_name_out + ".h5ready",
-    data_dict={"output": output, "time": runtime},
+    data_dict={"output": output, "runtime": runtime},
 )
```
executorlib/interactive/shared.py (1)

631-634: Consider handling time measurement consistently across the codebase.

While the implementation is correct, the time measurement approach differs from other files. Consider using the same pattern across all files for consistency.

```diff
-    time_start = time.time()
-    result = interface.send_and_receive_dict(input_dict=task_dict)
-    data_dict["output"] = result
-    data_dict["runtime"] = time.time() - time_start
+    from contextlib import contextmanager
+
+    @contextmanager
+    def measure_time():
+        start = time.time()
+        yield
+        end = time.time()
+        return end - start
+
+    with measure_time() as runtime:
+        result = interface.send_and_receive_dict(input_dict=task_dict)
+        data_dict["output"] = result
+        data_dict["runtime"] = runtime
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- executorlib/backend/cache_parallel.py (3 hunks)
- executorlib/cache/backend.py (3 hunks)
- executorlib/interactive/shared.py (2 hunks)
🔇 Additional comments (4)
executorlib/backend/cache_parallel.py (1)
3-3: LGTM: Time module import added correctly.

The time module import is appropriately placed with other standard library imports.
executorlib/cache/backend.py (2)
32-50: LGTM: Function signature and docstring updated correctly.

The runtime parameter is properly added to the function signature and well-documented in the docstring.
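For orientation, a rough sketch of the shape such a signature could take, pieced together only from the dump call quoted earlier in this review; the actual executorlib function also handles how the output file name is derived, so treat this as illustrative rather than the real implementation:

```python
def backend_write_file(file_name: str, output, runtime: float) -> None:
    """
    Write the result of an executed function and its runtime to an HDF5 file.

    Args:
        file_name (str): file name of the HDF5 file as absolute path
        output: result returned by the executed function
        runtime (float): wall-clock execution time in seconds
    """
    # Mirrors the dump call shown in the diff above; file-name handling
    # in the real backend is omitted here.
    dump(
        file_name=file_name,
        data_dict={"output": output, "time": runtime},
    )
```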
65-70: LGTM: Time measurement implementation is correct.

The execution time is properly measured around the task execution.
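Paraphrased from the fragments visible in this review, the pattern amounts to wrapping the function call in two timestamps; a hedged sketch with a placeholder task, not a verbatim quote of backend.py:

```python
import time

# Placeholder task dictionary; in executorlib this is loaded from the HDF5 input file.
apply_dict = {"fn": sum, "args": ([1, 2, 3],), "kwargs": {}}

time_start = time.time()
output = apply_dict["fn"](*apply_dict["args"], **apply_dict["kwargs"])
runtime = time.time() - time_start  # seconds spent executing the task
```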
executorlib/interactive/shared.py (1)
5-5: LGTM: Time module import added correctly.

The time module import is appropriately placed with other standard library imports.
Actionable comments posted: 1
🧹 Nitpick comments (2)
executorlib/standalone/hdf.py (1)
77-92: Consider enhancing error handling and documentation

While the implementation is functional, consider the following improvements:

- The default value of 0.0 might be misleading as it suggests instantaneous execution. Consider using None or raising an exception.
- Add type validation when loading the runtime value.
- Add specific error handling for file access issues.
- Document the time units (seconds, milliseconds, etc.) in the docstring.

Here's a suggested implementation:

```diff
 def get_runtime(file_name: str) -> float:
     """
     Get run time from HDF5 file

     Args:
         file_name (str): file name of the HDF5 file as absolute path

     Returns:
-        float: run time from the execution of the python function
+        float: execution time in seconds, or None if runtime is not available
+
+    Raises:
+        IOError: If the HDF5 file cannot be accessed
+        ValueError: If the stored runtime value is invalid
     """
-    with h5py.File(file_name, "r") as hdf:
-        if "runtime" in hdf:
-            return cloudpickle.loads(np.void(hdf["/runtime"]))
-        else:
-            return 0.0
+    try:
+        with h5py.File(file_name, "r") as hdf:
+            if "runtime" in hdf:
+                runtime = cloudpickle.loads(np.void(hdf["/runtime"]))
+                if not isinstance(runtime, (int, float)) or runtime < 0:
+                    raise ValueError(f"Invalid runtime value: {runtime}")
+                return runtime
+            return None
+    except (OSError, IOError) as e:
+        raise IOError(f"Failed to access HDF5 file: {e}")
```
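If that signature were adopted, callers would need to deal with both the None return and the new exceptions; a brief illustrative sketch (the calling code and file path are hypothetical, not from the PR; the import path follows the file location shown in this review):

```python
from executorlib.standalone.hdf import get_runtime

try:
    runtime = get_runtime(file_name="cache/task.h5out")
except (OSError, ValueError) as error:
    print(f"Could not read runtime: {error}")
else:
    if runtime is None:
        print("No runtime recorded for this task.")
    else:
        print(f"Task finished in {runtime:.3f} s")
```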
tests/test_cache_shared.py (1)

Line range hint 107-109: Enhance test cleanup process

The current tearDown method could be more thorough in cleaning up test resources. Consider enhancing the cleanup:

```diff
 def tearDown(self):
+    # Force close any open file handles
+    import gc
+    gc.collect()
     if os.path.exists("cache"):
-        shutil.rmtree("cache")
+        try:
+            shutil.rmtree("cache")
+        except PermissionError:
+            # On Windows, files might still be locked
+            import time
+            time.sleep(0.1)
+            shutil.rmtree("cache")
+    # Verify cleanup
+    self.assertFalse(os.path.exists("cache"))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- executorlib/cache/backend.py (3 hunks)
- executorlib/standalone/hdf.py (2 hunks)
- tests/test_cache_shared.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- executorlib/cache/backend.py
🔇 Additional comments (1)
executorlib/standalone/hdf.py (1)
21-21: LGTM: Runtime storage mapping added correctly

The addition of the "runtime" key to group_dict follows the existing pattern and enables runtime storage in HDF5 files.
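To make the mapping concrete, here is a minimal round-trip sketch of the serialization convention that get_runtime (quoted above) reads back: a cloudpickle blob stored as an opaque HDF5 dataset under the "runtime" key. The write helper is illustrative and not executorlib's dump API:

```python
import cloudpickle
import h5py
import numpy as np


def write_runtime(file_name: str, runtime: float) -> None:
    # Illustrative only: create/overwrite the file and store the pickled
    # value as an opaque byte blob, matching the read side below.
    with h5py.File(file_name, "w") as hdf:
        hdf["runtime"] = np.void(cloudpickle.dumps(runtime))


def read_runtime(file_name: str) -> float:
    # Mirrors the lookup pattern shown in the suggested get_runtime above.
    with h5py.File(file_name, "r") as hdf:
        if "runtime" in hdf:
            return cloudpickle.loads(np.void(hdf["/runtime"]))
        return 0.0


write_runtime("example.h5", 0.42)
print(read_runtime("example.h5"))  # 0.42
```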
```python
self.assertTrue(
    get_runtime(file_name=os.path.join(cache_directory, task_key + ".h5out"))
    > 0.0
)
```
🛠️ Refactor suggestion
Consider enhancing runtime validation tests
The current runtime validation is minimal and duplicated across test methods. Consider:
- Testing more specific runtime ranges based on the function's expected execution time
- Extracting the runtime validation into a helper method to reduce duplication
- Adding tests for error cases (invalid files, corrupted runtime data); see the sketch after this comment
Here's a suggested refactor to reduce duplication:
```python
def assert_valid_runtime(self, task_key: str, cache_directory: str) -> None:
    """Helper method to validate task runtime"""
    runtime = get_runtime(file_name=os.path.join(cache_directory, task_key + ".h5out"))
    self.assertIsNotNone(runtime, "Runtime should be recorded")
    self.assertGreater(runtime, 0.0, "Runtime should be positive")
    # Add more specific assertions based on expected execution time
    self.assertLess(runtime, 1.0, "Simple addition should take less than 1 second")
```

Then use it in each test:

```python
self.assert_valid_runtime(task_key, cache_directory)
```
Also applies to: 70-73, 97-100
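For the error cases mentioned above, a hedged sketch of one such test; the test name is hypothetical, and it assumes get_runtime lets the OSError raised by h5py.File propagate when the file does not exist (true for both the current implementation and the suggested rewrite):

```python
def test_get_runtime_missing_file(self):
    # A path that was never written should raise rather than report a runtime.
    with self.assertRaises(OSError):
        get_runtime(file_name=os.path.join("cache", "does_not_exist.h5out"))
```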
Summary by CodeRabbit

New Features

- Execution time is now measured for each task and stored alongside the output in the HDF5 files.

Documentation

- Docstrings updated to describe the new runtime information.