-
Notifications
You must be signed in to change notification settings - Fork 14
Replaces PyTables with H5PY as the HDF5 interface #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
refactoring refactoring refactoring refactoring refactoring debug debug debug . . . . . . . . . . . Replace two-arg super with zero-arg super Add fletcher32 option support to mirror pytables omx Some type hints WIP: add 'data' property and use it (once, more to come) Update .github/workflows/ci.yml Co-authored-by: Jake Moss <jake@outerloop.io> Update .github/workflows/ci.yml Co-authored-by: Jake Moss <jake@outerloop.io> Update tests/test_file.py Co-authored-by: Jake Moss <jake@outerloop.io> Update src/openmatrix/__init__.py Co-authored-by: Jake Moss <jake@outerloop.io> Update .github/workflows/ci.yml Co-authored-by: Jake Moss <jake@outerloop.io> Update pyproject.toml Add coverage testing Ensure data and lookup always exist, remove excessive double checks We can assume that once an OMX file is open, it's in a valid state, otherwise we should loudly complain Fix and clean up validator tests Fix and extend testing suite, reach 100% branch coverage Remove bad CI options Quotes? Forgot rename Bump CI versions Restore and update doc strings Update README documentation
Refactoring pytables out and h5py in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request replaces PyTables with h5py as the HDF5 interface for the openmatrix library, addressing issue #12. The change modernizes the codebase and removes dependency on the slow-developing PyTables library.
Changes:
- Complete replacement of PyTables with h5py throughout the codebase
- Migration to modern Python packaging using pyproject.toml (replacing setup.py)
- Addition of comprehensive test suites with pytest
- Introduction of CI/CD workflow using GitHub Actions
- Updated documentation and examples to reflect h5py usage
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 27 comments.
Show a summary per file
| File | Description |
|---|---|
src/openmatrix/file.py |
New File class implementation extending h5py.File instead of tables.File |
src/openmatrix/__init__.py |
Updated module exports and open_file function with h5py-compatible filters |
src/openmatrix/validator.py |
Rewritten validator to work with h5py APIs |
src/openmatrix/exceptions.py |
Minimal changes - whitespace cleanup |
tests/test_file.py |
Comprehensive new test suite covering file operations |
tests/test_validator.py |
Complete test coverage for validator functionality |
pyproject.toml |
New build configuration replacing setup.py |
.github/workflows/ci.yml |
New CI workflow for automated testing across Python versions |
README.md |
Updated documentation to reference h5py instead of PyTables |
example/python-omx-sample.py |
Code formatting updates |
CHANGES.txt |
Added version 0.4.0 changelog entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| Returns: | ||
| -------- | ||
| mapping : tables.array |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type documentation references "tables.array" which is a PyTables-specific type. This should be updated to "h5py.Dataset" to reflect the h5py implementation.
| mapping : tables.array | |
| mapping : h5py.Dataset |
| Create an OMX Matrix (CArray) at the root level. User must pass in either | ||
| an existing numpy matrix, or a shape and an atom type. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| name : string | ||
| The name of this matrix. Stored in HDF5 as the leaf name. | ||
| title : string | ||
| Short description of this matrix. Default is ''. | ||
| obj : numpy.CArray | ||
| Existing numpy array from which to create this OMX matrix. If obj is passed in, | ||
| then shape and atom can be left blank. If obj is not passed in, then a shape and | ||
| atom must be specified instead. Default is None. | ||
| shape : numpy.array | ||
| Optional shape of the matrix. Shape is an int32 numpy array of format (rows,columns). | ||
| If shape is not specified, an existing numpy CArray must be passed in instead, | ||
| If shape is not specified, an existing numpy CArray must be passed in instead, | ||
| as the 'obj' parameter. Default is None. | ||
| atom : atom_type | ||
| Optional atom type of the data. Can be int32, float32, etc. Default is None. | ||
| If None specified, then obj parameter must be passed in instead. | ||
| title : string | ||
| Short description of this matrix. Default is ''. | ||
| filters : tables.Filters | ||
| Set of HDF5 filters (compression, etc) used for creating the matrix. | ||
| Set of HDF5 filters (compression, etc) used for creating the matrix. | ||
| Default is None. See HDF5 documentation for details. Note: while the default here | ||
| is None, the default set of filters set at the OMX parent file level is | ||
| is None, the default set of filters set at the OMX parent file level is | ||
| zlib compression level 1. Those settings usually trickle down to the table level. | ||
| chunks: bool or tuple[int, int] | ||
| Enable HDF5 array chunking. A value of True enables HDF5 to guess the best chunk size. Chunk size may impact | ||
| I/O performance. | ||
| obj : numpy.NDArray | ||
| Existing numpy array from which to create this OMX matrix. If obj is passed in, | ||
| then shape and atom can be left blank. If obj is not passed in, then a shape and | ||
| atom must be specified instead. Default is None. | ||
| dtype: numpy.dtype | ||
| Underlying data to use for storage. Defaults to the datatype of obj. | ||
| attrs : dict | ||
| Dictionary of attribute names and values to be attached to this matrix. | ||
| Default is None. | ||
|
|
||
| Returns | ||
| ------- | ||
| matrix : tables.carray | ||
| matrix : h5py.Dataset | ||
| HDF5 CArray matrix |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation contains multiple PyTables-specific references that should be updated for h5py: Line 248 mentions "CArray" which is PyTables-specific. Line 255-257 describes shape as "an int32 numpy array" and references "numpy CArray". Line 261 references "tables.Filters". Line 282 describes the return as "HDF5 CArray matrix" instead of "HDF5 Dataset".
| self, | ||
| name: Union[str, PathLike], | ||
| mode: Literal["r", "w", "a", "r+", "w-", "x"], | ||
| title: str = "", |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'title' parameter is defined in the init method but is never used. In PyTables, the title was used for the HDF5 file title attribute. This parameter should either be removed from the signature, or if it's meant to maintain API compatibility, it should be documented that it's deprecated/unused, or it should be applied to the file attributes.
| def open_file( | ||
| filename: Union[str, PathLike], | ||
| mode: Literal["r", "w", "a", "r+", "w-", "x"] = "r", | ||
| title: str = "", |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'title' parameter is passed through to File.init but is never actually used. Consider either removing it or documenting that it's deprecated for backwards compatibility.
| are stored internally in the 'lookup' subset of the HDF5 file | ||
| structure. Returns empty list if there are no mappings. | ||
| """ | ||
| return list(self.lookup.keys()) |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list_mappings() method documentation states "Returns empty list if there are no mappings", but the implementation will raise KeyError if the lookup group doesn't exist. The old PyTables implementation handled this exception and returned an empty list. This is a breaking change from the previous behavior. The method should catch KeyError when accessing self.lookup and return an empty list to maintain backwards compatibility.
| return list(self.lookup.keys()) | |
| try: | |
| lookup_group = self.lookup | |
| except KeyError: | |
| # For backward compatibility and per documentation, return an empty list | |
| # when the 'lookup' group does not exist. | |
| return [] | |
| return list(lookup_group.keys()) |
| return (False, required, checknum, str(err)) | ||
|
|
||
|
|
||
| def check10(mat_file, required=False, checknum=10): |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check10 returns tuple of size 3 and tuple of size 4.
| return (False, required, checknum, str(err)) | ||
|
|
||
|
|
||
| def check11(mat_file, required=False, checknum=11): |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check11 returns tuple of size 3 and tuple of size 4.
| return (False, required, checknum, str(err)) | ||
|
|
||
|
|
||
| def check12(mat_file, required=False, checknum=12): |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check12 returns tuple of size 3 and tuple of size 4.
| import numpy as np | ||
| import openmatrix as omx | ||
| import pytest | ||
| from openmatrix import validator |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module 'openmatrix' is imported with both 'import' and 'import from'.
| from openmatrix import validator | |
| validator = omx.validator |
|
|
||
| try: | ||
| del self[key] | ||
| except KeyError: |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'except' clause does nothing but pass and there is no explanatory comment.
| except KeyError: | |
| except KeyError: | |
| # If the key does not exist yet, there's nothing to delete; this is expected. |
Closes #12
It also adds a ton of software engineering practices, like CI/CD, modern packaging, etc.
Kudos to Jake Moss for the heavy lifting on this one.