Skip to content

Conversation

@pedrocamargo
Copy link

@pedrocamargo pedrocamargo commented Feb 2, 2026

Closes #12

It also adds a ton of software engineering practices, like CI/CD, modern packaging, etc.

Kudos to Jake Moss for the heavy lifting on this one.

refactoring

refactoring

refactoring

refactoring

refactoring

debug

debug

debug

.

.

.

.

.

.

.

.

.

.

.

Replace two-arg super with zero-arg super

Add fletcher32 option support to mirror pytables omx

Some type hints

WIP: add 'data' property and use it (once, more to come)

Update .github/workflows/ci.yml

Co-authored-by: Jake Moss <jake@outerloop.io>

Update .github/workflows/ci.yml

Co-authored-by: Jake Moss <jake@outerloop.io>

Update tests/test_file.py

Co-authored-by: Jake Moss <jake@outerloop.io>

Update src/openmatrix/__init__.py

Co-authored-by: Jake Moss <jake@outerloop.io>

Update .github/workflows/ci.yml

Co-authored-by: Jake Moss <jake@outerloop.io>

Update pyproject.toml

Add coverage testing

Ensure data and lookup always exist, remove excessive double checks

We can assume that once an OMX file is open, it's in a valid state, otherwise we should loudly complain

Fix and clean up validator tests

Fix and extend testing suite, reach 100% branch coverage

Remove bad CI options

Quotes?

Forgot rename

Bump CI versions

Restore and update doc strings

Update README documentation
Refactoring pytables out and h5py in
Copilot AI review requested due to automatic review settings February 2, 2026 01:24
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request replaces PyTables with h5py as the HDF5 interface for the openmatrix library, addressing issue #12. The change modernizes the codebase and removes dependency on the slow-developing PyTables library.

Changes:

  • Complete replacement of PyTables with h5py throughout the codebase
  • Migration to modern Python packaging using pyproject.toml (replacing setup.py)
  • Addition of comprehensive test suites with pytest
  • Introduction of CI/CD workflow using GitHub Actions
  • Updated documentation and examples to reflect h5py usage

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 27 comments.

Show a summary per file
File Description
src/openmatrix/file.py New File class implementation extending h5py.File instead of tables.File
src/openmatrix/__init__.py Updated module exports and open_file function with h5py-compatible filters
src/openmatrix/validator.py Rewritten validator to work with h5py APIs
src/openmatrix/exceptions.py Minimal changes - whitespace cleanup
tests/test_file.py Comprehensive new test suite covering file operations
tests/test_validator.py Complete test coverage for validator functionality
pyproject.toml New build configuration replacing setup.py
.github/workflows/ci.yml New CI workflow for automated testing across Python versions
README.md Updated documentation to reference h5py instead of PyTables
example/python-omx-sample.py Code formatting updates
CHANGES.txt Added version 0.4.0 changelog entry

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


Returns:
--------
mapping : tables.array
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type documentation references "tables.array" which is a PyTables-specific type. This should be updated to "h5py.Dataset" to reflect the h5py implementation.

Suggested change
mapping : tables.array
mapping : h5py.Dataset

Copilot uses AI. Check for mistakes.
Comment on lines 248 to 282
Create an OMX Matrix (CArray) at the root level. User must pass in either
an existing numpy matrix, or a shape and an atom type.

Parameters
----------
name : string
The name of this matrix. Stored in HDF5 as the leaf name.
title : string
Short description of this matrix. Default is ''.
obj : numpy.CArray
Existing numpy array from which to create this OMX matrix. If obj is passed in,
then shape and atom can be left blank. If obj is not passed in, then a shape and
atom must be specified instead. Default is None.
shape : numpy.array
Optional shape of the matrix. Shape is an int32 numpy array of format (rows,columns).
If shape is not specified, an existing numpy CArray must be passed in instead,
If shape is not specified, an existing numpy CArray must be passed in instead,
as the 'obj' parameter. Default is None.
atom : atom_type
Optional atom type of the data. Can be int32, float32, etc. Default is None.
If None specified, then obj parameter must be passed in instead.
title : string
Short description of this matrix. Default is ''.
filters : tables.Filters
Set of HDF5 filters (compression, etc) used for creating the matrix.
Set of HDF5 filters (compression, etc) used for creating the matrix.
Default is None. See HDF5 documentation for details. Note: while the default here
is None, the default set of filters set at the OMX parent file level is
is None, the default set of filters set at the OMX parent file level is
zlib compression level 1. Those settings usually trickle down to the table level.
chunks: bool or tuple[int, int]
Enable HDF5 array chunking. A value of True enables HDF5 to guess the best chunk size. Chunk size may impact
I/O performance.
obj : numpy.NDArray
Existing numpy array from which to create this OMX matrix. If obj is passed in,
then shape and atom can be left blank. If obj is not passed in, then a shape and
atom must be specified instead. Default is None.
dtype: numpy.dtype
Underlying data to use for storage. Defaults to the datatype of obj.
attrs : dict
Dictionary of attribute names and values to be attached to this matrix.
Default is None.

Returns
-------
matrix : tables.carray
matrix : h5py.Dataset
HDF5 CArray matrix
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation contains multiple PyTables-specific references that should be updated for h5py: Line 248 mentions "CArray" which is PyTables-specific. Line 255-257 describes shape as "an int32 numpy array" and references "numpy CArray". Line 261 references "tables.Filters". Line 282 describes the return as "HDF5 CArray matrix" instead of "HDF5 Dataset".

Copilot uses AI. Check for mistakes.
self,
name: Union[str, PathLike],
mode: Literal["r", "w", "a", "r+", "w-", "x"],
title: str = "",
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'title' parameter is defined in the init method but is never used. In PyTables, the title was used for the HDF5 file title attribute. This parameter should either be removed from the signature, or if it's meant to maintain API compatibility, it should be documented that it's deprecated/unused, or it should be applied to the file attributes.

Copilot uses AI. Check for mistakes.
def open_file(
filename: Union[str, PathLike],
mode: Literal["r", "w", "a", "r+", "w-", "x"] = "r",
title: str = "",
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'title' parameter is passed through to File.init but is never actually used. Consider either removing it or documenting that it's deprecated for backwards compatibility.

Copilot uses AI. Check for mistakes.
are stored internally in the 'lookup' subset of the HDF5 file
structure. Returns empty list if there are no mappings.
"""
return list(self.lookup.keys())
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list_mappings() method documentation states "Returns empty list if there are no mappings", but the implementation will raise KeyError if the lookup group doesn't exist. The old PyTables implementation handled this exception and returned an empty list. This is a breaking change from the previous behavior. The method should catch KeyError when accessing self.lookup and return an empty list to maintain backwards compatibility.

Suggested change
return list(self.lookup.keys())
try:
lookup_group = self.lookup
except KeyError:
# For backward compatibility and per documentation, return an empty list
# when the 'lookup' group does not exist.
return []
return list(lookup_group.keys())

Copilot uses AI. Check for mistakes.
return (False, required, checknum, str(err))


def check10(mat_file, required=False, checknum=10):
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check10 returns tuple of size 3 and tuple of size 4.

Copilot uses AI. Check for mistakes.
return (False, required, checknum, str(err))


def check11(mat_file, required=False, checknum=11):
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check11 returns tuple of size 3 and tuple of size 4.

Copilot uses AI. Check for mistakes.
return (False, required, checknum, str(err))


def check12(mat_file, required=False, checknum=12):
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check12 returns tuple of size 3 and tuple of size 4.

Copilot uses AI. Check for mistakes.
import numpy as np
import openmatrix as omx
import pytest
from openmatrix import validator
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module 'openmatrix' is imported with both 'import' and 'import from'.

Suggested change
from openmatrix import validator
validator = omx.validator

Copilot uses AI. Check for mistakes.

try:
del self[key]
except KeyError:
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except KeyError:
except KeyError:
# If the key does not exist yet, there's nothing to delete; this is expected.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replacing HDF5 interface

1 participant