v1.5.0

@gipert gipert released this 01 Mar 09:35
664881f

What's Changed

Warning

The LH5 I/O routines have been refactored! Some function names have changed and new methods for loading and viewing data have been added. Read the migration guide below for more details. This release is fully backward compatible, but deprecation warnings will show up when using the old methods. Upgrade to the new recommended syntax to suppress them.
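To locate remaining uses of the old methods, one option is to promote deprecation warnings to errors while running your code. The sketch below uses only Python's standard warnings module. It assumes the old methods emit the standard DeprecationWarning category, and read_object_old_style() is a hypothetical stand-in for a deprecated call, not part of the package.

```python
import warnings


def read_object_old_style():
    """Hypothetical stand-in for a deprecated call such as LH5Store.read_object()."""
    warnings.warn(
        "read_object() is deprecated, use read() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return "data"


def find_deprecated_call(func):
    """Run func with DeprecationWarning promoted to an error.

    Returns the warning message if func triggered one, otherwise None.
    """
    with warnings.catch_warnings():
        # Inside this block, any DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning as exc:
            return str(exc)
    return None


print(find_deprecated_call(read_object_old_style))
```

The same behaviour can be enabled for a whole script from the command line with python -W error::DeprecationWarning script.py.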

NEW: the package now offers support for viewing LGDO data (Tables, in particular) as Awkward arrays through the LGDO.view_as() interface. Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms.

Please consult the API documentation at https://legend-pydataobj.readthedocs.io to learn about the new methods.

Migration Guide

Imports

LH5 I/O-related routines have been moved to a dedicated subpackage, lgdo.lh5.

Old syntax:

from lgdo.lh5_store import LH5Store, ls
store = LH5Store()
ls("file.lh5")

New recommended syntax:

from lgdo import lh5
store = lh5.LH5Store()
lh5.ls("file.lh5")

Read/write LGDOs to disk

Old syntax:

store = LH5Store()
obj, _ = store.read_object("obj", "file.lh5")
store.write_object(obj, "obj", "file.lh5")

New syntax:

store = lh5.LH5Store()
obj, _ = store.read("obj", "file.lh5")
store.write(obj, "obj", "file.lh5")

Convert LGDO to another format

LGDO.view_as() is the new recommended way to view LGDOs in alternative formats (Pandas, NumPy, Awkward...), i.e. without performing a copy.

Old syntax:

table = Table(...)
table.get_dataframe()

New syntax:

table.view_as("pd")

Old syntax:

from lgdo.lh5_store import load_nda, load_dfs
load_nda("file.lh5", ["obj"])
load_dfs("file.lh5", ["tbl"])

New syntax:

from lgdo import lh5
lh5.read_as("obj", "file.lh5", library="np")
lh5.read_as("tbl", "file.lh5", library="pd")

New syntax (longer alternative):

from lgdo import lh5
store = lh5.LH5Store()

obj, _ = store.read("obj", "file.lh5")
obj.view_as("np")

tbl, _ = store.read("tbl", "file.lh5")
tbl.view_as("pd")
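The renames in this guide can be collected into a lookup table and used to scan existing code for old syntax. The following is a rough sketch, not a tool shipped with the package: RENAMES and find_old_syntax() are hypothetical names, and the matching is a plain text search that may also flag unrelated identifiers with the same name.

```python
import re

# Old name -> suggested replacement, taken from the migration guide above.
RENAMES = {
    "lgdo.lh5_store": "lgdo.lh5",
    "read_object": "read",
    "write_object": "write",
    "get_dataframe": 'view_as("pd")',
    "load_nda": 'lh5.read_as(..., library="np")',
    "load_dfs": 'lh5.read_as(..., library="pd")',
}

# Match any old name as a whole word.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in RENAMES) + r")\b"
)


def find_old_syntax(source):
    """Return (line_number, old_name, suggestion) for each old name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(0)
            hits.append((lineno, old, RENAMES[old]))
    return hits


example = (
    "from lgdo.lh5_store import LH5Store\n"
    "store = LH5Store()\n"
    'obj, _ = store.read_object("obj", "file.lh5")\n'
)
for lineno, old, new in find_old_syntax(example):
    print(f"line {lineno}: replace {old} with {new}")
```

Each hit still needs a manual check, since the replacement may require changing arguments as well as names (for example, load_nda() takes the file name first, while lh5.read_as() takes the object name first).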

Full list of changes

  • Fixed bug in LH5Iterator when number of entries for file is zero by @iguinn in #39
  • Refactor of LH5 I/O routines, deprecation of existing methods by @MoritzNeuberger in #24
  • Support (environment) variables for tweaking Numba at runtime by @gipert in #44
  • Add vectorized operations to VectorOfVectors by @iguinn in #42
  • Add LGDO format conversion utilities by @MoritzNeuberger in #30
  • Added depth option to show and lh5ls by @iguinn in #52
  • Reimplement Table.eval(), now handling VectorOfVectors by @gipert in #53
  • Deprecate load_nda() and load_dfs() in favour of .view_as() by @gipert in #56
  • Support setting a fill value when "exploding" VectorOfVectors into NumPy arrays in .view_as("np") by @gipert in #57
  • Migrate to pyproject.toml, upgrade pre-commit config by @gipert in #59
  • Fix for reading just first row of VectorOfVectors by @ggmarshall in #63
  • Feature: lh5.read_as() to read LH5 data straight into third party data views by @gipert in #62
  • Added warning when adding a column to a table with different length by @MoritzNeuberger in #58
  • Add first version of CITATION.cff by @gipert in #64
  • Bug fix in LH5Store.read(): check for n_rows longer than idxs before dropping by @ggmarshall in #65
  • Bugfix for varlen error msgs and specify nda in view_as "ak" so dtype correctly inferred by @ggmarshall in #67
  • Add Patrick to CITATION.cff by @gipert in #68
  • Table.view_as() performance fixes by @gipert in #70

Full Changelog: v1.4.2...v1.5.0