This repository implements tatami bindings for HDF5-backed matrices, allowing some level of random access without loading the entire dataset into memory. Matrices can be conventionally stored as 2-dimensional HDF5 datasets, or in an ad hoc compressed sparse format with a 1-dimensional dataset for each component (data, indices, pointers).
tatami_hdf5 is a header-only library, so it can be easily used by just #include
ing the relevant source files:
#include "tatami_hdf5/tatami_hdf5.hpp"
// Dense HDF5 datasets.
tatami_hdf5::DenseMatrix<double, int> dense_mat(
"some_file.h5",
"dataset_name",
/* transposed = */ false
);
// Compressed sparse data stored in an ad hoc group.
tatami_hdf5::CompressedSparseMatrix<double, int> sparse_mat(
nrow,
ncol,
"some_file.h5",
"group_name/data",
"group_name/index",
"group_name/ptrs",
/* csr = */ true
);
In cases where performance is more important than memory consumption, we also provide some utilities to quickly create in-memory tatami matrices from their HDF5 representations:
auto dense_mat_mem = tatami_hdf5::load_dense_matrix<double, int>(
"some_file.h5",
"dataset_name"
);
auto sparse_mat_mem = tatami_hdf5::load_compressed_sparse_matrix<double, int>(
nrow,
ncol,
"some_file.h5",
"group_name/data",
"group_name/index",
"group_name/ptrs",
/* csr = */ true
);
We can also write a tatami sparse matrix into a HDF5 file:
H5::H5File fhandle("some_file2.h5", H5F_ACC_TRUNC);
auto ghandle = fhandle.createGroup("group_name");
tatami_hdf5::write_compressed_sparse_matrix(&sparse_mat_mem, ghandle);
Check out the reference documentation for more details.
If you're using CMake, you just need to add something like this to your CMakeLists.txt
:
include(FetchContent)
FetchContent_Declare(
tatami_hdf5
GIT_REPOSITORY https://github.com/tatami-inc/tatami_hdf5
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(tatami_hdf5)
Then you can link to tatami_hdf5 to make the headers available during compilation:
# For executables:
target_link_libraries(myexe tatami_hdf5)
# For libaries
target_link_libraries(mylib INTERFACE tatami_hdf5)
You can install the library by cloning a suitable version of this repository and running the following commands:
mkdir build && cd build
cmake .. -DTATAMI_HDF5_TESTS=OFF
cmake --build . --target install
Then you can use find_package()
as usual:
find_package(tatami_tatami_hdf5 CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE tatami::tatami_hdf5)
By default, this will use FetchContent
to fetch all external dependencies.
If you want to install them manually, use -DTATAMI_HDF5_FETCH_EXTERN=OFF
.
See extern/CMakeLists.txt
to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I
.
The external dependencies listed in extern/CMakeLists.txt
need to be made available during compilation.
You'll also need to link to the HDF5 library yourself (version 1.10 or higher).
Specific frameworks may come with their own HDF5 binaries, e.g., Rhdf5lib, h5wasm.