Skip to content

Conversation

@LTLA
Copy link

@LTLA LTLA commented Aug 20, 2025

Just making this PR for discussion at this point, and so that we don't forget about it when Mike gets back.

Anyway, this PR updates Rhdf5lib to the latest HDF5 (1.14.6 2.0.0), using the (somewhat recent) biocmake package to handle the build process without any manual fiddling with autoconf, etc.

It also allows administrators of R installations to override the compilation and instead use the system HDF5 via pkg-config, for those who are confident that they have the appropriate system libaries.

We introduce a PKG_CPP_FLAGS option in pkgconfig(), to be used instead of relying on LinkingTo's automatic -I flags. This ensures that administrator-level overrides are respected in Rhdf5lib's downstream packages.

Additional environment variables are also introduced to override specific aspects of pkgconfig(), e.g., someone could set RHDF5_PKG_CXX_LIBS to point to a different set of HDF5 C++ libraries. Might be useful.

getHdf5Version() has been updated to just peek at the libhdf5.settings file to get the version, rather than creating a rather large binary (~13 MB) just to extract the version number. In the presence of an administrator override, the version is instead retrieved via pkg-config.

A side effect of this change is that, now that autoconf is no longer used, some of the advice in the vignette about configure.args won't be accurate. Not sure whether this can be accurately translated into the CMake world... or if it's even necessary to try to do so.

Passes CHECK on Linux. I no longer have a Mac to try on. Will try Windows later.

LTLA added 6 commits August 21, 2025 21:03
…rent arch.

In the context of the BBS, I'm not sure if this is problematic or not; are we
building binaries for anything other than ucrt on Modern Windows machines?
This cuts down on the build times as we don't have to build it ourselves.
@mr-c
Copy link

mr-c commented Aug 27, 2025

Hello @LTLA ; thank you for this PR. While testing it for Debian, I see that the expectation is that RHDF5LIB_USE_SYSTEM_LIBRARY is set to 1 while compiling and while using Rhdf5lib. My expectation is that setting RHDF5LIB_USE_SYSTEM_LIBRARY to 1 would be enough, and that any packager would not have to set RHDF5LIB_USE_SYSTEM_LIBRARY system wide for all users, which is not something we normally do.

Any thoughts on removing the need to set RHDF5LIB_USE_SYSTEM_LIBRARY?

Also, the tests fail for me on Debian with this PR:

----- FAILED[data]: test_library_version.R<3--3>
 call| expect_identical(libver, "1.14.6")
 diff| Expected '1.14.6', got '1.14.5'
----- FAILED[xcpt]: test_link_flags.R<12--15>
 call| expect_stdout(pkgconfig(opt = "PKG_C_LIBS"), pattern = "foobar -lhdf5")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5 '
 diff| does not match pattern 'foobar -lhdf5'
----- FAILED[xcpt]: test_link_flags.R<17--20>
 call| expect_stdout(pkgconfig(opt = "PKG_C_LIBS"), pattern = "libhdf5.a")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5 '
 diff| does not match pattern 'libhdf5.a'
----- FAILED[xcpt]: test_link_flags.R<23--26>
 call| expect_stdout(pkgconfig(opt = "PKG_CXX_LIBS"), pattern = "foobar -lhdf5_cpp -lhdf5")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_cpp'
 diff| does not match pattern 'foobar -lhdf5_cpp -lhdf5'
----- FAILED[xcpt]: test_link_flags.R<28--31>
 call| expect_stdout(pkgconfig(opt = "PKG_CXX_LIBS"), pattern = "libhdf5_cpp.a")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_cpp'
 diff| does not match pattern 'libhdf5_cpp.a'
----- FAILED[xcpt]: test_link_flags.R<34--37>
 call| expect_stdout(pkgconfig(opt = "PKG_C_HL_LIBS"), pattern = "foobar -lhdf5_hl -lhdf5")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_hl'
 diff| does not match pattern 'foobar -lhdf5_hl -lhdf5'
----- FAILED[xcpt]: test_link_flags.R<39--42>
 call| expect_stdout(pkgconfig(opt = "PKG_C_HL_LIBS"), pattern = "libhdf5_hl.a")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_hl'
 diff| does not match pattern 'libhdf5_hl.a'
----- FAILED[xcpt]: test_link_flags.R<45--48>
 call| expect_stdout(pkgconfig(opt = "PKG_CXX_HL_LIBS"), pattern = "foobar -lhdf5_hl_cpp -lhdf5_hl -lhdf5_cpp -lhdf5")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_hl -lhdf5_cpp -lhdf5_hl_cpp'
 diff| does not match pattern 'foobar -lhdf5_hl_cpp -lhdf5_hl -lhdf5_cpp -lhdf5'
----- FAILED[xcpt]: test_link_flags.R<50--53>
 call| expect_stdout(pkgconfig(opt = "PKG_CXX_HL_LIBS"), pattern = "libhdf5_hl_cpp.a")
 diff| output '-L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5  -lhdf5_hl -lhdf5_cpp -lhdf5_hl_cpp'
 diff| does not match pattern 'libhdf5_hl_cpp.a'

@LTLA
Copy link
Author

LTLA commented Aug 27, 2025

You shouldn't have to set RHDF5LIB_USE_SYSTEM_LIBRARY at all. This is purely an option to allow R installation administrators to use the HDF5 system libraries, to avoid unnecessary re-compilation of the vendored source code.

It seems that in your case, setting it would be inappropriate, at least for the purposes of package testing - your HDF5 system library is 1.14.5 while the expected version in the tests is 1.14.6.

@LTLA
Copy link
Author

LTLA commented Sep 11, 2025

Does anyone know @grimbough's ETA until he's back?

@mr-c
Copy link

mr-c commented Sep 12, 2025

You shouldn't have to set RHDF5LIB_USE_SYSTEM_LIBRARY at all. This is purely an option to allow R installation administrators to use the HDF5 system libraries, to avoid unnecessary re-compilation of the vendored source code.

It seems that in your case, setting it would be inappropriate, at least for the purposes of package testing - your HDF5 system library is 1.14.5 while the expected version in the tests is 1.14.6.

I tested again without setting at build-time RHDF5LIB_USE_SYSTEM_LIBRARY nor at test-time, and a post installation test fails:

> if ( requireNamespace("tinytest", quietly=TRUE) ) {
+   tinytest::test_package("Rhdf5lib")
+ }
test_library_version.R........    1 tests 1 fails Error in eval(expr, envir = e) : object 'libver' not found
Calls: <Anonymous> ... lapply -> FUN -> eval -> eval -> expect_identical -> fun
Execution halted

@LTLA
Copy link
Author

LTLA commented Sep 12, 2025

CHECK passes fine for me.

I don't even have a libver variable in the package code. AFAICT that's only in inst/tinytest/test_library_version.R. If I had to guess, your local copy of that file has been modified.

@mr-c
Copy link

mr-c commented Sep 13, 2025

CHECK passes fine for me.

I don't even have a libver variable in the package code. AFAICT that's only in inst/tinytest/test_library_version.R. If I had to guess, your local copy of that file has been modified.

The only patch I'm testing with is this PR.

As a reminder, I doing this from a Debian packaging perspective; not that of a system administrator or user.

@LTLA
Copy link
Author

LTLA commented Sep 13, 2025

Not sure what to tell you, works fine for me:

if ( requireNamespace("tinytest", quietly=TRUE) ) {
   tinytest::test_package("Rhdf5lib")
}
## test_library_version.R........    2 tests OK 30ms
## test_link_flags.R.............    8 tests OK 0.2s
## All ok, 10 results (0.2s)

You might want to run test_library_version.R manually and see what's going wrong in your system; the error message suggests that your copy of the file has been modified. My version has:

expect_silent(libver <- Rhdf5lib::getHdf5Version())
expect_identical(libver, "1.14.6")

As you can see, libver is defined.

@LTLA
Copy link
Author

LTLA commented Nov 8, 2025

Alright, now that 3.22 has hit the ground, we should probably get this ball rolling again.

Note that in the README for https://github.com/HDFGroup/hdf5, they have switched to Cmake-only builds for the HDF5 library, so sooner or later you will need to make a similar switch in Rhdf5lib.

@Bisaloo
Copy link
Member

Bisaloo commented Nov 9, 2025

Yes, I agree it would be good to get this for 3.23. I still need to work out the specific timeline as I have other high-priority items that need to be addressed as well.

Am I correct the urgency has somewhat waned for you because of kanaverse/scran.js#98 (comment) (i.e., it doesn't fix the performance problems you are seeing), @LTLA?

My first reaction to this is that I would like to decouple the different issues raised in Huber-group-EMBL/rhdf5#147 and tackle them sequentially rather than in a single PR:

  1. switch to cmake
  2. switch to 1.14
  3. add a setting to rely on

How does this sound?

@LTLA
Copy link
Author

LTLA commented Nov 10, 2025

Am I correct the urgency has somewhat waned for you because of kanaverse/scran.js#98 (comment) (i.e., it doesn't fix the performance problems you are seeing), @LTLA?

That's correct. However, it would still be best to get it in sooner rather than later as I'm waiting for this update to be done before making some PRs to rhdf5 for improved support of compound data types.

How does this sound?

It's up to you, though I think that we can afford to be more adventurous given that we're at the very start of the devel cycle.

From the perspective of a downstream user/developer, there's likely to be breaking changes from both the switch to CMake and the update to the HDF5 library. So might as well just do both changes at once and get all the breaking out of the way.

You'll also have to check if the CMake options changed between HDF5 1.10 and 1.14. I don't think they did, but you never know. IMO it's just easier to jump straight to 1.14, given that no one's really asking for CMake specifically for 1.10.

@LTLA LTLA changed the title Update to 1.14.6 with a switch to cmake for the build. Update to 2.0.0 with a switch to cmake for the build. Dec 7, 2025
@LTLA
Copy link
Author

LTLA commented Dec 7, 2025

Just updated this PR to vendor in 2.0.0. Some comments:

  • Instead of vendoring a tarball, we unpack it and commit the individual files themselves. This should reduce the size of future diffs and it skips around the build system's 5 MB limit on each file.
  • Back-compatibility of 2.0.0 is pretty good. rhdf5 compiles without problems. I've done this upgrade already for my scran.js package, and there were only some minor changes with ordering of attributes IIRC.
  • 2.0.0 now uses the aws-c-s3 library and the associated stack to communicate with S3. This is an absolute chore to compile so I just disabled that capability for now. This is a good example of some relatively niche functionality that we don't need to support by default; if people need it, they can install hdf5 on their system with all the bells and whistles and set export RHDF5LIB_USE_SYSTEM_LIBRARY=1 to link to the system instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants