
Add documentation for Pyfive #81

Merged
valeriupredoi merged 29 commits into main from add_documentation
Aug 15, 2025

Conversation

@valeriupredoi
Collaborator

@valeriupredoi valeriupredoi commented Jul 17, 2025

Description

We now have a fully working Readthedocs setup with a doc stub that builds well; it's time to add actual documentation. I have created the Install section so far, but we need more stuff.

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • Any changed dependencies have been added or removed correctly (if need be)
  • If you are working on the documentation, please ensure the current build passes
  • All tests pass

@codecov

codecov bot commented Jul 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.93%. Comparing base (7772b9b) to head (dd435af).
⚠️ Report is 244 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #81   +/-   ##
=======================================
  Coverage   71.93%   71.93%           
=======================================
  Files          11       11           
  Lines        2423     2423           
  Branches      364      364           
=======================================
  Hits         1743     1743           
  Misses        583      583           
  Partials       97       97           


@valeriupredoi
Collaborator Author

@davidhassell @bnlawrence as you folks saw last Thu, we do have a few bits and bobs of docs for Pyfive (also, it's all nicely deployed to RTD), but this PR is here to add more - please feel free to edit and contribute via this PR when you have a minute 🍺

Collaborator

@kmuehlbauer kmuehlbauer left a comment

I've just had a look here. Thanks @valeriupredoi for these efforts in documentation.

@bnlawrence
Collaborator

@valeriupredoi Apart from the failing check (dunno why, it builds locally), I think this is close to a minimal set of usable documentation. It'd be good to check my parallel example actually works, but apart from that, I think we'd be good to go. Please have a good look at this now.

@valeriupredoi
Collaborator Author

> @valeriupredoi Apart from the failing check (dunno why, it builds locally), I think this is close to a minimal set of usable documentation. It'd be good to check my parallel example actually works, but apart from that, I think we'd be good to go. Please have a good look at this now.

mega! Very cool @bnlawrence - RTD is fussing about this type of warning (there are a few of them):

pyfive.Group.attrs:1: WARNING: duplicate object description of pyfive.Group.attrs, other instance in api_reference, use :no-index: for one of them

I can fix those tomorrow 🍺
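For reference, the warning itself points at the fix: when the same object is documented in two places, one of the directives should carry the `:no-index:` option so only one copy is indexed. A hedged sketch (the exact file and directive used in this repo are assumptions; only `pyfive.Group.attrs` and `:no-index:` come from the warning):

```rst
.. In whichever page documents pyfive.Group.attrs a second time
   (leaving the api_reference copy as the indexed one):

.. autoattribute:: pyfive.Group.attrs
   :no-index:
```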

Collaborator Author

@valeriupredoi valeriupredoi left a comment

some minor comments, but a bigger one is fixing the S3 example 🍺

bnlawrence and others added 3 commits August 15, 2025 09:29
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
bnlawrence and others added 10 commits August 15, 2025 09:31
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Yes, not sure the caching is understood by botocore, I had conflated a couple of different things, best to be explicit that we are providing params for s3fs itself.

Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
…ather than just the original docstring copied from h5py during implementation.
    return data.min()

with ThreadPoolExecutor() as executor:
    results = list(executor.map(get_min_of_variable, variable_names))
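For context, the snippet under discussion can be fleshed out into a self-contained sketch of the same map-over-variables pattern. A plain dict of lists stands in for the pyfive file here, since opening a real HDF5 file would need actual data; the variable names and values are made up:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a pyfive.File: in the real docs example, get_min_of_variable
# would read one variable's data from the file (f[name][...]).
FAKE_FILE = {
    "tas": [290.1, 285.3, 301.2],
    "pr": [0.0, 1.5, 0.7],
    "psl": [101325.0, 100900.0, 101100.0],
}
variable_names = list(FAKE_FILE)

def get_min_of_variable(name):
    data = FAKE_FILE[name]  # real code: data = f[name][...]
    return min(data)        # real code: data.min()

# Each variable's minimum is computed in its own thread; executor.map
# preserves the input order of variable_names in the results.
with ThreadPoolExecutor() as executor:
    results = list(executor.map(get_min_of_variable, variable_names))

print("Results:", results)  # Results: [285.3, 0.0, 100900.0]
```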
Collaborator Author

this works! But I have not noticed any time improvement - quite possibly because I had to cut the size of the data (my RAM is not big enough), so what I was loading would be as fast, if not faster, in a single-threaded process

Collaborator

As discussed, I'm slightly concerned by your RAM problems. It's probably OK to leave this (contrived) example in the docs, as all our real examples are too complex to use as exemplars, but we should do some due diligence on why this is happening.

Collaborator Author

this works, as I said - the issues I have with the buffer could well be because I am running out of both RAM and actual disk space, so those may be very user-specific; that's why I said I need to look a lot closer at it

@valeriupredoi
Collaborator Author

superb work @bnlawrence @kmuehlbauer @davidhassell 🍺 x 3

@valeriupredoi valeriupredoi merged commit 9bbd8f6 into main Aug 15, 2025
6 checks passed
@valeriupredoi valeriupredoi deleted the add_documentation branch August 15, 2025 14:03
Collaborator

@kmuehlbauer kmuehlbauer left a comment

Very neat, very nice docs. I've just gone through it and added some suggestions.

Comment on lines +9 to +10
The data storage complexities arise from two main factors: the use of chunking, and the way attributes are stored in the files

Collaborator

Suggested change
The data storage complexities arise from two main factors: the use of chunking, and the way attributes are stored in the files
The data storage complexities arise from two main factors: the use of chunking, and the way attributes are stored in the files.

-------------------------------

Optimal access to data occurs when the data is chunked in a way that matches the access patterns of your application, and when the
b-tree indexes and attributess are stored contiguously in the file.
Collaborator

Suggested change
b-tree indexes and attributess are stored contiguously in the file.
b-tree indexes and attributes are stored contiguously in the file.

import pyfive

with pyfive.File("data.h5", "r") as f:
variables = [f for var in f]
Collaborator

Suggested change
variables = [f for var in f]
variables = [var for var in f]
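For context on why the comprehension variable matters: iterating a pyfive File or Group (as with h5py) yields the member names, so the loop variable, not the file object, is what belongs in the list. A minimal stand-in sketch, with a plain dict mimicking that mapping-like iteration (the variable names are made up):

```python
# A pyfive.File iterates like a mapping, yielding member (variable) names;
# a plain dict shows the same behaviour without needing a real HDF5 file.
fake_file = {"tas": [1.0, 2.0], "pr": [0.1, 0.2]}

variables = [var for var in fake_file]  # the corrected comprehension
print(variables)  # ['tas', 'pr']
```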

print("Results:", results)


You can do the same thing to parallelise manipuations within the variables, by for example using, ``Dask``, but that is beyond the scope of this document.
Collaborator

Suggested change
You can do the same thing to parallelise manipuations within the variables, by for example using, ``Dask``, but that is beyond the scope of this document.
You can do the same thing to parallelise manipulations within the variables, by for example using ``Dask``, but that is beyond the scope of this document.
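As a rough illustration of what "parallelising manipulations within a variable" means, here is a hedged sketch using only the standard library as a stand-in for the chunk-wise map/reduce that Dask would manage automatically (the data is made up; a real version would operate on the variable's chunks):

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1000))  # stand-in for one large variable's values

# Split the variable into chunks and reduce each chunk in its own thread,
# then combine the partial results - the same map/reduce shape Dask uses.
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

with ThreadPoolExecutor() as executor:
    partial_mins = list(executor.map(min, chunks))

overall_min = min(partial_mins)  # combine the per-chunk reductions
print(overall_min)  # 0
```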

return self._index
#### The following method can be used to set pseudo chunking size after the
#### file has been closed and before data transactions. This is pyfive specific
def set_psuedo_chunk_size(self, newsize_MB):
Collaborator

Suggested change
def set_psuedo_chunk_size(self, newsize_MB):
def set_pseudo_chunk_size(self, newsize_MB):

Collaborator

There might be more occurrences?

Collaborator

Sadly you're probably right, but we are hoping to do a wee internal hackathon in autumn, where we might deal with both enums and our test coverage ... so if they're not picked up before then, hopefully proper coverage tests will find the rest.

@kmuehlbauer
Collaborator

Ah, missed by 7 minutes ;-) Please have a look at my comments, whether or not they are useful.

@bnlawrence
Collaborator

@valeriupredoi Can we add these in please?

@valeriupredoi
Collaborator Author

thanks @kmuehlbauer - I'll pop those into a new PR I'll open now 🍺

@valeriupredoi valeriupredoi mentioned this pull request Aug 15, 2025


4 participants