Introduce a flat option to ensure_contiguous_ndarray to switch off flatten for ZFPY codec #307

Merged
merged 18 commits into from
Mar 2, 2022

Conversation

halehawk
Contributor

@halehawk halehawk commented Feb 11, 2022

closes #303

TODO:

  • Unit tests and/or doctests in docstrings
  • [x] tox -e py39 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • [x] Changes documented in docs/release.rst
  • [x] tox -e docs passes locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Coveralls passes)

@pep8speaks

pep8speaks commented Feb 11, 2022

Hello @halehawk! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-03-01 20:52:40 UTC

Contributor

@rabernat rabernat left a comment

This is a great start @halehawk!

Instead of printing and exiting, you want to raise an error, as suggested below.

You will also need to test for this error in test_zfpy.py using with pytest.raises.
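A minimal sketch of that pattern, assuming a stand-in function `check_c_order` (hypothetical, not the numcodecs code) that raises instead of printing and exiting. The test then asserts on the error with `pytest.raises`:

```python
import numpy as np
import pytest


def check_c_order(buf):
    """Hypothetical stand-in for the codec's input check (not numcodecs code)."""
    if not buf.flags.c_contiguous:
        # Raise so the caller can handle the problem, rather than print and exit.
        raise ValueError("F-order arrays are not supported")
    return buf


def test_err_encode_f_order():
    # A 2D Fortran-ordered array is not C-contiguous, so the check must raise.
    arr = np.asfortranarray(np.arange(12.0).reshape(3, 4))
    with pytest.raises(ValueError):
        check_c_order(arr)
```

If the function did not raise, `pytest.raises` would fail the test, which is what makes the error path part of the tested behaviour.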

buf = ensure_contiguous_ndarray(buf)
# not flatten c-order array and raise exception for f-order array
if not isinstance(buf, np.ndarray):
    raise TypeError("The zfp codec does not support none numpy arrays."
Contributor

What other type would you expect here? I thought all Zarr array data would come into numcodecs as numpy arrays?

Contributor Author

To repeat my answer here: I added test_err_encode_list, and with a list as input, buf.flags did not raise a correct error. So I added the isinstance check first. Do you have a different recommendation?
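A small sketch of why the type check has to come first, assuming the input is a plain Python list (the variable names are illustrative only): a list has no `flags` attribute, so the failure is an `AttributeError` that says nothing about the codec.

```python
import numpy as np

buf = [1.0, 2.0, 3.0]  # a plain Python list rather than a numpy array

try:
    buf.flags  # lists have no `flags` attribute
except AttributeError as exc:
    flags_error = exc  # generic error, unrelated to what the codec supports

# Checking the type first lets the codec raise a clearer, codec-specific error.
is_ndarray = isinstance(buf, np.ndarray)
```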

@halehawk
Contributor Author

halehawk commented Feb 18, 2022 via email

Contributor

@rabernat rabernat left a comment

Thanks for your work on this! LGTM!

@rabernat
Contributor

One final question I have (which doesn't have to hold up this PR) is whether you have tested this branch on a real-world use case and verified that it leads to much better compression, as expected from the discussion in #303.

@halehawk
Contributor Author

@rabernat I checked both numcodecs 0.9.1, which flattens the array, and my modified numcodecs. The non-flattened array definitely gives us a better compression ratio. For example, using precision mode with 16 bits, the ratio goes from 2.28 to 4.32 on a 4D array, though the error metric might not be as good: the RMSE goes from 0.003979 to 0.05241.
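The comment does not say how these two figures were computed. A sketch under the usual definitions (an assumption, with hypothetical helper names): compression ratio as uncompressed bytes over compressed bytes, and RMSE as the root-mean-square difference between the original and decoded arrays.

```python
import numpy as np


def compression_ratio(original, compressed):
    """Uncompressed size in bytes divided by compressed size in bytes."""
    return original.nbytes / len(compressed)


def rmse(original, decoded):
    """Root-mean-square error between the original and decoded arrays."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(decoded, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy inputs only: a 4D float64 array (16 values, 128 bytes) and a
# 32-byte placeholder standing in for a codec's output.
original = np.zeros((2, 2, 2, 2), dtype=np.float64)
ratio = compression_ratio(original, bytes(32))
error = rmse(original, original + 0.5)
```

Here `compressed` can be any bytes-like encoder output and `decoded` is the array recovered from it, so the same helpers apply to both the flattened and non-flattened runs.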

@halehawk
Contributor Author

It looks like Peter's group added some metadata about the macos-python3.10-zfpy build but has not uploaded the package yet, so I ran into a download problem with this zfpy package.

@halehawk
Contributor Author

It looks like zfpy-0.5.5-cp310-macos can be installed with pip successfully now, so the previously failing OSX CI/build (3.10) check should be able to pass this time. How can I restart all PR checks?

@halehawk
Contributor Author

Now my PR failed the macos-py310 check again, but this time the problem happened while building blosc.cpython-310-darwin.so. @rabernat @jakirkham Do you have any advice on this? Can you ignore this failure and merge my PR now? Thanks!

@rabernat
Contributor

rabernat commented Feb 25, 2022

I do not know what is causing the OSX py 3.10 failure.

In general we do not like to merge things with failing CI, but I would defer to John's judgement on this.

@jakirkham
Member

Regarding CI. Fixing in PR ( #311 )

- conda create -n env python==${{matrix.python-version}} wheel pip compilers
+ conda create -n env python=${{matrix.python-version}} wheel pip compilers 'clang>=12.0.1'
Member

Let's leave this for that PR to sort out 🙂

Contributor Author

Do you want me to revert the modification?

Member

Yes

@rabernat
Contributor

LGTM, but I would appreciate another maintainer approval.

Member

@jakirkham jakirkham left a comment

Generally LGTM. Had a small suggestion on parameter naming. Once done would be happy to merge 🙂

Thanks for all the work here @halehawk! 😄

@halehawk
Contributor Author

halehawk commented Mar 1, 2022 via email

Member

@jakirkham jakirkham left a comment

Think this should fix CI

@jakirkham jakirkham merged commit f4196a4 into zarr-developers:master Mar 2, 2022
@jakirkham
Member

Thanks @halehawk for the PR and everyone for the reviews! 😄

Development

Successfully merging this pull request may close these issues.

Data array is flattened in numcodecs which reduced the compression ratio that ZFP can provide on multi-dimension arrays
4 participants