Create tool that reduces file size for sharing and debugging #1364

Open
rly opened this issue Jun 1, 2021 · 1 comment
Labels
category: enhancement (improvements of code or code behavior)
help wanted: good first issue (request for community contributions that are good for new contributors)
priority: low (alternative solution already working and/or relevant to only specific user(s))

rly commented Jun 1, 2021

When reading or validating an NWB/HDMF file fails and the file is large (>1 GB), sharing that file is burdensome, even though the error is rarely caused by the size of the datasets. It would be useful to have a simple tool that truncates large datasets (>1k elements) and repacks the result into a new, smaller file. Then, when a user has a large file that cannot be read or validated, we can ask them to run the tool on it and share the smaller file instead. Proposed usage:

pip install nwbtrim
nwbtrim myfile.nwb -o myfile_trim.nwb

rly added the "category: enhancement" label Jun 1, 2021

oruebel commented Jun 1, 2021

I don't think the h5repack utility (https://support.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Repack) can quite do this out of the box, but it can still be useful here. A simple solution would be to:

  1. Copy the file
  2. Shrink all large datasets (e.g., using h5py's Group.visititems with Dataset.resize)
  3. Call h5repack
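
The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not an existing tool: the function names (trimmed_shape, trim_file, repack) and the max_len threshold are hypothetical, "large" is simplified to mean "more than max_len entries along the first axis", and h5repack is assumed to be on the PATH. Since Dataset.resize only works on chunked datasets, the sketch replaces contiguous datasets with a truncated copy, which can leave object references to them dangling. The trimmed file may also no longer be internally consistent (e.g., index columns pointing past the truncated data), so it is meant for debugging only.

```python
import shutil
import subprocess


def trimmed_shape(shape, max_len=1000):
    """Cap the first axis at max_len; other axes are left unchanged."""
    if not shape:  # scalar dataset, nothing to trim
        return ()
    return (min(shape[0], max_len),) + tuple(shape[1:])


def trim_file(src, dst, max_len=1000):
    """Copy src to dst (step 1), then shrink large datasets in dst (step 2)."""
    import h5py  # third-party; imported here so trimmed_shape stays dependency-free

    shutil.copyfile(src, dst)
    with h5py.File(dst, "r+") as f:
        large = []

        def collect(name, obj):
            # obj.shape is None for empty datasets and () for scalars
            if isinstance(obj, h5py.Dataset) and obj.shape and obj.shape[0] > max_len:
                large.append(name)

        f.visititems(collect)  # collect names first, modify after the traversal

        for name in large:
            dset = f[name]
            if dset.chunks is not None:
                # Dataset.resize only works on chunked datasets
                dset.resize(trimmed_shape(dset.shape, max_len))
            else:
                # Contiguous datasets cannot be resized: replace with a truncated copy
                data, dtype, attrs = dset[:max_len], dset.dtype, dict(dset.attrs)
                del f[name]
                f.create_dataset(name, data=data, dtype=dtype).attrs.update(attrs)


def repack(src, dst):
    """Step 3: run h5repack (must be on PATH) so the freed space is reclaimed."""
    subprocess.run(["h5repack", src, dst], check=True)
```

A call sequence might be trim_file("myfile.nwb", "myfile_tmp.nwb") followed by repack("myfile_tmp.nwb", "myfile_trim.nwb"). The repack step matters because shrinking or deleting datasets in place does not by itself reduce the size of the file on disk.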

oruebel added the "help wanted: good first issue" and "priority: low" labels Sep 10, 2021