Set smarter default for chunks[1] in zarr output file #1631

Open
erikvansebille opened this issue Jul 31, 2024 · 0 comments
@erikvansebille
Member

The current default for the Parcels output chunks is (len(pset), 1), meaning that every observation in the output file creates a new chunk, which adds significant overhead; see also the note on output chunking in the documentation.

We chose this because it avoids extra NaNs along the 'observation' dimension of the output file. For example, if chunks[1]=5 and the output file has 7 observations, the 'observation' dimension is padded to 10 and the last 3 entries are set to NaN. This may confuse users in their analysis.
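To make the padding arithmetic concrete, here is a minimal sketch. It assumes the 'observation' dimension is always stored as a whole number of chunks, as described above; `padded_observations` is a hypothetical helper for illustration, not part of Parcels.

```python
import math


def padded_observations(n_obs: int, chunk_obs: int) -> tuple[int, int]:
    """Return (padded length, number of trailing NaN slots).

    Assumption: the 'observation' dimension is rounded up to a whole
    number of chunks of size chunk_obs (i.e. chunks[1]).
    """
    n_chunks = math.ceil(n_obs / chunk_obs)
    padded = n_chunks * chunk_obs
    return padded, padded - n_obs


print(padded_observations(7, 5))  # (10, 3): three NaN-filled slots
print(padded_observations(7, 1))  # (7, 0): current default, no padding
```

With chunks[1]=1 there is never any padding, which is why the current default was chosen; the cost is one chunk per observation.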

However, we may be able to choose a smarter default for chunks[1] than 1. For example, a default based on the ratio of runtime to outputdt would make more sense, since for normal/simple simulations the expected number of observations is the floor(?) of runtime/outputdt.
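A rough sketch of what such a default could look like is below. The function name `default_obs_chunk` is hypothetical and this is not existing Parcels code; it assumes runtime and outputdt are either both `timedelta` objects or both numbers in the same unit. Whether floor is the right rounding (for instance, whether the initial observation adds one) is still an open question, so treat the exact count as an assumption.

```python
import math
from datetime import timedelta


def default_obs_chunk(runtime, outputdt) -> int:
    """Guess chunks[1] from the expected number of observations.

    Assumption: a simple simulation writes floor(runtime / outputdt)
    observations per particle. Falls back to 1 (the current default)
    when the ratio is below one.
    """
    n_expected = math.floor(runtime / outputdt)
    return max(1, n_expected)


# 30 days of output every 6 hours
print(default_obs_chunk(timedelta(days=30), timedelta(hours=6)))  # 120
```

If the estimate matches the actual number of observations, the whole 'observation' dimension fits in a single chunk and no NaN padding is introduced; if the simulation writes more or fewer observations than expected, some padding would reappear.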

Especially for long simulations with lots of outputs, the speedup could be massive.

Status: Backlog