User Guide: practical help for chunking and dealing with large(ish) data #438

Open
@Manangka

In GitLab by @Huite on Jun 16, 2023, 16:59

While talking to Janneke just now, it turned out that a script was using an unnecessarily large amount of memory to run. This would be a nice case to demonstrate in a piece of user guide documentation.

In particular, the story here is that it's best to reduce to the smallest amount of data as early as possible:

  • The upper active cells were determined for all time steps rather than for only the first or last time step
  • Determined once, they can be used to reduce the number of layers before computing a mean (or other reduction) over time
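
A minimal sketch of the two points above, to make the idea concrete. It uses plain NumPy as a stand-in (the issue does not name a library), and it assumes that inactive cells are marked with NaN and that the active pattern is identical at every time step. The array `head` and its dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
ntime, nlayer, ny, nx = 6, 4, 3, 5

# Hypothetical data with dimensions (time, layer, y, x).
head = rng.normal(size=(ntime, nlayer, ny, nx))

# Assumption: layers above `top` are inactive (NaN) at every time step.
top = rng.integers(0, nlayer, size=(ny, nx))
inactive = np.arange(nlayer)[:, None, None] < top  # (layer, y, x)
head[:, inactive] = np.nan

# Wasteful: build the active mask for all time steps.
active_all = ~np.isnan(head)                       # (time, layer, y, x)
idx_all = np.argmax(active_all, axis=1)            # first active layer per time
upper_all = np.take_along_axis(head, idx_all[:, None], axis=1)[:, 0]
mean_wasteful = upper_all.mean(axis=0)

# Leaner: build the mask from the first time step only, select the
# upper active layer, and only then reduce over time.
active_first = ~np.isnan(head[0])                  # (layer, y, x)
idx = np.argmax(active_first, axis=0)              # (y, x)
upper = np.take_along_axis(head, idx[None, None], axis=1)[:, 0]
mean_lean = upper.mean(axis=0)

print("mask size, all times :", active_all.nbytes, "bytes")
print("mask size, first time:", active_first.nbytes, "bytes")
```

Both routes give the same mean here, but the second never needs an intermediate mask of full (time, layer, y, x) size. If the active cells do change over time, this shortcut is not valid.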

We can show the different approaches and illustrate the difference (e.g. through memory usage and runtime). For example: compute the mean first and then take the upper active layer; show what happens if you chunk over time, or not; show what bad chunking decisions do, etc.
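
As a rough starting point for such a comparison, here is a sketch that computes a mean over time either all at once or in chunks of time steps, and records peak memory with the standard library's `tracemalloc`. Everything here is an assumption for illustration: `load_times` is a hypothetical stand-in for reading a slab of time steps from disk, the sizes are arbitrary, and plain NumPy is used instead of whatever lazy array library the guide ends up using.

```python
import tracemalloc

import numpy as np

NTIME, NLAYER, NY, NX = 40, 5, 50, 50


def load_times(start, stop):
    """Hypothetical stand-in for reading time steps [start, stop) from disk."""
    t = np.arange(start, stop, dtype=np.float64)[:, None, None, None]
    layer = np.arange(NLAYER, dtype=np.float64)[None, :, None, None]
    return t + layer + np.zeros((1, 1, NY, NX))


def mean_all_at_once():
    # Holds every time step in memory before reducing.
    return load_times(0, NTIME).mean(axis=0)


def mean_chunked(chunk_size):
    # Holds only one chunk of time steps plus a running sum.
    total = np.zeros((NLAYER, NY, NX))
    for start in range(0, NTIME, chunk_size):
        stop = min(start + chunk_size, NTIME)
        total += load_times(start, stop).sum(axis=0)
    return total / NTIME


def peak_memory(func, *args):
    """Run func and return its result with the traced peak memory in bytes."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


full, peak_full = peak_memory(mean_all_at_once)
chunked, peak_chunked = peak_memory(mean_chunked, 5)
print(f"all at once: peak {peak_full / 1e6:.2f} MB")
print(f"chunked    : peak {peak_chunked / 1e6:.2f} MB")
```

The exact numbers depend on the platform and NumPy version, so they are best reported as measured rather than quoted. Varying `chunk_size` (for instance 1 versus `NTIME`) would be one way to show what a poor chunking choice costs in runtime or memory.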

This Unidata post is also a nice resource:
https://www.unidata.ucar.edu/blogs/developer/entry/chunking_data_why_it_matters

Labels: documentation (Improvements or additions to documentation)