Feature/metadata improvements #23

Draft: wants to merge 10 commits into base: develop

Thomas-Z (Collaborator) commented Mar 2, 2025

These changes add dimension information to the zcollection metadata.
The motivations are as follows:

  • Prevent the creation of unreadable zcollections when a new partition is added whose dimensions differ from those already in the collection.
  • Improve validation of inserted data.
  • Clean up code that used existing variables as templates to obtain dimension sizes.
  • Enable new features made possible by this additional metadata.

These changes will require existing collections to be updated before new data can be written.
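
As a rough illustration of this requirement, the sketch below refuses writes on a collection whose metadata lacks the new fields while still allowing it to be read. It is not the code from this PR: the key names `version` and `dimensions`, the `check_access` helper and the `OutdatedCollectionError` exception are all hypothetical.

```python
class OutdatedCollectionError(RuntimeError):
    """Raised when a write is attempted on a collection with old-style metadata."""


# Hypothetical key names: the real metadata layout is not shown in this PR text.
REQUIRED_KEYS = ("version", "dimensions")


def check_access(metadata: dict, mode: str) -> bool:
    """Return True if the metadata is up to date, False if it is outdated
    but may still be opened read-only.

    Raises OutdatedCollectionError if a write is requested on outdated metadata.
    """
    if mode not in ("r", "w"):
        raise ValueError(f"unknown mode: {mode!r}")
    up_to_date = all(key in metadata for key in REQUIRED_KEYS)
    if not up_to_date and mode == "w":
        raise OutdatedCollectionError(
            "collection metadata must be updated before new data can be written"
        )
    return up_to_date
```

With this shape, a caller opening an old collection in read mode gets `False` back and can carry on reading, while the same collection opened in write mode fails early instead of producing partitions that cannot be validated.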

The following elements are implemented in this initial commit:

  • Additional information in the collection metadata file:
    • zcollection version number
    • Known dimension sizes and chunking
  • Data validation on insert:
    • Checks on dimension properties
    • Checks on variable data types and fill values
  • Support for dropping immutable variables
  • Addition of a Collection.add_dimension method
  • Update of the update_deprecated_collection function
  • Handling of non-updated read-only collections
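
The validation on insert listed above can be pictured with the following minimal sketch. It assumes that the metadata records a size for each non-partitioned dimension and a data type and fill value for each variable, and that one dimension is allowed to vary between partitions. The `Variable` record, the `validate_insert` function and their arguments are hypothetical and do not reproduce the zcollection API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one variable about to be inserted."""
    dimensions: tuple
    shape: tuple
    dtype: str
    fill_value: object = None


def validate_insert(known_dims, known_vars, incoming, partition_dim):
    """Return a list of problems; an empty list means the insert is acceptable.

    known_dims: dimension name -> size recorded in the metadata
    known_vars: variable name -> (dtype, fill_value) recorded in the metadata
    incoming: variable name -> Variable about to be inserted
    partition_dim: the dimension whose size may differ between partitions
    """
    problems = []
    for name, var in incoming.items():
        if name not in known_vars:
            problems.append(f"{name}: unknown variable")
            continue
        dtype, fill_value = known_vars[name]
        if var.dtype != dtype:
            problems.append(f"{name}: dtype {var.dtype} != {dtype}")
        # Plain comparison; NaN fill values would need a dedicated check.
        if var.fill_value != fill_value:
            problems.append(f"{name}: fill value {var.fill_value} != {fill_value}")
        for dim, size in zip(var.dimensions, var.shape):
            if dim == partition_dim:
                continue  # this dimension is allowed to vary
            if dim not in known_dims:
                problems.append(f"{name}: unknown dimension {dim}")
            elif known_dims[dim] != size:
                problems.append(f"{name}: {dim} has size {size}, expected {known_dims[dim]}")
    return problems
```

Collecting every problem before failing, rather than raising on the first one, is a design choice of this sketch: it lets the caller report all mismatches in a rejected dataset at once.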

The following tasks remain to be done:

  • Complete the list of changes
  • Add tests:
    • For Collection.add_dimension
    • Reading subsets of view variables (only reference, only view, mix)
    • Adding immutable variables
    • Removing immutable variables
  • Restrict the ability to update an immutable variable
  • Restrict the ability to add immutable variables to a view
  • Finalize and fully test the deprecated collection and view update functions

Thomas-Z added the enhancement label Mar 2, 2025
Thomas-Z requested a review from fbriol March 2, 2025 17:19
Thomas-Z self-assigned this Mar 2, 2025
Thomas-Z changed the base branch from main to develop March 2, 2025 17:20
Thomas-Z linked an issue that may be closed by this pull request Mar 2, 2025
Thomas Zilio added 9 commits March 9, 2025 17:10
Adding tests
Refactoring and cleaning.
Making collection only containing immutable variables readable.
Making view readable even if they do not have any declared variable (just reading the reference).
Consolidating tests and refactoring.
Adding tests related to immutable variables presence in the dataset provided to update and map methods.
Fixing drop_partitions tests (with timedelta)
Normalizing zcol/zview naming in tests.
…ate() methods.

Adding the "filler" variable concept to differentiate variables added by the system and the one added by the user during insertion.
Fixing partition order output of _normalize_partitions
Removing "over typing".
… the default value of delayed (False if distributed is False, True otherwise).
Labels: enhancement

Linked issue that may be closed by merging this pull request: ValueError when loading a 'bad' set of variables from a view