Metadata Refactoring #3331

Open
@mohammad-alisafaee

Description

We have different database indexes for datasets, plans, and activities, and they aren't consistent. For example, plans is a map from id to all Plan objects (including removed ones), whereas datasets is a map from name to active Dataset objects (excluding removed datasets). The Gateway APIs for these classes also differ from one another.

We should have a consistent set of indexes for each of these classes (whenever it makes sense):

  • datasets, plans, and activities should be maps from id to objects and include removed objects as well
  • datasets-by-name and plans-by-name are maps from name to active objects (i.e. non-removed)
  • Maybe add datasets-removed, plans-removed, and activities-removed indexes that map id to all deleted objects (not just the tail object). In that case, we can drop the deleted objects from the first set of indexes.
  • datasets-tags includes tags only for active datasets. We should include removed datasets as well (needs to be discussed).
  • All Gateway classes should have consistent APIs
  • Discuss whether to unify DatasetGateway and DatasetsProvenance
  • .renku/metadata.yml can be deleted since we dropped support for <v1.0.0
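
A minimal sketch of the first two bullets, assuming plain dicts in place of the real persistent indexes. `Entity` and `EntityIndexes` are hypothetical names used only for illustration (they are not existing classes), and the sketch assumes that removing an object means adding a new tail object flagged as removed:

```python
from dataclasses import dataclass
from typing import Dict, Generic, Optional, TypeVar


@dataclass
class Entity:
    """Hypothetical stand-in for a Dataset, Plan, or Activity."""

    id: str
    name: str
    removed: bool = False


T = TypeVar("T", bound=Entity)


class EntityIndexes(Generic[T]):
    """Keeps an id index (all objects) and a name index (active objects only)."""

    def __init__(self) -> None:
        self.by_id: Dict[str, T] = {}    # includes removed objects
        self.by_name: Dict[str, T] = {}  # active (non-removed) objects only

    def add(self, entity: T) -> None:
        # Every object, removed or not, is reachable by id.
        self.by_id[entity.id] = entity
        if entity.removed:
            # A removed tail hides the name from the active index.
            self.by_name.pop(entity.name, None)
        else:
            self.by_name[entity.name] = entity

    def get_by_id(self, id: str) -> Optional[T]:
        return self.by_id.get(id)

    def get_by_name(self, name: str) -> Optional[T]:
        return self.by_name.get(name)
```

With this shape, the same two lookups (`get_by_id`, `get_by_name`) would behave identically for datasets and plans, which is the kind of consistency the Gateway APIs could expose.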

Additional context

  • If we have these changes before deploying v10 metadata, we can skip v10 and deploy v11 directly.
  • We can use BTrees.check.check to validate indexes (in case users modified them). This function won't work with subclasses out of the box, so we either have to delete RenkuOOBTree and inherit directly from a BTree, or make the check work with subclasses.

Notes

  • Plan and Dataset objects can have a derivation chain; when removing a plan/dataset, we set the tail object as removed and don't modify others. We should consider this when filtering removed objects.
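
The note above can be sketched as follows. This is only an illustration under stated assumptions: `Version`, `derived_from`, `find_tails`, and `active_objects` are hypothetical names, and chains are assumed to be linear (each object has at most one successor) and free of cycles:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Version:
    """Hypothetical stand-in for one Plan/Dataset version in a derivation chain."""

    id: str
    derived_from: Optional[str] = None  # id of the previous version, if any
    removed: bool = False


def find_tails(objects: Iterable[Version]) -> Dict[str, Version]:
    """Map every object id to the tail (latest version) of its chain."""
    objects = list(objects)
    by_id = {o.id: o for o in objects}
    # predecessor id -> successor id (assumes linear chains)
    successor = {o.derived_from: o.id for o in objects if o.derived_from is not None}
    tails: Dict[str, Version] = {}
    for o in objects:
        current = o.id
        while current in successor:
            current = successor[current]
        tails[o.id] = by_id[current]
    return tails


def active_objects(objects: Iterable[Version]) -> List[Version]:
    """Keep only objects whose chain tail is not marked as removed."""
    objects = list(objects)
    tails = find_tails(objects)
    return [o for o in objects if not tails[o.id].removed]
```

Filtering on each object's own `removed` flag would keep earlier versions of a removed chain, because only the tail carries the flag; looking up the tail first avoids that.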
