Schema datasets #242

Oxid15 · 2024-05-15T20:25:12Z

This PR changes how data validation is handled in cascade by introducing schemas in datasets.

Now pipeline blocks can explicitly define input schemas as pydantic models (pydantic is an optional dependency needed only for this feature). If a schema is defined in the new SchemaModifier, it implicitly inserts a ValidationWrapper before itself in the pipeline. When the modifier accesses self._dataset inside its own __getitem__, its input is already wrapped in this validator, so every item it reads is validated against the input schema.
For example, suppose we have a dataset of annotated images:

from typing import List, Tuple

import pydantic

class AnnotImage(pydantic.BaseModel):
    image: List[List[List[float]]]
    segments: List[List[int]]
    bboxes: List[Tuple[int, int, int, int]]
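To see what this schema enforces on its own, here is a small sketch using plain pydantic with no cascade involved. The helper `is_valid` and the `good`/`bad` sample items are hypothetical; the model is constructed with keyword arguments, which triggers validation in both pydantic v1 and v2.

```python
from typing import List, Tuple

import pydantic


class AnnotImage(pydantic.BaseModel):
    image: List[List[List[float]]]
    segments: List[List[int]]
    bboxes: List[Tuple[int, int, int, int]]


def is_valid(item: dict) -> bool:
    """Hypothetical helper: True if `item` conforms to AnnotImage."""
    try:
        AnnotImage(**item)  # construction runs pydantic validation
    except pydantic.ValidationError:
        return False
    return True


# Hypothetical items: a 1x1 single-channel image with one bounding box.
good = {"image": [[[0.5]]], "segments": [[0]], "bboxes": [(0, 0, 1, 1)]}
# This box has only 3 coordinates, violating Tuple[int, int, int, int].
bad = {"image": [[[0.5]]], "segments": [[0]], "bboxes": [(0, 0, 1)]}

print(is_valid(good))  # True
print(is_valid(bad))   # False
```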

The base class for the pipeline will be:

class ImagesDataset(SchemaModifier):
    in_schema = AnnotImage

Let's define a dummy modifier:

class IDoNothing(ImagesDataset):
    def __getitem__(self, idx):
        # self._dataset is wrapped in the validator, so this item is checked against AnnotImage
        item = self._dataset[idx]
        return item

Let's say we have a source of images and hope that it maintains our schema. The following is the regular way to use a modifier, but under the hood it will automatically check that the output of the datasource conforms to AnnotImage:

ds = SomeImageDatasource()
ds = IDoNothing(ds)
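A minimal sketch of the mechanism described above, assuming the validator is inserted when the modifier is constructed. This is not cascade's actual code: `SketchSchemaModifier` and `SketchValidationWrapper` are hypothetical stand-ins for `SchemaModifier` and `ValidationWrapper`, and a plain list of dicts plays the role of `SomeImageDatasource`.

```python
from typing import Any, List, Optional, Sequence, Tuple, Type

import pydantic


class AnnotImage(pydantic.BaseModel):
    image: List[List[List[float]]]
    segments: List[List[int]]
    bboxes: List[Tuple[int, int, int, int]]


class SketchValidationWrapper:
    """Hypothetical stand-in for ValidationWrapper: validates items on access."""

    def __init__(self, dataset: Sequence[Any], schema: Type[pydantic.BaseModel]) -> None:
        self._dataset = dataset
        self._schema = schema

    def __getitem__(self, idx: int) -> Any:
        item = self._dataset[idx]
        # Raises pydantic.ValidationError if the item does not match the schema.
        self._schema(**item)
        return item

    def __len__(self) -> int:
        return len(self._dataset)


class SketchSchemaModifier:
    """Hypothetical stand-in for SchemaModifier: wraps its input if in_schema is set."""

    in_schema: Optional[Type[pydantic.BaseModel]] = None

    def __init__(self, dataset: Sequence[Any]) -> None:
        if self.in_schema is not None:
            # Implicitly insert the validator before this modifier.
            dataset = SketchValidationWrapper(dataset, self.in_schema)
        self._dataset = dataset

    def __len__(self) -> int:
        return len(self._dataset)


class ImagesDataset(SketchSchemaModifier):
    in_schema = AnnotImage


class IDoNothing(ImagesDataset):
    def __getitem__(self, idx: int) -> Any:
        item = self._dataset[idx]  # validated by the wrapper
        return item


# Hypothetical datasource: a plain list of dicts, the second one malformed.
ds = [
    {"image": [[[0.5]]], "segments": [[0]], "bboxes": [(0, 0, 1, 1)]},
    {"image": [[[0.5]]], "segments": [[0]], "bboxes": [(0, 0, 1)]},
]
ds = IDoNothing(ds)

print(ds[0]["bboxes"])  # [(0, 0, 1, 1)]
try:
    ds[1]
except pydantic.ValidationError:
    print("item 1 failed validation")
```

Wrapping at construction time keeps modifiers like IDoNothing unchanged: they index self._dataset exactly as before, and validation happens transparently on each access.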

Oxid15 added 14 commits April 9, 2024 21:45

Merge branch '0.14.0' into schema_datasets (84c1bd4)
Add imports from dataset (4cda2a6)
Refactor validation to integrate with schemas (9e4061d)
Rm whitespaces (9d9dd82)
Write draft schema dataset and modifier (44d8d4a)
Small refine (4159c1b)
Allow arbitrary types in schemas and test it (b137088)
Raise deprecated warnings when trying to use old validators (2a86124)
Remove manual raises and use builtin abstract methods in dataset (8bf298e)
Remove schema dataset, only modifier left (e50b6b2)
Test modifiers meta (9d49ac8)
Fix broken imports (14d2c9f)
Add simple schema tests (ecb03a8)
Bump years (1094e21)

Oxid15 self-assigned this May 15, 2024

Oxid15 added 2 commits June 11, 2024 23:26

Write docstring for modifier (cb7f6d6)
Add copyright and pydantic version into an exception (886cad6)

Oxid15 merged commit 1e73047 into 0.14.0 Jun 12, 2024

Oxid15 deleted the schema_datasets branch June 12, 2024 08:58

Oxid15 mentioned this pull request Aug 24, 2024 in 0.14.0 #259 (merged)

