
Add YAML schema to asdf-pydantic models and trees #19

Open
2 of 6 tasks
ketozhang opened this issue Jun 28, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@ketozhang
Owner

ketozhang commented Jun 28, 2024

Pydantic automatically supports JSON Schema Draft 2020-12 for all JSON-serializable types. Custom types that are not subclasses of pydantic.BaseModel need additional pieces to have a JSON schema. The Pydantic docs outline various ways of doing this, and not all of them require the object to be JSON-serializable; it only needs a schema.
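A minimal sketch of both cases, assuming Pydantic v2: a plain BaseModel gets a schema for free, while an arbitrary class needs a hand-written one, attached here with WithJsonSchema (one of the options in the Pydantic docs). The Point, Interval, and Window names and the Interval schema are hypothetical.

```python
from typing import Annotated

from pydantic import BaseModel, ConfigDict, WithJsonSchema
from pydantic.json_schema import GenerateJsonSchema


class Point(BaseModel):
    # Built-in field types get a JSON schema automatically.
    x: float
    y: float


class Interval:
    """Hypothetical custom type: not a BaseModel and not JSON-serializable."""

    def __init__(self, lo: float, hi: float) -> None:
        self.lo, self.hi = lo, hi


# Pydantic cannot derive a schema for Interval, so we supply one by hand.
IntervalType = Annotated[
    Interval,
    WithJsonSchema({"type": "array", "items": {"type": "number"}, "minItems": 2, "maxItems": 2}),
]


class Window(BaseModel):
    # Lets Pydantic validate Interval with a plain isinstance check.
    model_config = ConfigDict(arbitrary_types_allowed=True)
    span: IntervalType


print(GenerateJsonSchema.schema_dialect)  # the draft Pydantic generates for
print(Point.model_json_schema())
print(Window.model_json_schema()["properties"]["span"])
```

Note that Interval never has to be serializable for its schema to appear in Window's schema.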

ASDF schemas are JSON Schema Draft 4 (an older draft), written in YAML as specified by the ASDF schema docs.

Implementation

If ASDF can support the newer JSON Schema draft, the implementation is extremely simple: we leave it to users to define schemas for their custom data types.

Issue: Adapt ASDF types (tagged objects) with existing schema

However, some custom data types already have ASDF schemas that are not immediately available to Pydantic:

  • ASDF's standard core types like ndarray
  • ASDF's standard astronomy types like unit
  • Non-standard extension types not created with asdf-pydantic, like Roman's WFI image

One may hack around this by using ASDF's YAML schema directly and associating it with a new Pydantic field type. In this example, I use ndarray:

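```python
import numpy as np
from typing_extensions import Annotated
from pydantic import BaseModel, ConfigDict, PlainValidator, WithJsonSchema

schema: dict = ...
# For example, use pyyaml to load the schema from http://stsci.edu/schemas/asdf/core/ndarray-1.1.0
# However... ASDF schemas are not JSON-compatible with Draft 2020-12.

MyNDArrayType = Annotated[
    np.ndarray,
    PlainValidator(np.asarray),  # coerce input into an ndarray
    WithJsonSchema(schema),  # report the ASDF schema instead of deriving one
]

class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)  # np.ndarray is not a Pydantic type
    arr: MyNDArrayType

print(Model.model_json_schema())  # Pydantic v2 replacement for the deprecated Model.schema()
```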
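A self-contained variant of this hack, assuming Pydantic v2 and NumPy. NDARRAY_SCHEMA is a hypothetical inline stand-in for the real ASDF ndarray schema (which, as the comment in the snippet says, is not Draft 2020-12 compatible as-is), and to_ndarray is a hypothetical wrapper around np.asarray.

```python
from typing import Annotated

import numpy as np
from pydantic import BaseModel, ConfigDict, PlainValidator, WithJsonSchema

# Hypothetical stand-in for the schema loaded from ASDF's ndarray-1.1.0.
NDARRAY_SCHEMA = {"type": "array", "items": {"type": "number"}}


def to_ndarray(value: object) -> np.ndarray:
    # Accept lists, tuples, existing arrays, etc.
    return np.asarray(value)


MyNDArrayType = Annotated[
    np.ndarray,
    PlainValidator(to_ndarray),  # validation: coerce into an ndarray
    WithJsonSchema(NDARRAY_SCHEMA),  # JSON schema: use ours verbatim
]


class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    arr: MyNDArrayType


m = Model(arr=[1, 2, 3])
print(type(m.arr), m.arr)
print(Model.model_json_schema()["properties"]["arr"])
```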

This hack could work as an implementation, but it forces users to migrate every np.ndarray to a type we manage (e.g., asdf_pydantic.builtin.NDArrayType), which does not scale well across all the various core and extension types of ASDF.

Issue: ASDF does not provide helpers for tree schemas

In ASDF, the tree itself can also have a schema.

asdf-pydantic also lacks features for trees. Users build trees out of our models:

```python
from asdf import AsdfFile

tree: dict = {'data': MyAsdfPydanticModel(...)}
af = AsdfFile(tree)
```

However, our models can themselves also be trees:

```python
tree: dict = MyAsdfPydanticModel(...).model_dump()  # model_dump() is an instance method
af = AsdfFile(tree)
```

Given that the model has a JSON schema, this alone may be enough to generate an ASDF schema for the tree. We might not need an ASDF helper.

TODO:

  • Can ASDF validate against JSON Schema Draft 2020-12?
  • Issue: Adapt ASDF types (tagged objects) with existing schema
  • Issue: ASDF does not provide helpers for tree schemas
  • Replace YAML schema string manipulation with pydantic schema_generator
  • Add automatic registration of model tag definitions (asdf.extension.TagDefinitions)
  • Add automatic registration of model schemas to resource mapping
ketozhang added the enhancement (New feature or request) label on Jun 28, 2024
ketozhang changed the title from "Add YAML schema to asdf-pydantic models." to "Add YAML schema to asdf-pydantic models and trees" on Jun 28, 2024
@ketozhang
Owner Author

ketozhang commented Aug 26, 2024

Can ASDF validate against JSON Schema Draft 2020-12? No (see asdf-format/asdf#1793).

I was able to get asdf.schema.check_schema to pass on the generated schemas for the example node and rectangle models:

```python
import asdf.schema
import pytest
import yaml

# AsdfNode and AsdfRectangle are the example models; asdf_extension is a test fixture.


@pytest.mark.usefixtures("asdf_extension")
def test_check_schema_node():
    """Tests the AsdfNode model schema is correct."""
    schema = yaml.safe_load(AsdfNode.model_asdf_schema())
    asdf.schema.check_schema(schema)


@pytest.mark.usefixtures("asdf_extension")
def test_check_schema_rectangle():
    """Tests the AsdfRectangle model schema is correct."""
    schema = yaml.safe_load(AsdfRectangle.model_asdf_schema())
    asdf.schema.check_schema(schema)
```

ASDF seems to accept this schema (though I'm not sure whether it is valid JSON Schema Draft 4):

```yaml
---
type: object
anyOf:
- $ref: "#/definitions/AsdfNode"
definitions:
    AsdfNode:
        type: object
        properties:
            name:
                type: string
            child:
                anyOf:
                - $ref: "#/definitions/AsdfNode"
                - type: "null"
```
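One way to probe the Draft 4 question independently of ASDF is to check this schema against the Draft 4 metaschema with the third-party jsonschema package (an assumption; this is not what asdf.schema.check_schema does internally). A sketch under that assumption, which also shows why `null` must be quoted: unquoted, YAML parses `type: null` as Python None, which is not a valid JSON Schema type.

```python
import yaml
from jsonschema import Draft4Validator

# The schema from the comment above, with "null" quoted.
SCHEMA_YAML = """
type: object
anyOf:
- $ref: "#/definitions/AsdfNode"
definitions:
  AsdfNode:
    type: object
    properties:
      name:
        type: string
      child:
        anyOf:
        - $ref: "#/definitions/AsdfNode"
        - type: "null"
"""

schema = yaml.safe_load(SCHEMA_YAML)
Draft4Validator.check_schema(schema)  # raises SchemaError if not valid Draft 4

validator = Draft4Validator(schema)
print(validator.is_valid({"name": "root", "child": {"name": "leaf", "child": None}}))
print(validator.is_valid({"name": 42}))  # name must be a string
```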

ketozhang pinned this issue on Aug 26, 2024
@ketozhang
Owner Author

  • Add automatic registration of model schemas to resource mapping

Currently, having ASDF use schemas to validate tags involves two parts:

  1. Create a tag definition to associate a tag URI with a schema URI:
     ```python
     tags = [AsdfNode.get_tag_definition()]  # type: ignore
     ```
  2. Register the actual schema (dict or path) in the resource mapping:
     ```python
     asdf_config.add_resource_mapping(
         {yaml.safe_load(AsdfNode.model_asdf_schema())["id"]: AsdfNode.model_asdf_schema()}
     )
     asdf_config.add_extension(TestExtension())
     ```

This is not a good experience: for each model, you need to fill out the converters and tags, and then, separately, the resource mapping.

Currently we have a convenient method for the converters...

```python
AsdfPydanticConverter.add_models(AsdfNode)

class TestExtension(Extension):
    extension_uri = "asdf://asdf-pydantic/examples/extensions/test-1.0.0"
    converters = [AsdfPydanticConverter()]  # type: ignore
```

...where users would also use the same converter for registering tags (i.e., AsdfPydanticConverter.tags; however, that is a list of strings, not TagDefinition objects).

Something similar for the tags would be nice, but I don't really like having two convenience methods...

```python
AsdfPydanticConverter.add_models(AsdfNode)
AsdfPydanticTagRegistry.add(AsdfNode.get_tag_definition())

class TestExtension(Extension):
    extension_uri = "asdf://asdf-pydantic/examples/extensions/test-1.0.0"

    converters = [AsdfPydanticConverter()]
    tags = [*AsdfPydanticTagRegistry().tags]
```
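One way to avoid two separate convenience methods is a single registry that records tag definitions and schemas together in one call. This is a hypothetical sketch, not asdf-pydantic's API: ModelRegistry does not exist in the project, and FakeNode stands in for a model exposing get_tag_definition() and model_asdf_schema() (its get_tag_definition() returns a plain string where a real model returns a TagDefinition).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

import yaml


@dataclass
class ModelRegistry:
    """Hypothetical helper: one add_models() call records tags *and* schemas."""

    tags: List[Any] = field(default_factory=list)
    resource_mapping: Dict[str, str] = field(default_factory=dict)

    def add_models(self, *models: Any) -> None:
        for model in models:
            # Part 1: the tag definition linking a tag URI to a schema URI.
            self.tags.append(model.get_tag_definition())
            # Part 2: the schema itself, keyed by its "id" for the resource mapping.
            schema_yaml = model.model_asdf_schema()
            self.resource_mapping[yaml.safe_load(schema_yaml)["id"]] = schema_yaml


class FakeNode:
    """Stand-in model exposing the two hooks used in this thread."""

    @classmethod
    def get_tag_definition(cls) -> str:
        return "asdf://example.org/tags/node-1.0.0"  # a real model returns a TagDefinition

    @classmethod
    def model_asdf_schema(cls) -> str:
        return "id: asdf://example.org/schemas/node-1.0.0\ntype: object\n"


registry = ModelRegistry()
registry.add_models(FakeNode)
print(registry.tags)
print(list(registry.resource_mapping))
```

With ASDF, registry.tags would then feed the extension's `tags` attribute and registry.resource_mapping would go to `asdf_config.add_resource_mapping(...)`, so each model is listed only once.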
