Add YAML schema to asdf-pydantic models and trees #19

Open
@ketozhang

Description

Pydantic automatically generates JSON Schema (Draft 2020-12) for all JSON-serializable types. Custom types that are not pydantic.BaseModel subclasses need additional work to get a JSON schema. The Pydantic docs outline various ways of doing this, and not all of them require the object to be JSON-serializable; it only needs to have a schema.

ASDF schemas are written against the older JSON Schema Draft 4 and expressed in YAML, as specified by the ASDF schema docs.
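Part of the gap between the two drafts is mechanical keyword renaming. The following is a minimal sketch, not something asdf-pydantic or ASDF provides: the function name `downgrade_to_draft4` and the example schema are hypothetical, and only a few keywords are handled (`$defs`, `$ref`, `const`, `prefixItems`). Numeric `exclusiveMinimum`/`exclusiveMaximum` and the other 2020-12 keywords are left untouched.

```python
from typing import Any

# Keywords whose values map user-chosen names to subschemas. Their keys are
# names, not schema keywords, so only their values are rewritten.
NAME_TO_SCHEMA_MAPS = {"properties", "patternProperties", "$defs", "definitions"}
SINGLE_SUBSCHEMA_KEYS = {"items", "additionalProperties", "not"}
SUBSCHEMA_LIST_KEYS = {"anyOf", "allOf", "oneOf"}


def downgrade_to_draft4(schema: Any) -> Any:
    """Rewrite a few Draft 2020-12 keywords into their Draft 4 equivalents."""
    if not isinstance(schema, dict):
        return schema  # non-dict values (e.g. booleans) are returned unchanged
    has_prefix_items = "prefixItems" in schema
    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key in NAME_TO_SCHEMA_MAPS and isinstance(value, dict):
            new_key = "definitions" if key == "$defs" else key
            out[new_key] = {name: downgrade_to_draft4(sub) for name, sub in value.items()}
        elif key == "$ref" and isinstance(value, str):
            # Local references must follow the renamed definitions block.
            out[key] = value.replace("#/$defs/", "#/definitions/")
        elif key == "const":
            out["enum"] = [value]  # Draft 4 has no "const"
        elif key == "prefixItems":
            out["items"] = [downgrade_to_draft4(sub) for sub in value]
        elif key == "items" and has_prefix_items:
            # Next to prefixItems, "items" constrains the remaining elements.
            out["additionalItems"] = downgrade_to_draft4(value)
        elif key in SINGLE_SUBSCHEMA_KEYS:
            out[key] = downgrade_to_draft4(value)
        elif key in SUBSCHEMA_LIST_KEYS:
            out[key] = [downgrade_to_draft4(sub) for sub in value]
        else:
            out[key] = value  # everything else is copied verbatim
    return out


# Hand-written stand-in for a schema in Draft 2020-12 style.
example = {
    "$defs": {"Unit": {"type": "string"}},
    "type": "object",
    "properties": {
        "const": {"const": "m"},  # a property that happens to be named "const"
        "unit": {"$ref": "#/$defs/Unit"},
        "shape": {"type": "array", "prefixItems": [{"type": "integer"}], "items": False},
    },
}
draft4 = downgrade_to_draft4(example)
```

Whether a conversion like this is needed at all depends on the first TODO item below; if ASDF can validate against Draft 2020-12 directly, it becomes unnecessary.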

Implementation

If ASDF can support the newer JSON Schema draft, then the implementation is simple: we leave it to the user to define schemas for their custom data types.

Issue: Adapt ASDF types (tagged objects) with existing schema

However, some custom data types already have ASDF schemas that are not immediately available to Pydantic:

  • ASDF's standard core types like ndarray
  • ASDF's standard astronomy types like unit
  • Non-standard extension types not created with asdf-pydantic like Roman's WFI image

One may hack around this by taking ASDF's YAML schema directly and associating it with a new Pydantic field type. This example uses ndarray.

import numpy as np
from typing_extensions import Annotated
from pydantic import BaseModel, PlainValidator, WithJsonSchema

schema: dict = ...
# For example, use pyyaml to load the schema from http://stsci.edu/schemas/asdf/core/ndarray-1.1.0
# However, ASDF schemas are not compatible with JSON Schema Draft 2020-12.

MyNDArrayType = Annotated[
    np.ndarray,
    PlainValidator(np.asarray), 
    WithJsonSchema(schema),
]

class Model(BaseModel):
    arr: MyNDArrayType

# Pydantic v2 name; Model.schema() is the deprecated v1 spelling.
print(Model.model_json_schema())

This hack could work as an implementation, but it forces users to migrate every np.ndarray annotation to a type we manage (e.g., asdf_pydantic.builtin.NDArrayType). That does not scale to the full range of ASDF core and extension types.

Issue: ASDF does not provide helpers for tree schemas

In ASDF, the tree itself can also have a schema.

asdf-pydantic also does not have many features for trees. Users make trees with our models:

tree: dict = {'data': MyAsdfPydanticModel(...)}
af = AsdfFile(tree)

However, our models themselves can also be trees:

tree: dict = MyAsdfPydanticModel(...).model_dump()
af = AsdfFile(tree)

Given that the model has a JSON schema, this alone can be enough to generate an ASDF schema. We might not need an ASDF helper.
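As a rough illustration of going from a model's JSON schema to an ASDF-style schema document, here is a minimal sketch. It is an assumption-laden stand-in, not the asdf-pydantic implementation: `to_asdf_schema_document`, the `model_schema` dict (a hand-written substitute for what `model_json_schema()` might return), and the `asdf://example.org/...` id are all hypothetical. It leans on JSON being, for practical purposes, valid YAML flow syntax; a real implementation would more likely use a YAML library, and it does nothing about the Draft 4 versus Draft 2020-12 differences.

```python
import json

# Metaschema URI that ASDF schema files declare in their "$schema" key.
YAML_SCHEMA_METASCHEMA = "http://stsci.edu/schemas/yaml-schema/draft-01"


def to_asdf_schema_document(json_schema: dict, schema_id: str) -> str:
    """Wrap a JSON schema dict in an ASDF-style YAML schema document."""
    body = {"$schema": YAML_SCHEMA_METASCHEMA, "id": schema_id}
    # Replace any identifiers carried by the input with the ASDF ones above.
    body.update(
        {k: v for k, v in json_schema.items() if k not in ("$schema", "$id", "id")}
    )
    # JSON output is used here as YAML flow syntax (assumption: the consumer
    # accepts flow style; block-style YAML would need a YAML library).
    return "%YAML 1.1\n---\n" + json.dumps(body, indent=2) + "\n"


# Hand-written stand-in for a Pydantic-generated model schema.
model_schema = {
    "title": "MyAsdfPydanticModel",
    "type": "object",
    "properties": {"name": {"type": "string"}, "exposure": {"type": "number"}},
    "required": ["name"],
}

document = to_asdf_schema_document(
    model_schema, "asdf://example.org/schemas/my-model-1.0.0"  # hypothetical id
)
print(document)
```

The resulting text could then be registered wherever the extension's schemas are looked up, which is what the resource-mapping TODO item below is about.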

TODO:

  • Can ASDF validate against JSON Schema Draft 2020-12?
  • Issue: Adapt ASDF types (tagged objects) with existing schema
  • Issue: ASDF does not provide helpers for tree schemas
  • Replace YAML schema string manipulation with pydantic schema_generator
  • Add automatic registration of model tag definitions (asdf.extension.TagDefinition)
  • Add automatic registration of model schemas to resource mapping
