POC goals
- Model geometry.
- Model an arbitrary custom constraint (geometry type).
- Model feature type.
- Override the feature type JSON Schema to make it GeoJSON.
- Generate a typed ID reference.
- Model common names with language tags as dictionary keys.
- Model sources with a JSON Pointer value.
- Annotate the `int` type with a "hard range" annotation for Parquet (see the sketch after this list).
- Figure out how this would propagate through Spark into a Parquet file.
- Generate a Spark schema.
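
A minimal sketch of the "hard range" goal, assuming Pydantic v2. `HardRange` is a hypothetical marker class (not an existing Pydantic or Parquet construct) that a downstream Spark/Parquet generator could read when picking a physical type:

```python
from dataclasses import dataclass
from typing import Annotated

from pydantic import BaseModel, Field


# Hypothetical marker carrying a "hard range" for downstream Parquet typing.
@dataclass(frozen=True)
class HardRange:
    min: int
    max: int


# An int constrained in JSON Schema *and* tagged with the hard range, so a
# Spark/Parquet generator could choose a narrow physical type (e.g. INT32).
ZLevel = Annotated[int, Field(ge=-10, le=10), HardRange(-10, 10)]


class Feature(BaseModel):
    id: str
    z_level: ZLevel
    # Common names keyed by language tag, e.g. {"en": "Main Street"}.
    names: dict[str, str] = {}


# Downstream code can recover the annotation from the field metadata.
for name, info in Feature.model_fields.items():
    for meta in info.metadata:
        if isinstance(meta, HardRange):
            print(name, meta)  # -> z_level HardRange(min=-10, max=10)
```

Because `HardRange` survives in `FieldInfo.metadata`, a schema generator could propagate it into a Spark schema or Parquet type without affecting the JSON Schema output.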
Some observations on the base types needed:
- A non-empty string type that has no leading or trailing whitespace.
- A floating-point number in [0, 1] representing a percentage, used for both confidence and linear referencing.
- A `UniqueItems` constraint may not be needed, because in Pydantic you can just declare the field type to be `set`/`Set` and this will generate the right `uniqueItems` constraint in JSON Schema (see the sketch below).
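
A minimal sketch of these base types, assuming Pydantic v2 (`StringConstraints`, `model_json_schema`). Note that `strip_whitespace=True` normalizes surrounding whitespace rather than rejecting it, which is one possible reading of the requirement:

```python
import json
from typing import Annotated

from pydantic import BaseModel, Field, StringConstraints

# Non-empty string with no leading or trailing whitespace (normalized on input).
TrimmedString = Annotated[str, StringConstraints(strip_whitespace=True, min_length=1)]

# Float in [0, 1], shared by confidence scores and linear referencing.
UnitInterval = Annotated[float, Field(ge=0.0, le=1.0)]


class Example(BaseModel):
    name: TrimmedString
    confidence: UnitInterval
    tags: set[str] = set()  # emits "uniqueItems": true in JSON Schema


print(json.dumps(Example.model_json_schema(), indent=2))
```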
CODE GENERATION:
- Python has a native AST manipulation module, `ast`. This would be good for parsing the schema source code into a Python AST. Either the `ast` module by itself, or `ast` plus `astor`, could be used to traverse the schema source code as a syntax tree and generate other code from it (see the sketch below).
- It might be useful to use LibCST to generate the code, because you can build a "concrete syntax tree" that preserves comments etc.
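
A toy sketch of the `ast` approach, discovering model classes in a single source string. Real discovery would also need to follow imports, aliases, and attribute-style bases:

```python
import ast

SOURCE = '''
from pydantic import BaseModel

class Feature(BaseModel):
    id: str

class Road(Feature):
    surface: str
'''

tree = ast.parse(SOURCE)

# Naive, single-file discovery: collect classes whose bases mention BaseModel
# or a name we have already collected (so Feature -> Road is followed).
model_names: set[str] = {"BaseModel"}
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        bases = {b.id for b in node.bases if isinstance(b, ast.Name)}
        if bases & model_names:
            model_names.add(node.name)

print(sorted(model_names - {"BaseModel"}))  # ['Feature', 'Road']
```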
- It appears that the Pydantic artifact we REALLY care about, in terms of an AST-like structure, is the core schema. So we probably don't want to parse the code; we just want to find a way to:
  - Discover all the things that derive from `BaseModel` (or `Feature`?), either by parsing the code or by requiring some kind of schema manifest. But note that even with a manifest we still have to parse the code.
  - From all the `BaseModel`- and/or `Feature`-derived classes, get their core schema from `__get_pydantic_core_schema__` (see the sketch below).
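
A minimal sketch of that flow, assuming Pydantic v2, where built models expose the result of `__get_pydantic_core_schema__` as the `__pydantic_core_schema__` attribute. The `MANIFEST` module names are hypothetical; this imports the manifest's modules rather than parsing their source:

```python
import importlib
import inspect

from pydantic import BaseModel

# Hypothetical manifest listing the modules that define schema classes.
MANIFEST = ["myschema.features", "myschema.base"]

for module_name in MANIFEST:
    module = importlib.import_module(module_name)
    for _, cls in inspect.getmembers(module, inspect.isclass):
        if issubclass(cls, BaseModel) and cls is not BaseModel:
            # In Pydantic v2, built models carry their core schema here;
            # the exact dict shape varies (e.g. a "model" or wrapper schema).
            core_schema = cls.__pydantic_core_schema__
            print(cls.__name__, core_schema["type"])
```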