Adding metadata schema to the code base itself #7409

Merged 22 commits on Jul 25, 2024
Changes from 19 commits
9 changes: 6 additions & 3 deletions docs/source/mb_specification.rst
@@ -63,12 +63,13 @@ This file contains the metadata information relating to the model, including wha
* **monai_version**: version of MONAI the bundle was generated on, later versions expected to work.
* **pytorch_version**: version of PyTorch the bundle was generated on, later versions expected to work.
* **numpy_version**: version of NumPy the bundle was generated on, later versions expected to work.
* **optional_packages_version**: dictionary relating optional package names to their versions, these packages are not needed but are recommended to be installed with this stated minimum version.
* **required_packages_version**: dictionary relating required package names to their versions. These are packages in addition to the base requirements of MONAI which this bundle absolutely needs. For example, if the bundle must load Nifti files the Nibabel package will be required.
* **task**: plain-language description of what the model is meant to do.
* **description**: longer form plain-language description of what the model is, what it does, etc.
* **authors**: state author(s) of the model.
* **copyright**: state model copyright.
* **network_data_format**: defines the format, shape, and meaning of inputs and outputs to the model, contains keys "inputs" and "outputs" relating named inputs/outputs to their format specifiers (defined below).
* **network_data_format**: defines the format, shape, and meaning of inputs and outputs to the (primary) model. It contains keys "inputs" and "outputs" relating named inputs/outputs to their format specifiers (defined below). There is also an optional "post_processed_outputs" key stating the format of "outputs" after postprocessing transforms are applied; this is used to describe the final output from the bundle if it varies from the raw network output. These keys can also relate to primitive values (number, string, boolean) instead of the tensor format specified below.
* **\*_data_format**: defines the format, shape, and meaning of inputs and outputs to additional models which are secondary to the main model. This contains the same sort of information as **network_data_format** which describes networks providing secondary functionality, eg. a localisation network used to identify ROI in an image for cropping before data is sent to the primary network of this bundle.
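
The `network_data_format` layout described in the list above can be sketched in Python as follows. This is an illustration only: the input/output names `image` and `pred` and every value are assumptions rather than content from a real bundle, and the specifier keys used are the ones the schema file in this change marks as required.

```python
import json

# Hypothetical tensor format specifier; the values are illustrative only.
image_spec = {
    "type": "image",
    "format": "magnitude",
    "num_channels": 1,
    "spatial_shape": [160, 160, 160],
    "dtype": "float32",
    "value_range": [0, 1],
}

network_data_format = {
    # Named inputs and outputs each map to a format specifier.
    "inputs": {"image": image_spec},
    "outputs": {
        "pred": {**image_spec, "format": "segmentation", "num_channels": 2},
    },
    # Optional: only meaningful when postprocessing changes the output format.
    "post_processed_outputs": {
        "pred": {**image_spec, "format": "labels", "dtype": "int64"},
    },
}

# Serialise the entry as it might appear inside a metadata JSON file.
print(json.dumps(network_data_format, indent=2))
```

A secondary network would be described the same way under a key ending in `_data_format`.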

Tensor format specifiers are used to define input and output tensors and their meanings, and must be a dictionary containing at least these keys:

@@ -89,6 +90,8 @@ Optional keys:
* **data_source**: description of where training/validation can be sourced.
* **data_type**: type of source data used for training/validation.
* **references**: list of published references relating to the model.
* **supported_apps**: list of supported applications which use bundles, eg. 'monai-label' would be present if the bundle is compatible with MONAI Label applications.
* **required_packages_version**: dictionary relating required package names to their versions. These packages must be installed at the stated version.
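
One way software handling a bundle might act on `required_packages_version` is sketched below. The function names `check_required_packages` and `installed_version` are hypothetical, and the sketch assumes an exact string match on the version, which the text above does not spell out; a real tool may well accept newer versions instead.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_required_packages(required, lookup=None):
    """Return {package: (wanted, found)} for every missing or differing package.

    `lookup` maps a package name to its installed version (or None); it
    defaults to querying the current environment.
    """
    if lookup is None:
        lookup = installed_version
    problems = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:  # assumption: exact match is required
            problems[name] = (wanted, found)
    return problems


# Example using the entry from the sample metadata file.
print(check_required_packages({"nibabel": "3.2.1"}))
```

Passing a custom `lookup` keeps the check testable without touching the real environment.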

The format for tensors used as inputs and outputs can be used to specify the semantic meaning of these values, and is later used by software handling bundles to determine how to process and interpret this data. There are various types of image data that MONAI uses, and other data types such as point clouds, dictionary sequences, time signals, and others. The following list is provided as a set of supported definitions of what a tensor "format" is, but it is not exhaustive and users can provide their own, which would be left up to the model users to interpret:

@@ -124,7 +127,7 @@ An example JSON metadata file:
"monai_version": "0.9.0",
"pytorch_version": "1.10.0",
"numpy_version": "1.21.2",
"optional_packages_version": {"nibabel": "3.2.1"},
"required_packages_version": {"nibabel": "3.2.1"},
"task": "Decathlon spleen segmentation",
"description": "A pre-trained model for volumetric (3D) segmentation of the spleen from CT image",
"authors": "MONAI team",
220 changes: 220 additions & 0 deletions monai/bundle/meta_schema.json
@@ -0,0 +1,220 @@
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"$defs": {
"tensor": {
"$comment": "Represents a tensor object argument/return value, can be MetaTensor.",
"type": "object",
"properties": {
"type": {
"type": "string"
},
"format": {
"type": "string"
},
"num_channels": {
"type": "integer"
},
"spatial_shape": {
"type": "array",
"items": {
"type": [
"string",
"integer"
]
}
},
"dtype": {
"type": "string"
},
"value_range": {
"type": "array",
"items": {
"type": "number"
}
},
"is_patch_data": {
"type": "boolean"
},
"channel_def": {
"type": "object"
}
},
"required": [
"type",
"format",
"num_channels",
"spatial_shape",
"dtype",
"value_range"
]
},
"argument": {
"$comment": "Acceptable input types for model forward pass, arbitrary Python objects not covered.",
"anyOf": [
{
"type": "number"
},
{
"type": "integer"
},
{
"type": "string"
},
{
"type": "boolean"
},
{
"$ref": "#/$defs/tensor"
}
]
},
"result": {
"$comment": "Return value from a model's forward pass, same as an argument for now.",
"$ref": "#/$defs/argument"
},
"network_io": {
"description": "Defines the format, shape, and meaning of inputs and outputs to the model.",
"type": "object",
"$comment": "Arguments/return values described by pattern property, order considered significant.",
"properties": {
"inputs": {
"type": "object",
"patternProperties": {
"^.+$": {
"$ref": "#/$defs/argument"
}
}
},
"outputs": {
"type": "object",
"patternProperties": {
"^.+$": {
"$ref": "#/$defs/result"
}
}
},
"post_processed_outputs": {
"$comment": "Return value format after post-processing, not needed if not changed.",
"type": "object",
"patternProperties": {
"^.+$": {
"$ref": "#/$defs/result"
}
}
}
},
"required": [
"inputs",
"outputs"
]
}
},
"type": "object",
"properties": {
"schema": {
"description": "URL of the schema file.",
"type": "string"
},
"version": {
"description": "Version number of the bundle.",
"type": "string"
},
"changelog": {
"description": "Dictionary relating previous version names to strings describing the version.",
"type": "object"
},
"monai_version": {
"description": "Version of MONAI the bundle was generated with.",
"type": "string"
},
"pytorch_version": {
"description": "Version of PyTorch the bundle was generated with.",
"type": "string"
},
"numpy_version": {
"description": "Version of NumPy the bundle was generated with.",
"type": "string"
},
"required_packages_version": {
"description": "Dictionary relating required package names to their versions. The bundle requires these packages to operate which are additional to the base requirements for MONAI.",
"type": "object"
},
"task": {
"description": "Plain-language description of what the bundle is meant to do.",
"type": "string"
},
"description": {
"description": "Longer form description of what the bundle is, what it does, etc.",
"type": "string"
},
"authors": {
"description": "State author(s) of the bundle.",
"type": "string"
},
"copyright": {
"description": "State copyright of the bundle.",
"type": "string"
},
"data_source": {
"description": "Where to download or prepare the data used in this bundle.",
"type": "string"
},
"data_type": {
"description": "Type of the data, like: `dicom`, `nibabel`, etc.",
"type": "string"
},
"image_classes": {
"description": "Description for every class of the input tensors.",
"type": "string"
},
"label_classes": {
"description": "Description for every class of the input tensors if present.",
"type": "string"
},
"pred_classes": {
"description": "Description for every class of the output prediction(s).",
"type": "string"
},
"eval_metrics": {
"description": "Dictionary relating evaluation metrics to the achieved scores.",
"type": "object"
},
"intended_use": {
"description": "What the bundle is to be used for, ie. what task it accomplishes.",
"type": "string"
},
"references": {
"description": "List of published references relating to the bundle.",
"type": "array",
"items": {
"type": "string"
}
},
"supported_apps": {
"description": "List of supported applications, eg. 'monai-label'",
"type": "object"
},
"network_data_format": {
"$ref": "#/$defs/network_io"
}
},
"$comment": "This permits definitions for multiple networks with format <network-name>_data_format",
"patternProperties": {
"^[_a-zA-Z0-9]+_data_format$": {
"$ref": "#/$defs/network_io"
}
},
"required": [
"schema",
"version",
"monai_version",
"pytorch_version",
"numpy_version",
"required_packages_version",
"task",
"description",
"authors",
"copyright",
"network_data_format"
]
}
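
The top-level rules of the schema above (the `required` list and the `patternProperties` expression for `<network-name>_data_format` keys) can be approximated with the standard library alone. The sketch below is not how MONAI itself validates bundles; the helper names `missing_required_keys` and `data_format_keys` and the sample `meta` dictionary are hypothetical, and nested definitions such as `tensor` are not checked. A complete check would need a JSON Schema validator.

```python
import re

# Copied from the "required" list of the schema above.
REQUIRED_KEYS = [
    "schema",
    "version",
    "monai_version",
    "pytorch_version",
    "numpy_version",
    "required_packages_version",
    "task",
    "description",
    "authors",
    "copyright",
    "network_data_format",
]

# Copied from the schema's patternProperties key.
DATA_FORMAT_KEY = re.compile(r"^[_a-zA-Z0-9]+_data_format$")


def missing_required_keys(meta):
    """Return the required top-level keys absent from a metadata dict."""
    return [key for key in REQUIRED_KEYS if key not in meta]


def data_format_keys(meta):
    """Return the keys that the schema treats as network I/O descriptions."""
    return [key for key in meta if DATA_FORMAT_KEY.match(key)]


# Hypothetical, deliberately incomplete metadata for illustration.
meta = {
    "version": "0.1.0",
    "network_data_format": {"inputs": {}, "outputs": {}},
    "localiser_data_format": {"inputs": {}, "outputs": {}},
}
print(missing_required_keys(meta))
print(data_format_keys(meta))
```

Note that `network_data_format` itself also matches the pattern, so the primary network and any secondary networks are held to the same `network_io` definition.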