Design
Design goals
- Geometry has first-class support
  - Users should be able to create models and pipelines that are purely geometry-based
  - Users should be able to create models and pipelines that combine pixel data and geometry data
  - It should be easy for users to make hybrid workflows
    - In hybrid workflows, we should make it easy to update geometry based on transforms to pixel data
- Minimal API changes
  - We should minimise changes to the API
Characteristics of geometry and pixel data
- geometry data
  - points: positions in world space
  - may have some kind of vertex / edge descriptor with which to interpret the points
- pixel data
  - pixel resolution: a mapping from pixel space to world space
  - bounding box: the geometric bounds of the pixel data in world space
Pixel-space vs world-space
We define two spaces in which operations can be carried out (a sketch contrasting them follows this list):
- world space
  - changes the object in world space: this can mean rotation, scaling, translation, shearing, etc.
  - applies to both pixel data and geometry data
- pixel space
  - a geometric description of a change to the way pixel data is sampled
  - has no effect on world space
  - applies only to pixel data
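To make the distinction concrete, here is a minimal sketch in plain NumPy (not MONAI API): a world-space rotation moves a geometry point, whereas a pixel-space change only alters how the pixel data is sampled.

```python
import numpy as np

# world space: a single affine describes the change for pixel data and
# geometry alike; here, a 90-degree rotation in 2D homogeneous coordinates
world_rotation = np.array([
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
])

point = np.array([10.0, 5.0, 1.0])      # a geometry point (homogeneous)
rotated_point = world_rotation @ point  # -> [-5.0, 10.0, 1.0]

# pixel space: only the sampling of the pixel data changes; geometry in
# world space is unaffected, so no update to `point` is needed
pixel_sampling = np.diag([2.0, 2.0, 1.0])  # e.g. sample at half resolution
```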
Stages of a mixed pixel / geometry pipeline
1. Load data sources
   a. pixel data
   b. geometry data
2. Align pixel data with geometry data (depends on task)
3. Apply various transforms to the aligned pixel and geometry data
   - our transforms should always keep pixel and geometry data aligned, for any given sequence of spatial transforms applied to both
Categories of spatial transform
- agnostic: work the same way on pixel and geometry data (`flip`, `zoom`, etc.)
- image-specific: transforms that make sense only for raster data (`resample`, `spacing`, etc.)
- hybrid: transforms that must also take images into account (`rotate`, etc.)
A closer look at hybrid transforms
`rotate` must perform slightly different operations on pixel and geometry data:
- the rotation itself in world space is the same for pixel and geometry data
- if `keep_size` is false, the extents of the pixel data bounds will change; this is a pixel-space change (see the sketch below)
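The following sketch shows the two components a hybrid rotate has to compute; the names are illustrative, not the MONAI implementation.

```python
import numpy as np

theta = np.deg2rad(30)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

# world space: the same rotation applies to geometry points and to the
# image's affine
points = np.array([[10.0, 5.0], [3.0, 7.0]])
rotated_points = points @ rotation.T

# pixel space: with keep_size=False the output extents must grow to
# contain the rotated image bounds - a change in pixel space only
h, w = 64.0, 64.0
corners = np.array([[0, 0], [0, w], [h, 0], [h, w]])
rotated_corners = corners @ rotation.T
new_extents = rotated_corners.max(axis=0) - rotated_corners.min(axis=0)
```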
Transform API
The transform API has the following layers:
dictionary transform -> array transform -> functional transform
Dictionary transforms
Dictionary transforms specific to images can refer to geometry by name rather than requiring tensors to be passed in directly:
```python
class Spacingd(MapTransform, InvertibleTransform, LazyTransform):
    def __init__(
        self, keys, pixdim, diagonal, mode, padding_mode, align_corners, dtype, scale_extent,
        recompute_affine, min_pixdim, max_pixdim, ensure_same_shape, allow_missing_keys):
        ...
```

As such, there shouldn't need to be any changes to the API for dictionary transforms:
- geometry tensors are referred to by name, just as pixel tensors are
- transforms that aren't image-specific can simply process all the named tensors independently of each other
- transforms that are image-specific can perform the operation on image tensors first
- the world-space component of the transform can then be applied to the geometry tensors, as sketched below
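For illustration, a dictionary pipeline step could then look like the following sketch; note that passing a geometry key to `Spacingd` is the proposed behaviour, not the current API.

```python
import torch
import monai.transforms as mt

image = torch.rand(1, 32, 32, 32)         # pixel data (C, H, W, D)
points = torch.tensor([[8.0, 8.0, 8.0]])  # geometry data in world space

# proposed: "image" is resampled, while "point" only receives the
# world-space component of the spacing change
transform = mt.Spacingd(keys=["image", "point"], pixdim=(2.0, 2.0, 2.0))
result = transform({"image": image, "point": points})
```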
Array transforms
Array transforms specific to images need to be modified so that geometry data can be updated. This can be done via additional operation parameters that take a tensor or tuple of tensors:
```python
class Spacing(InvertibleTransform, LazyTransform):
    def __call__(
        self, data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
        inputs_to_update,  # New
    ):
        ...
```

Functional transforms
```python
def spacing(
    data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
    inputs_to_update,  # New
):
    ...
```

Functional transforms that are specific to image data first calculate the pixel-space and world-space transform components to be applied to the image data. They then call a function that applies the appropriate transform to geometry data.
Note: geometry data should only need a single operation for applying a transform to it; ideally we should not need to write separate *_image and *_point functions for each operation. A sketch of such an operation follows.
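A minimal sketch of what that single operation could look like, assuming points are stored as an (N, D) tensor; the name `apply_affine_to_points` is hypothetical.

```python
import torch

def apply_affine_to_points(points: torch.Tensor, affine: torch.Tensor) -> torch.Tensor:
    """Apply the world-space component of a transform to an (N, D) point set."""
    ones = torch.ones(points.shape[0], 1, dtype=points.dtype)
    homogeneous = torch.cat([points, ones], dim=1)  # (N, D+1)
    return (homogeneous @ affine.T)[:, :-1]         # back to (N, D)

# every image-specific functional transform (spacing, resample, ...) would
# call this one function with its world-space affine, instead of having a
# dedicated *_point variant per operation
```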
Implementation
1. Integration of a 'kind' property into MetaTensor:
We propose incorporating a 'kind' property into MetaTensor. This property enables different data types to be identified efficiently and handled appropriately; its value can be retrieved via data.kind, as sketched below.
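A minimal sketch of the proposed property, assuming "pixel" and "point" as the initial kinds; the exact values and how the property is stored are open questions.

```python
import torch
from monai.data import MetaTensor

image = MetaTensor(torch.rand(1, 32, 32))
image.kind = "pixel"   # proposed property, not yet part of MetaTensor

points = MetaTensor(torch.tensor([[4.0, 7.0]]))
points.kind = "point"

if points.kind == "point":
    ...  # apply only the world-space component of a transform
```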
2. Data input/output enhancements:
Introduce LoadPoint and LoadPointd with parameters refer and refer_key respectively. These establish whether the loaded points correspond to a particular coordinate system, and allow information such as the affine to be retrieved from the reference.
Usage examples:
`LoadPointd(keys="point", refer_key="image")` and `LoadPoint(data=point, refer=image)`
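To make the intended semantics concrete, here is an illustrative skeleton of how refer_key could behave; this is a sketch under the assumption that point files load into (N, D) tensors, not the proposed implementation.

```python
import torch
from monai.data import MetaTensor

class LoadPointd:
    """Illustrative skeleton of the proposed transform."""

    def __init__(self, keys, refer_key=None):
        self.keys = keys
        self.refer_key = refer_key

    def __call__(self, data):
        # reading from file is elided; assume an (N, D) point tensor results
        points = MetaTensor(torch.as_tensor(data[self.keys]))
        if self.refer_key is not None:
            # interpret the points in the reference image's coordinate
            # system, e.g. by adopting its affine
            points.affine = data[self.refer_key].affine
        data[self.keys] = points
        return data
```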
Subject for Discussion: What data formats should we aim to support?
3. Improvements to the transform API:
The core idea is to house the computational logic in operators that are registered with the transform. This minimizes changes to the transform API: to accommodate a new data type in MONAI, the current user-facing API remains unaltered, and new operators are simply added as required.
Example:
```python
from typing import Any


class Flip:
    def __init__(self) -> None:
        # each operator handles one data kind and returns None otherwise
        self.operators = [flip_image, flip_point]

    def __call__(self, data, *args: Any, **kwargs: Any) -> Any:
        for _operator in self.operators:
            ret = _operator(data)
            if ret is not None:
                return ret

    def register(self, operator) -> None:
        # allow new data kinds to be supported by adding operators
        self.operators.append(operator)


def flip_image(data):
    if data.kind != "pixel":
        return None
    ...  # flip the pixel data
    return data


def flip_point(data):
    if data.kind != "point":
        return None
    ...  # flip the point data
    return data
```
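Usage is then uniform across kinds, with dispatch happening inside the transform; this continues the sketch above, reusing the `image` and `points` MetaTensors from the 'kind' example.

```python
flip = Flip()
flipped_image = flip(image)    # image.kind == "pixel": handled by flip_image
flipped_points = flip(points)  # points.kind == "point": handled by flip_point
```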
4. User experience enhancements:
With the above pieces in place, mixed pixel / geometry pipelines can be composed in the same intuitive way as existing image-only pipelines.
Code example:

```python
import monai.transforms as mt

data = {
    "image": image_path,
    "point": point_path,
}

trans = mt.Compose([
    mt.LoadImaged(keys="image"),
    mt.LoadPointd(keys="point", refer_key="image"),
    mt.Flipd(keys=["image", "point"]),
    mt.Rotated(keys=["image", "point"]),
])
```