Geometric transforms proposal #7486

@KumoLiu

Description

Design

Design goals

  • Geometry has first-class support
    • Users should be able to create models and pipelines that are purely geometry-based
    • Users should be able to create models and pipelines that are combinations of pixel data and geometry data
  • It should be easy for users to make hybrid workflows
    • In hybrid workflows, we should make it easy to update geometry based on transforms to pixel data
  • Minimal API changes
    • We should minimise changes to the API

Characteristics of geometry and pixel data

  • geometry data
    • points: positions in world space
      • may have some kind of vertex / edge descriptor with which to interpret the points
  • pixel data
    • pixel resolution: a mapping from pixel-space to world space
    • bounding box: the geometric bounds of the pixel data in world space

Pixel-space vs world-space

  • We define two spaces in which operations can be carried out
    • world space
      • changes the object in world space: rotation, scaling, translation, shearing, etc.
      • applies to both pixel data and geometry data
    • pixel space
      • a geometric description of a change to the way pixel data is sampled
      • has no effect on world space
      • applies only to pixel data
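The distinction can be sketched with homogeneous affines (illustrative plain Python, not MONAI code; the 2D matrices and values are assumptions). A world-space operation composes on the world side of the pixel-to-world affine and also moves geometry; a pixel-space operation composes on the pixel side and leaves world space untouched:

```python
import math

def matmul(a, b):
    # 3x3 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(m, p):
    # apply a homogeneous 2D affine to a point (x, y)
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# pixel -> world affine: 2 mm spacing, origin at (10, 20)
affine = [[2.0, 0.0, 10.0],
          [0.0, 2.0, 20.0],
          [0.0, 0.0, 1.0]]

# world-space op: a 90-degree rotation composes on the world side,
# and the same rotation applies to geometry
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta),  math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
world_affine = matmul(R, affine)        # pixel data follows the rotation
point_world = apply(R, (10.0, 20.0))    # geometry follows the same rotation

# pixel-space op: halving the sampling step composes on the pixel side;
# world space (and therefore geometry) is untouched
S = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
pixel_affine = matmul(affine, S)        # only the pixel -> world mapping changes
```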

Stages of a mixed pixel / geometry pipeline

  1. Load data sources
    a. pixel data
    b. geometry data
  2. align pixel data with geometry data (depends on task)
  3. apply various transforms to aligned pixel and geometry data
    a. our transforms should always keep pixel and geometry data aligned, for any given sequence of spatial transforms applied to both
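The alignment requirement in step 3a can be expressed as an invariant (plain-Python sketch with hypothetical values): after applying the same world-space transform to the image affine and to a point, mapping the point back through the inverse of the updated affine should recover the original pixel index:

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(m, p):
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def invert_affine(m):
    # inverse of a 2D homogeneous affine [[A, t], [0, 1]]
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    tx, ty = m[0][2], m[1][2]
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

affine = [[2.0, 0.0, 5.0], [0.0, 2.0, 5.0], [0.0, 0.0, 1.0]]  # pixel -> world
pixel_idx = (3.0, 4.0)
point = apply(affine, pixel_idx)      # a point sitting at pixel (3, 4)

theta = math.pi / 3                   # an arbitrary world-space rotation + shift
R = [[math.cos(theta), -math.sin(theta), 1.0],
     [math.sin(theta),  math.cos(theta), -2.0],
     [0.0, 0.0, 1.0]]
new_affine = matmul(R, affine)        # pixel data follows R
new_point = apply(R, point)           # geometry follows the same R

# alignment invariant: the point still maps to the same pixel index
recovered = apply(invert_affine(new_affine), new_point)
```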

Spatial transform categories

We distinguish three categories of spatial transform:

  • agnostic: work the same way on pixel and geometry data
    • flip, zoom, etc.
  • image-specific: transforms that make sense only for raster data
    • resample, spacing, etc.
  • hybrid: transforms that must also take images into account
    • rotate, etc.

A closer look at hybrid transforms

rotate must perform slightly different operations on pixel and geometry data

  • the rotation itself in world space is the same for pixel and geometry data
  • if keep_size is false, the extents of pixel data bounds will change
    • this is a pixel space change
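A sketch of the hybrid behaviour (illustrative helpers with assumed names, not MONAI functions): the world-space rotation is identical for both data kinds, while the keep_size=False extent change is a pixel-space computation over the rotated image bounds:

```python
import math

def rotate_extents(h, w, theta):
    """Bounding extents of an h x w image after in-plane rotation (keep_size=False)."""
    corners = [(0, 0), (w, 0), (0, h), (w, h)]
    xs = [x * math.cos(theta) - y * math.sin(theta) for x, y in corners]
    ys = [x * math.sin(theta) + y * math.cos(theta) for x, y in corners]
    # pixel-space change: the output grid must cover the rotated bounds
    return (round(max(ys) - min(ys)), round(max(xs) - min(xs)))

def rotate_point(p, theta):
    """World-space component: the same rotation applies to geometry data."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# 90-degree rotation: points rotate in world space; pixel bounds swap h and w
new_shape = rotate_extents(100, 60, math.pi / 2)
```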

Transform API

The transform API has the following layers

dictionary transform -> array transform -> functional transform

Dictionary transforms

Dictionary transforms specific to images can refer to geometry by name rather than requiring tensors to be passed in directly:

class Spacingd(MapTransform, InvertibleTransform, LazyTransform):
    def __init__(
        self, keys, pixdim, diagonal, mode, padding_mode, align_corners, dtype, scale_extent,
        recompute_affine, min_pixdim, max_pixdim, ensure_same_shape, allow_missing_keys):

As such, there shouldn't need to be any changes to the API for dictionary transforms:

  • geometry tensors are referred to by name, as are pixel tensors
  • transforms that aren't image-specific can simply process each named tensor independently of the others
  • transforms that are image-specific can perform the operation on image tensors first
  • the world-space component of the transform can then be applied to the geometry tensors
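As an illustration of these bullets (plain dicts standing in for tensors; the "kind" tag, keys, and zoom semantics are assumptions, not MONAI code), a dictionary transform can dispatch on the kind of each named entry, applying its image-specific part to pixel entries and its world-space part to geometry entries:

```python
class MockZoomd:
    """Sketch of a dictionary transform handling mixed pixel/geometry keys."""

    def __init__(self, keys, factor):
        self.keys, self.factor = keys, factor

    def __call__(self, data):
        out = dict(data)
        for key in self.keys:
            item = dict(data[key])
            if item["kind"] == "pixel":
                # image-specific part: rescale the pixel -> world spacing
                item["spacing"] = tuple(s * self.factor for s in item["spacing"])
            else:
                # world-space part: scale the point coordinates directly
                item["points"] = [tuple(c * self.factor for c in p)
                                  for p in item["points"]]
            out[key] = item
        return out

sample = {
    "image": {"kind": "pixel", "spacing": (1.0, 1.0)},
    "point": {"kind": "point", "points": [(2.0, 3.0)]},
}
zoomed = MockZoomd(keys=["image", "point"], factor=2.0)(sample)
```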

Array transforms

Array transforms specific to images need to be modified so that geometry data can be updated. This can be done via additional operation parameters that take a tensor or tuple of tensors:

class Spacing(InvertibleTransform, LazyTransform):
    def __call__(
        self, data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
        inputs_to_update # New
    ):
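A hypothetical sketch of how the extended call could behave (the inputs_to_update parameter, its return convention, and all values are assumptions drawn from this proposal, not existing MONAI API):

```python
def mock_spacing(image, old_spacing, new_spacing, inputs_to_update=()):
    """Stand-in for an array transform extended with inputs_to_update."""
    # pixel-space component: resample the image grid (stubbed as a shape change)
    scale = [o / n for o, n in zip(old_spacing, new_spacing)]
    new_shape = tuple(round(s * f) for s, f in zip(image["shape"], scale))
    image = {**image, "shape": new_shape, "spacing": tuple(new_spacing)}
    # the world-space component of a spacing change is the identity, so
    # geometry passed via inputs_to_update is returned unchanged
    updated = tuple(dict(t) for t in inputs_to_update)
    return image, updated

img = {"shape": (100, 100), "spacing": (1.0, 1.0)}
pts = {"points": [(5.0, 5.0)]}
new_img, (new_pts,) = mock_spacing(img, (1.0, 1.0), (2.0, 2.0),
                                   inputs_to_update=(pts,))
```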

Functional transforms

def spacing(
    data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
    inputs_to_update # New
):

Functional transforms that are specific to image data first calculate the pixel-space and world-space transform components to be applied to the image data. They then call a function that applies the appropriate transform to geometry data.
Note: applying a transform to geometry data should require only one generic operation; ideally we should not need to write separate *_image and *_point functions for each operation.
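One possible shape for that single generic operation (illustrative plain Python; the helper names and the 2D affine convention are assumptions) is a function that applies any world-space affine to point data, so each functional transform only has to produce its affine:

```python
def apply_affine_to_points(points, affine):
    """Apply a 3x3 homogeneous 2D affine to a list of (x, y) points."""
    return [(affine[0][0] * x + affine[0][1] * y + affine[0][2],
             affine[1][0] * x + affine[1][1] * y + affine[1][2])
            for x, y in points]

def flip_affine(axis, extent):
    """World-space affine of a flip about the given extent along one axis."""
    m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m[axis][axis] = -1.0
    m[axis][2] = extent
    return m

# a flip along x within a 10-unit extent maps x -> 10 - x
flipped = apply_affine_to_points([(1.0, 2.0)], flip_affine(0, 10.0))
```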

Implementation

1. Integration of a 'kind' Property into MetaTensor:

We propose adding a 'kind' property to MetaTensor. The property will enable efficient identification and appropriate handling of the different data kinds; its value can be retrieved via data.kind.
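A minimal stand-in for the proposed property (an illustrative class only; the real change would extend MetaTensor itself):

```python
class MetaArray:
    """Sketch of a tensor-like container tagged with a data kind."""

    def __init__(self, data, kind="pixel"):
        self.data = data
        self._kind = kind

    @property
    def kind(self):
        # transforms dispatch on this tag to pick the right operator
        return self._kind

image = MetaArray([[0, 1], [2, 3]], kind="pixel")
points = MetaArray([(0.5, 0.5)], kind="point")
```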

2. Data Input/Output Enhancements:

Introduce LoadPoint and LoadPointd with refer and refer_key parameters. These determine whether the loaded points are tied to a reference coordinate system, and allow information such as the affine to be retrieved from the reference.
Usage Examples:
LoadPointd(key="point", refer_key="image") and LoadPointd(data=point, refer=image)
Subject for Discussion: What data formats should we aim to support?

3. Improvements to Transform API:

The core idea is to house the computational logic in per-kind operators registered with the transform. This minimizes changes to the transform API: to accommodate a new data type in MONAI, the current user-facing API remains unaltered and new operators are simply added as required.
Example:

class Flip:
    def __init__(self) -> None:
        self.operators = [flip_image, flip_point]

    def __call__(self, data, *args: Any, **kwargs: Any) -> Any:
        for _operator in self.operators:
            ret = _operator(data)
            if ret is not None:
                return ret
        raise ValueError(f"no registered operator accepts kind '{data.kind}'")

    def register(self, operator) -> None:
        self.operators.append(operator)

def flip_image(data):
    if data.kind != "pixel":
        return None
    else:
        ...
        return data

def flip_point(data):
    if data.kind != "point":
        return None
    else:
        ...
        return data

4. User Experience Enhancements:

The user experience is improved by letting geometry data flow through the same loading and transform pattern users already use for images.
Code example:

import monai.transforms as mt

data = {
    "image": image_path,
    "point": point_path,
}

trans = mt.Compose([
    mt.LoadImaged(keys="image"),
    mt.LoadPointd(keys="point", refer_key="image"),
    mt.Flipd(keys=["image", "point"]),
    mt.Rotated(keys=["image", "point"]),
])
