Geometric transforms proposal #7486

@KumoLiu

Description

Design

Design goals

  • Geometry has first-class support
    • Users should be able to create models and pipelines that are purely geometry-based
    • Users should be able to create models and pipelines that are combinations of pixel data and geometry data
  • It should be easy for users to make hybrid workflows
    • In hybrid workflows, we should make it easy to update geometry based on transforms to pixel data
  • Minimal API changes
    • We should minimise changes to the API

Characteristics of geometry and pixel data

  • geometry data
    • points: positions in world space
      • may have some kind of vertex / edge descriptor with which to interpret the points
  • pixel data
    • pixel resolution: a mapping from pixel-space to world space
    • bounding box: the geometric bounds of the pixel data in world space

Pixel-space vs world-space

  • We define two spaces in which operations can be carried out
    • world space
      • changes the object in world space: rotation, scaling, translation, shearing, etc.
      • applies to both pixel data and geometry data
    • pixel space
      • a geometric description of a change to the way pixel data is sampled
      • has no effect on world space
      • applies only to pixel data
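The distinction can be sketched with homogeneous affines (illustrative plain Python, not MONAI code; the 2D matrices and values are assumptions). A world-space operation composes on the world side of the pixel-to-world affine and also moves geometry; a pixel-space operation composes on the pixel side and leaves world space untouched:

```python
import math

def matmul(a, b):
    # 3x3 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(m, p):
    # apply a homogeneous 2D affine to a point (x, y)
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# pixel -> world affine: 2 mm spacing, origin at (10, 20)
affine = [[2.0, 0.0, 10.0],
          [0.0, 2.0, 20.0],
          [0.0, 0.0, 1.0]]

# world-space op: a 90-degree rotation composes on the world side,
# and the same rotation applies to geometry
theta = math.pi / 2
R = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta),  math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]
world_affine = matmul(R, affine)        # pixel data follows the rotation
point_world = apply(R, (10.0, 20.0))    # geometry follows the same rotation

# pixel-space op: halving the sampling step composes on the pixel side;
# world space (and therefore geometry) is untouched
S = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
pixel_affine = matmul(affine, S)        # only the pixel -> world mapping changes
```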

Stages of a mixed pixel / geometry pipeline

  1. Load data sources
    a. pixel data
    b. geometry data
  2. align pixel data with geometry data (depends on task)
  3. apply various transforms to aligned pixel and geometry data
    a. our transforms should always keep pixel and geometry data aligned, for any given sequence of spatial transforms applied to both
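The alignment requirement in step 3a can be expressed as an invariant (plain-Python sketch with hypothetical values): after applying the same world-space transform to the image affine and to a point, mapping the point back through the inverse of the updated affine should recover the original pixel index:

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(m, p):
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def invert_affine(m):
    # inverse of a 2D homogeneous affine [[A, t], [0, 1]]
    a, b, c, d = m[0][0], m[0][1], m[1][0], m[1][1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    tx, ty = m[0][2], m[1][2]
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

affine = [[2.0, 0.0, 5.0], [0.0, 2.0, 5.0], [0.0, 0.0, 1.0]]  # pixel -> world
pixel_idx = (3.0, 4.0)
point = apply(affine, pixel_idx)      # a point sitting at pixel (3, 4)

theta = math.pi / 3                   # an arbitrary world-space rotation + shift
R = [[math.cos(theta), -math.sin(theta), 1.0],
     [math.sin(theta),  math.cos(theta), -2.0],
     [0.0, 0.0, 1.0]]
new_affine = matmul(R, affine)        # pixel data follows R
new_point = apply(R, point)           # geometry follows the same R

# alignment invariant: the point still maps to the same pixel index
recovered = apply(invert_affine(new_affine), new_point)
```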

Spatial transform categories

We distinguish three categories of spatial transform:

  • agnostic: work the same way on pixel and geometry data
    • flip, zoom, etc.
  • image-specific: transforms that make sense only for raster data
    • resample, spacing, etc.
  • hybrid: transforms that must also take images into account
    • rotate, etc.

A closer look at hybrid transforms

rotate must perform slightly different operations on pixel and geometry data

  • the rotation itself in world space is the same for pixel and geometry data
  • if keep_size is false, the extents of pixel data bounds will change
    • this is a pixel space change
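A sketch of the hybrid behaviour (illustrative helpers with assumed names, not MONAI functions): the world-space rotation is identical for both data kinds, while the keep_size=False extent change is a pixel-space computation over the rotated image bounds:

```python
import math

def rotate_extents(h, w, theta):
    """Bounding extents of an h x w image after in-plane rotation (keep_size=False)."""
    corners = [(0, 0), (w, 0), (0, h), (w, h)]
    xs = [x * math.cos(theta) - y * math.sin(theta) for x, y in corners]
    ys = [x * math.sin(theta) + y * math.cos(theta) for x, y in corners]
    # pixel-space change: the output grid must cover the rotated bounds
    return (round(max(ys) - min(ys)), round(max(xs) - min(xs)))

def rotate_point(p, theta):
    """World-space component: the same rotation applies to geometry data."""
    x, y = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# 90-degree rotation: points rotate in world space; pixel bounds swap h and w
new_shape = rotate_extents(100, 60, math.pi / 2)
```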

Transform API

The transform API has the following layers

dictionary transform -> array transform -> functional transform

Dictionary transforms

Dictionary transforms specific to images can refer to geometry by name rather than requiring tensors to be passed in directly:

class Spacingd(MapTransform, InvertibleTransform, LazyTransform):
    def __init__(
        self, keys, pixdim, diagonal, mode, padding_mode, align_corners, dtype, scale_extent,
        recompute_affine, min_pixdim, max_pixdim, ensure_same_shape, allow_missing_keys):

As such, there shouldn't need to be any changes to the API for dictionary transforms:

  • geometry tensors are referred to by name, as are pixel tensors
  • transforms that aren't image-specific can simply process each named tensor independently of the others
  • transforms that are image-specific can perform the operation on image tensors first
  • the world-space component of the transform can then be applied to the geometry tensors
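As an illustration of these bullets (plain dicts standing in for tensors; the "kind" tag, keys, and zoom semantics are assumptions, not MONAI code), a dictionary transform can dispatch on the kind of each named entry, applying its image-specific part to pixel entries and its world-space part to geometry entries:

```python
class MockZoomd:
    """Sketch of a dictionary transform handling mixed pixel/geometry keys."""

    def __init__(self, keys, factor):
        self.keys, self.factor = keys, factor

    def __call__(self, data):
        out = dict(data)
        for key in self.keys:
            item = dict(data[key])
            if item["kind"] == "pixel":
                # image-specific part: rescale the pixel -> world spacing
                item["spacing"] = tuple(s * self.factor for s in item["spacing"])
            else:
                # world-space part: scale the point coordinates directly
                item["points"] = [tuple(c * self.factor for c in p)
                                  for p in item["points"]]
            out[key] = item
        return out

sample = {
    "image": {"kind": "pixel", "spacing": (1.0, 1.0)},
    "point": {"kind": "point", "points": [(2.0, 3.0)]},
}
zoomed = MockZoomd(keys=["image", "point"], factor=2.0)(sample)
```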

Array transforms

Array transforms specific to images need to be modified so that geometry data can be updated. This can be done via additional operation parameters that take a tensor or tuple of tensors:

class Spacing(InvertibleTransform, LazyTransform):
    def __call__(
        self, data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
        inputs_to_update # New
    ):
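A hypothetical sketch of how the extended call could behave (the inputs_to_update parameter, its return convention, and all values are assumptions drawn from this proposal, not existing MONAI API):

```python
def mock_spacing(image, old_spacing, new_spacing, inputs_to_update=()):
    """Stand-in for an array transform extended with inputs_to_update."""
    # pixel-space component: resample the image grid (stubbed as a shape change)
    scale = [o / n for o, n in zip(old_spacing, new_spacing)]
    new_shape = tuple(round(s * f) for s, f in zip(image["shape"], scale))
    image = {**image, "shape": new_shape, "spacing": tuple(new_spacing)}
    # the world-space component of a spacing change is the identity, so
    # geometry passed via inputs_to_update is returned unchanged
    updated = tuple(dict(t) for t in inputs_to_update)
    return image, updated

img = {"shape": (100, 100), "spacing": (1.0, 1.0)}
pts = {"points": [(5.0, 5.0)]}
new_img, (new_pts,) = mock_spacing(img, (1.0, 1.0), (2.0, 2.0),
                                   inputs_to_update=(pts,))
```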

Functional transforms

def spacing(
    data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
    inputs_to_update # New
):

Functional transforms that are specific to image data first calculate the pixel-space and world-space transform components to be applied to the image data. They then call a function that applies the appropriate transform to geometry data.
Note: applying a transform to geometry data should require only one generic operation; ideally we should not need to write separate *_image and *_point functions for each operation.
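One possible shape for that single generic operation (illustrative plain Python; the helper names and the 2D affine convention are assumptions) is a function that applies any world-space affine to point data, so each functional transform only has to produce its affine:

```python
def apply_affine_to_points(points, affine):
    """Apply a 3x3 homogeneous 2D affine to a list of (x, y) points."""
    return [(affine[0][0] * x + affine[0][1] * y + affine[0][2],
             affine[1][0] * x + affine[1][1] * y + affine[1][2])
            for x, y in points]

def flip_affine(axis, extent):
    """World-space affine of a flip about the given extent along one axis."""
    m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m[axis][axis] = -1.0
    m[axis][2] = extent
    return m

# a flip along x within a 10-unit extent maps x -> 10 - x
flipped = apply_affine_to_points([(1.0, 2.0)], flip_affine(0, 10.0))
```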

Implementation

1. Integration of a 'kind' Property into MetaTensor:

We propose adding a 'kind' property to MetaTensor. The property will enable efficient identification and appropriate handling of the different data kinds; its value can be retrieved via data.kind.
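A minimal stand-in for the proposed property (an illustrative class only; the real change would extend MetaTensor itself):

```python
class MetaArray:
    """Sketch of a tensor-like container tagged with a data kind."""

    def __init__(self, data, kind="pixel"):
        self.data = data
        self._kind = kind

    @property
    def kind(self):
        # transforms dispatch on this tag to pick the right operator
        return self._kind

image = MetaArray([[0, 1], [2, 3]], kind="pixel")
points = MetaArray([(0.5, 0.5)], kind="point")
```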

2. Data Input/Output Enhancements:

Introduce LoadPoint and LoadPointd with refer and refer_key parameters. These determine whether the loaded points are tied to a reference coordinate system, and allow information such as the affine to be retrieved from the reference.
Usage Examples:
LoadPointd(key="point", refer_key="image") and LoadPointd(data=point, refer=image)
Subject for Discussion: What data formats should we aim to support?

3. Improvements to Transform API:

The core idea is to house the computational logic in per-kind operators registered with the transform. This minimizes changes to the transform API: to accommodate a new data type in MONAI, the current user-facing API remains unaltered and new operators are simply added as required.
Example:

class Flip:
    def __init__(self) -> None:
        self.operators = [flip_image, flip_point]

    def __call__(self, data, *args: Any, **kwargs: Any) -> Any:
        for _operator in self.operators:
            ret = _operator(data)
            if ret is not None:
                return ret
        raise ValueError(f"no registered operator accepts kind '{data.kind}'")

    def register(self, operator) -> None:
        self.operators.append(operator)

def flip_image(data):
    if data.kind != "pixel":
        return None
    else:
        ...
        return data

def flip_point(data):
    if data.kind != "point":
        return None
    else:
        ...
        return data

4. User Experience Enhancements:

The user experience is improved by letting geometry data flow through the same loading and transform pattern users already use for images.
Code example:

import monai.transforms as mt

data = {
    "image": image_path,
    "point": point_path,
}

trans = mt.Compose([
    mt.LoadImaged(keys="image"),
    mt.LoadPointd(keys="point", refer_key="image"),
    mt.Flipd(keys=["image", "point"]),
    mt.Rotated(keys=["image", "point"]),
])
