Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: create roi selection #90

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
f006d31
Add ROISelection and FunctionalLungspace
ajonkman Sep 1, 2023
9328a98
Unfinished GridSelection algorithm
ajonkman Sep 1, 2023
7ada94c
Fix unwanted redefinition of built-in variable
psomhorst Sep 4, 2023
cd21b8c
Update GridSelection.find_grid() function
psomhorst Sep 4, 2023
4987a62
Add tests for GridSelection
psomhorst Sep 4, 2023
26f6a91
Add function to GridSelection to get a layout of the returned matrices
psomhorst Sep 4, 2023
67c9c62
Add some scratchpad code to notebook
psomhorst Sep 4, 2023
9bce610
Update region boundary finding algorithm
psomhorst Sep 21, 2023
65dc08f
Improve error messages
psomhorst Sep 21, 2023
d6bf3e3
Expand tests for GridSelection
psomhorst Sep 21, 2023
d71da31
Convert GridSelection to a dataclass
psomhorst Sep 22, 2023
274a35e
Add module specific exceptions
psomhorst Sep 22, 2023
c5958cd
Add value checks in __post_init__
psomhorst Sep 22, 2023
d7f2f00
Move NotImplementedError to proper place
psomhorst Sep 22, 2023
0d0dd2f
Update get_region_boundaries method with warnings
psomhorst Sep 22, 2023
45b72d6
Implement shorter unpacking
psomhorst Sep 22, 2023
33ab9ae
Add tests for initialization, warnings and exceptions
psomhorst Sep 22, 2023
9b04994
Add tests for matrix_layout()
psomhorst Sep 22, 2023
27b78b5
Create convenience classes of most-used cases
psomhorst Sep 25, 2023
a8856e1
Add docstring to GridSelection class
psomhorst Sep 25, 2023
27df399
Remove v_split and h_split arguments from convenience classes' __init__
psomhorst Sep 26, 2023
c86321e
Add split pixels functionality
psomhorst Sep 26, 2023
4259986
Extend, update and improve documentation
psomhorst Sep 26, 2023
bed66d5
Unify split pixels an non-split pixels function to reduce code reuse
psomhorst Sep 29, 2023
cfa97ce
Update tests to reflect split pixels and unification
psomhorst Sep 29, 2023
bb65617
Create module-specific warnings for catching in a GUI
psomhorst Oct 10, 2023
37b3c9b
Update documentation, removing True/False values
psomhorst Oct 10, 2023
451880e
Add warnings for when there are more groups than vectors
psomhorst Oct 10, 2023
6a4aa95
Allow orientation specific splitting
psomhorst Oct 10, 2023
7a7d0d5
Add tests for when split_pixels is True
psomhorst Oct 10, 2023
ffce6d1
Update tests, removing True/False values
psomhorst Oct 10, 2023
4965ba2
Update tests for new warning types
psomhorst Oct 11, 2023
972b706
Remove `split_pixels` default value from convenience classes
psomhorst Oct 11, 2023
695279a
Add comment about floating point imprecision
psomhorst Oct 10, 2023
ea6fa2d
Remove `split_pixels` argument altogether
psomhorst Oct 10, 2023
1434f7c
Rename split_cols to split_columns
psomhorst Oct 10, 2023
a713a85
Add default split arguments to convenience classes
psomhorst Oct 10, 2023
b8faa91
Re-add checks for split arguments
psomhorst Oct 10, 2023
6e86046
Add option to ignore or include nan rows/columns
psomhorst Oct 10, 2023
b33bc86
Convert 2D ndarray to list of 1D arrays
psomhorst Oct 10, 2023
4913ae5
Make split/no-split methods more alike
psomhorst Oct 10, 2023
170ceea
Simplify matrix creation
psomhorst Oct 10, 2023
ba0137f
Fix swapped split_rows / split_columns
psomhorst Oct 11, 2023
7aaefce
Reduce code repetition
psomhorst Oct 11, 2023
337a9e3
Add tests for initialization with split_rows/columns and ignore_nan_r…
psomhorst Oct 11, 2023
0681d55
Add tests for exceptions with(out) split_rows/columns
psomhorst Oct 11, 2023
a191c6c
Improve documentation
psomhorst Oct 11, 2023
625d481
Linting
psomhorst Oct 11, 2023
737d111
Add test for split pixels and NaN values, improve test documentation.
psomhorst Oct 27, 2023
c0a66eb
Improve type hints for numpy arrays.
psomhorst Oct 27, 2023
8369ce5
Use np.outer for row/column vector multiplication.
psomhorst Oct 27, 2023
b4d6c5c
Use np.newaxis where appropriate.
psomhorst Oct 27, 2023
a712e69
Use dictionary for choosing method, keeping it DRY.
psomhorst Oct 27, 2023
ad1a09b
Remove superfluous f-string indicators.
psomhorst Oct 27, 2023
af172d7
Remove unfinished FunctionalLungSpace selection class
psomhorst Oct 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions eitprocessing/roi_selection/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
class ROISelection:
def minimal_cluster(
self,
pixel_map,
):
# TODO create minimal cluster size selector
raise NotImplementedError()
390 changes: 390 additions & 0 deletions eitprocessing/roi_selection/gridselection.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,390 @@
import bisect
import itertools
import warnings
from dataclasses import dataclass
from dataclasses import field
from typing import Literal
import numpy as np
from numpy.typing import NDArray
from . import ROISelection


@dataclass
class GridSelection(ROISelection):
"""Create regions of interest by division into a grid.

GridSelection allows for the creation a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.
Comment on lines +16 to +22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this is a rather generic explanation of what an ROI is/how we define it and handle it. Should that part of this docstring perhaps move to the docstring of the parent class?

also, see typo below:

Suggested change
GridSelection allows for the creation a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.
GridSelection allows for the creation of a list of 2D arrays that can be used to divide a two- or
higher-dimensional array into several regions structured in a grid. An instance of
GridSelection contains information about how to subdivide an input matrix. Calling
`find_grid(data)`, where data is a 2D array, results in a list of arrays with the same
dimension as `data`, each representing a single region. Each resulting 2D array contains the
value 0 for pixels that do not belong to the region, and the value 1 or any number between 0
and 1 for pixels that (partly) belong to the region.


Rows and columns at the edges of `data` that only contain NaN (not a number) values are
ignored. E.g. a (32, 32) array where the first and last two rows and first and last two columns
only contain NaN are split as if it is a (28, 28) array. The resulting arrays have the shape
(32, 32) with the same cells as the input data containing NaN values.

If the number of rows or columns can not split evenly, a row or column can be split among two
regions. This behaviour is controlled by `split_rows` and `split_columns`.

If `split_rows` is `False` (default), rows will not be split between two groups. A warning will
be shown stating regions don't contain equal numbers of rows. The regions towards the top will
be larger. E.g., when a (5, 2) array is split in two vertical regions, the first region will
contain the first three rows, and the second region the last two rows.

If `split_rows` is `True`, e.g. a (5, 2) array that is split in two vertical regions, the first
region will contain the first two rows and half of each pixel of the third row. The second
region contains half of each pixel in the third row, and the last two rows.

`split_columns` has the same effect on columns as `split_rows` has on rows.
Comment on lines +29 to +41
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be clearer to add this to "Args" or examples section of the docstring instead? Not sure if that is more or less clear, tbh.


Regions are ordered according to C indexing order. The `matrix_layout()` method provides a map
showing how the regions are ordered.
Comment on lines +43 to +44
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear what "C indexing order" is. Maybe refer to example below?
I also rephrased slightly to make it clearer that the method refers to the same thing.

Suggested change
Regions are ordered according to C indexing order. The `matrix_layout()` method provides a map
showing how the regions are ordered.
Regions are ordered according to C indexing order, and a map of this ordering can be produced
using the `matrix_layout()` method (see example below).


Common grids are pre-defined:
- VentralAndDorsal: vertically divided into ventral and dorsal;
- RightAndLeft: horizontally divided into anatomical right and left; NB: anatomical right is
the left side of the matrix;
- FourLayers: vertically divided into ventral, mid-ventral, mid-dorsal and dorsal;
- Quadrants: vertically and horizontally divided into four quadrants.

Args:
v_split: The number of vertical regions. Must be 1 or larger.
h_split: The number of horizontal regions. Must be 1 or larger.
split_rows: Allows rows to be split over two regions.
split_columns: Allows columns to be split over two regions.

Examples:
>>> pixel_map = array([[ 1, 2, 3],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> pixel_map = array([[ 1, 2, 3],
>>> pixel_map = np.array([[ 1, 2, 3],

[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
>>> gs = GridSelection(3, 1, split_pixels=False)
>>> matrices = gs.find_grid(pixel_map)
>>> matrices[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> gs2 = GridSelection(2, 2, split_pixels=True)
>>> matrices2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> matrices2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
Comment on lines +66 to +90
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few suggestions (all collapsed into one below), see also 4f78ebb:

  • Correct typo of split_pixels.
  • rename matrices -> rois
  • move matrix layout above the resulting map
  • add one more example to each gs, for added clarity.
Suggested change
>>> gs = GridSelection(3, 1, split_pixels=False)
>>> matrices = gs.find_grid(pixel_map)
>>> matrices[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> gs2 = GridSelection(2, 2, split_pixels=True)
>>> matrices2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> matrices2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
>>> gs = GridSelection(3, 1, split_rows=False)
>>> rois = gs.find_grid(pixel_map)
>>> gs.matrix_layout()
array([[0],
[1],
[2]])
>>> rois[0] * pixel_map
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> rois[1] * pixel_map
array([[0, 0, 0],
[0, 0, 0],
[7, 8, 9],
[10, 11, 12],
[0, 0, 0],
[0, 0, 0]])
>>> gs2 = GridSelection(2, 2, split_columns=True)
>>> rois2 = gs.find_grid(pixel_map)
>>> gs2.matrix_layout()
array([[0, 1],
[2, 3]])
>>> rois2[2]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ],
[1. , 0.5, 0. ]])
>>> rois2[3]
array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0.5, 1. ],
[0. , 0.5, 1. ],
[0. , 0.5, 1. ]])

"""

v_split: int
h_split: int
split_rows: bool = False
split_columns: bool = False
ignore_nan_rows: bool = True
ignore_nan_columns: bool = True
Comment on lines +95 to +98
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a single argument for each of split and ignore, as I think that more often than not a user would want the same behavior for both.
We can set it so that it accepts either a bool or a 2-part tuple of bools (for horizontal and vertical).

Suggested change
split_rows: bool = False
split_columns: bool = False
ignore_nan_rows: bool = True
ignore_nan_columns: bool = True
split_pixels: bool | tuple[bool, bool] = False
ignore_nan_edges: bool | tuple[bool, bool] = True

Also, what would happen and what do we want to happen if the center row/column of an array is nan?


def _check_attribute_type(self, name, type_):
"""Checks whether an attribute is an instance of the given type."""
attr = getattr(self, name)
if not isinstance(attr, type_):
message = f"Invalid type for `{name}`."
message += f"Should be {type_}, not {type(attr)}."
raise TypeError(message)

def __post_init__(self):
self._check_attribute_type("v_split", int)
self._check_attribute_type("h_split", int)
Comment on lines +108 to +110
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we so strict on using the correct types beyond giving type hints?
I don't see the point in raising an error if someone passes 2.0 or something. This could happen if some external package uses floats by default for whatever reason.

If we do want to remain strict, see 62bc8d4 for a suggestion of how to simplify coding this.


if self.v_split < 1:
raise InvalidVerticalDivision("`v_split` can't be smaller than 1.")

if self.h_split < 1:
raise InvalidHorizontalDivision("`h_split` can't be smaller than 1.")

self._check_attribute_type("split_columns", bool)
self._check_attribute_type("split_rows", bool)
self._check_attribute_type("ignore_nan_columns", bool)
self._check_attribute_type("ignore_nan_rows", bool)

def find_grid(self, data: NDArray) -> list[NDArray]:
"""
Create 2D arrays to split a grid into regions.

Create 2D arrays to split the given data into regions. The number of 2D
arrays will equal the number regions, which is the multiplicaiton of
`v_split` and `h_split`.

Args:
data (NDArray): a 2D array containing any numeric or np.nan data.

Returns:
list[NDArray]: a list of `n` 2D arrays where `n` is `v_split *
h_split`.
"""
grouping_method = {
True: self._create_grouping_vector_split_pixels,
False: self._create_grouping_vector_no_split_pixels,
}

horizontal_grouping_vectors = grouping_method[self.split_columns](
data, horizontal=True, n_groups=self.h_split
)

vertical_grouping_vectors = grouping_method[self.split_rows](
data, horizontal=False, n_groups=self.v_split
)

matrices = []
for vertical, horizontal in itertools.product(
vertical_grouping_vectors, horizontal_grouping_vectors
):
matrix = np.outer(vertical, horizontal)
matrix[np.isnan(data)] = np.nan
matrices.append(matrix)

return matrices

def _create_grouping_vector_no_split_pixels( # pylint: disable=too-many-locals
self,
data: NDArray,
horizontal: bool,
n_groups: int,
) -> list[NDArray]:
"""Create a grouping vector to split vector into `n` groups not
allowing split elements."""

axis = 0 if horizontal else 1

if (horizontal and self.ignore_nan_columns) or (
not horizontal and self.ignore_nan_rows
):
is_numeric = ~np.isnan(data)
numeric_vector_indices = np.argwhere(is_numeric.sum(axis) > 0)
first_numeric_vector = numeric_vector_indices.min()
last_vector_numeric = numeric_vector_indices.max()
else:
first_numeric_vector = 0
last_vector_numeric = data.shape[1 - axis] - 1

n_vectors = last_vector_numeric - first_numeric_vector + 1

if n_groups > n_vectors:
if horizontal: # pylint: disable=no-else-raise
raise InvalidHorizontalDivision(
"The number horizontal regions is larger than the "
f"number of available columns ({n_vectors})."
)
else:
raise InvalidVerticalDivision(
"The number vertical regions is larger than the "
f"number of available rows ({n_vectors})."
)

n_vectors_per_region = n_vectors / n_groups

if n_vectors_per_region % 1 > 0:
if horizontal:
warnings.warn(
"The horizontal regions will not have an equal number of "
f"columns. {n_vectors} is not equally divisible by {n_groups}.",
UnevenHorizontalDivision,
)
else:
warnings.warn(
"The vertical regions will not have an equal number of "
f"columns. {n_vectors} is not equally divisible by {n_groups}.",
UnevenVerticalDivision,
)

region_boundaries = [
first_numeric_vector
+ bisect.bisect_left(np.arange(n_vectors) / n_vectors_per_region, c)
for c in range(n_groups + 1)
]

vectors = []
for start, end in itertools.pairwise(region_boundaries):
vector = np.ones(data.shape[1 - axis])
vector[:start] = 0.0
vector[end:] = 0.0
vectors.append(vector)

return vectors

def _create_grouping_vector_split_pixels( # pylint: disable=too-many-locals
self,
matrix: NDArray,
horizontal: bool,
n_groups: int,
) -> list[NDArray]:
"""Create a grouping vector to split vector into `n` groups allowing
split elements."""

axis = 0 if horizontal else 1

# create a vector that is nan if the entire column/row is nan, 1 otherwise
vector_is_nan = np.all(np.isnan(matrix), axis=axis)
vector = np.ones(vector_is_nan.shape)

if (horizontal and self.ignore_nan_columns) or (
not horizontal and self.ignore_nan_rows
):
vector[vector_is_nan] = np.nan

# remove non-numeric (nan) elements at vector ends
# nan elements between numeric elements are kept
numeric_element_indices = np.argwhere(~np.isnan(vector))
first_num_element = numeric_element_indices.min()
last_num_element = numeric_element_indices.max()
else:
first_num_element = 0
last_num_element = len(vector) - 1

n_elements = last_num_element - first_num_element + 1

group_size = n_elements / n_groups

if group_size < 1:
if horizontal:
warnings.warn(
f"The number horizontal regions ({n_groups}) is larger than the "
f"number of available columns ({n_elements}).",
MoreHorizontalGroupsThanColumns,
)
else:
warnings.warn(
f"The number vertical regions ({n_groups}) is larger than the "
f"number of available rows ({n_elements}).",
MoreVerticalGroupsThanRows,
)

# find the right boundaries (upper values) of each group
right_boundaries = (np.arange(n_groups) + 1) * group_size
right_boundaries = right_boundaries[:, np.newaxis] # converts to row vector

# each row in the base represents one group
base = np.tile(np.arange(n_elements), (n_groups, 1))

# if the element number is higher than the split, it does not belong in this group
element_contribution_to_group = right_boundaries - base
element_contribution_to_group[element_contribution_to_group < 0] = 0

# if the element to the right is a full group size, this element is ruled out
rule_out = element_contribution_to_group[:, 1:] >= group_size
element_contribution_to_group[:, :-1][rule_out] = 0

# elements have a maximum value of 1
element_contribution_to_group = np.fmin(element_contribution_to_group, 1)

# if this element is already represented in the previous group (row), subtract that
element_contribution_to_group[1:] -= element_contribution_to_group[:-1]
element_contribution_to_group[element_contribution_to_group < 0] = 0

# element_contribution_to_group only represents non-nan elements
# insert into final including non-nan elements
final = np.full((n_groups, len(vector)), np.nan)
final[
:, first_num_element : last_num_element + 1
] = element_contribution_to_group

# convert to list of vectors
final = [final[n, :] for n in range(final.shape[0])]

return final

def matrix_layout(self) -> NDArray:
"""Returns a 2D array showing the layout of the matrices returned by
`find_grid`."""
n_regions = self.v_split * self.h_split
return np.reshape(np.arange(n_regions), (self.v_split, self.h_split))
Comment on lines +309 to +313
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to make this a property rather than a function. See 33905ee for suggestion



class InvalidDivision(Exception):
"""Raised when the data can't be divided into regions."""


class InvalidHorizontalDivision(InvalidDivision):
"""Raised when the data can't be divided into horizontal regions."""


class InvalidVerticalDivision(InvalidDivision):
"""Raised when the data can't be divided into vertical regions."""
Comment on lines +316 to +325
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the added value of these over a ValueError?

and what's the added value of having separate horizontal and vertical versions, over specifying this in the error message (same question for warnings below)?



class DivisionWarning(Warning):
pass


class UnevenDivision(DivisionWarning):
"""Warning for when a grid selection results in groups of uneven size."""


class UnevenHorizontalDivision(UnevenDivision):
"""Warning for when a grid selection results in horizontal groups of uneven size."""


class UnevenVerticalDivision(UnevenDivision):
"""Warning for when a grid selection results in vertical groups of uneven size."""


class MoreGroupsThanVectors(DivisionWarning):
"""Warning for when the groups outnumber the available vectors."""


class MoreVerticalGroupsThanRows(MoreGroupsThanVectors):
"""Warning for when the vertical groups outnumber the available rows."""


class MoreHorizontalGroupsThanColumns(MoreGroupsThanVectors):
"""Warning for when the horizontal groups outnumber the available rows."""


@dataclass
class VentralAndDorsal(GridSelection):
"""Split data into a ventral and dorsal region of interest."""

v_split: Literal[2] = field(default=2, init=False)
h_split: Literal[1] = field(default=1, init=False)
split_rows = True


@dataclass
class RightAndLeft(GridSelection):
"""Split data into a right and left region of interest."""

v_split: Literal[1] = field(default=1, init=False)
h_split: Literal[2] = field(default=2, init=False)
split_columns = False


@dataclass
class FourLayers(GridSelection):
"""Split data vertically into four layer regions of interest."""

v_split: Literal[4] = field(default=4, init=False)
h_split: Literal[1] = field(default=1, init=False)
split_rows = True


@dataclass
class Quadrants(GridSelection):
"""Split data into four quadrant regions of interest."""

v_split: Literal[2] = field(default=2, init=False)
h_split: Literal[2] = field(default=2, init=False)
split_columns = False
split_rows = True
Loading
Loading