|
1 | 1 | Custom Preprocessing Pipelines
|
2 | 2 | ==============================
|
3 | 3 |
|
4 |
| -``PathML`` comes with a set of pre-made pipelines ready to use out of the box. |
5 |
| -However, it may also be necessary in many cases to create custom preprocessing pipelines tailored to the specific |
6 |
| -application at hand. |
| 4 | +``PathML`` makes designing preprocessing pipelines easy. In this section we will walk through how to define a |
| 5 | +:class:`~pathml.preprocessing.pipeline.Pipeline` object by composing pre-made |
| 6 | +:class:`~pathml.preprocessing.transforms.Transform` objects, and how to implement a |
| 7 | +new custom :class:`~pathml.preprocessing.transforms.Transform`. |
7 | 8 |
|
8 | 9 | Pipeline basics
|
9 | 10 | ---------------
|
10 | 11 |
|
11 |
| -Preprocessing pipelines are defined in objects that inherit from the BasePipeline abstract class. |
12 |
| -The preprocessing logic for a single slide is defined in the ``run_single()`` method. |
13 |
| -Then, when the ``run()`` method is called, the input is checked to see whether it is a single slide or a dataset of |
14 |
| -slides. The ``run_single()`` method is then called as appropriate, and multiprocessing is automatically handled in the |
15 |
| -case of processing an entire dataset. |
| 12 | +Preprocessing pipelines are defined in :class:`~pathml.preprocessing.pipeline.Pipeline` objects. |
| 13 | +When :meth:`~pathml.core.slide_data.SlideData.run` |
| 14 | +is called, tiles are lazily extracted from the slide by |
| 15 | +:meth:`~pathml.core.slide_data.SlideData.generate_tiles` and passed to the |
| 16 | +:class:`~pathml.preprocessing.pipeline.Pipeline`, which modifies the :class:`~pathml.core.tile.Tile` object in place. |
| 17 | +Finally, the processed tile is saved. |
| 18 | +This design facilitates preprocessing of gigapixel-scale whole-slide images, because :class:`~pathml.core.tile.Tile` |
| 19 | +objects are small enough to fit in memory. |
16 | 20 |
|
17 |
| -To define a new pipeline, all that is necessary is to define the ``run_single()`` method. |
18 |
| -The method should take a ``BaseSlide`` object as input (or a specific type of slide inheriting from the ``BaseSlide`` |
19 |
| -class), and should write the processed output to disk. Because the ``run()`` method is just a wrapper around the |
20 |
| -``run_single()`` method, there is no need to override the default ``run()``. |
| 21 | +Composing a Pipeline |
| 22 | +-------------------- |
21 | 23 |
|
22 |
| -A ``SlideData`` object can be used to hold intermediate outputs, so that a preprocessing step can have access to |
23 |
| -outputs from earlier steps. |
| 24 | +In many cases, a preprocessing pipeline can be thought of as a sequence of transformations. |
| 25 | +:class:`~pathml.preprocessing.pipeline.Pipeline` objects can be created by composing |
| 26 | +a list of :class:`~pathml.preprocessing.transforms.Transform` objects: |
24 | 27 |
|
25 |
| -Interacting with slides |
26 |
| ------------------------- |
| 28 | +.. code-block:: python |
27 | 29 |
|
28 |
| -Pipelines must take ``BaseSlide`` objects as input. |
29 |
| -This interaction between Pipelines and Slides is very important - design choices here can affect pipeline execution |
30 |
| -times by orders of magnitude! |
31 |
| -This is because whole-slide images can be very large, even exceeding the amount of available memory in most machines! |
| 30 | + pipeline = Pipeline([ |
| 31 | + BoxBlur(kernel_size=15), |
| 32 | + TissueDetectionHE(mask_name = "tissue", min_region_size=500, |
| 33 | + threshold=30, outer_contours_only=True) |
| 34 | + ]) |
| 35 | +.. |
32 | 36 |
|
33 |
| -.. note:: |
| 37 | +In this example, the preprocessing pipeline will first apply a box blur kernel, and then apply tissue detection. |
| 38 | +It is that easy to compose pipelines by mixing and matching :class:`~pathml.preprocessing.transforms.Transform` objects! |
34 | 39 |
|
35 |
| - Naively loading an entire WSI into memory at high-resolution should therefore be avoided in most cases! |
36 | 40 |
|
37 |
| -Consider these best-practices when designing custom pipelines: |
| 41 | +Custom Transforms |
| 42 | +----------------- |
38 | 43 |
|
39 |
| -- Make use of the ``BaseSlide.chunks()`` method to process the WSI in smaller chunks |
40 |
| -- Perform operations on lower-resolution image levels, when possible (i.e. when the slide has multiple resolutions |
41 |
| - available and the operation will not suffer from decreased resolution) |
42 |
| -- Be cognizant of memory requirements at each step in the pipeline |
43 |
| -- Avoid loading entire slides into memory at high-resolution! |
| 44 | +A :class:`~pathml.preprocessing.pipeline.Pipeline` is a special case of |
| 45 | +a :class:`~pathml.preprocessing.transforms.Transform` which makes it easy |
| 46 | +to compose several :class:`~pathml.preprocessing.transforms.Transform` objects sequentially. |
| 47 | +However, in some cases, you may want to implement a :class:`~pathml.preprocessing.transforms.Transform` from scratch. |
| 48 | +For example, you may want to apply a transformation which is not already implemented in ``PathML``. |
| 49 | +Or, perhaps you want to apply a preprocessing pipeline which is not perfectly sequential. |
44 | 50 |
|
45 |
| -Using Transforms |
46 |
| -------------------- |
| 51 | +To define a new custom :class:`~pathml.preprocessing.transforms.Transform`, |
| 52 | +all you need to do is create a class which inherits from :class:`~pathml.preprocessing.transforms.Transform` and |
| 53 | +implements an ``apply()`` method which takes a :class:`~pathml.core.tile.Tile` as an argument and modifies it in place. |
| 54 | +You may also implement a functional method ``F()``, although that is not strictly required. |
47 | 55 |
|
48 |
| -``PathML`` provides a set of modular Transformation objects to make it easier to define custom preprocessing pipelines. |
49 |
| -Individual low-level operations are implemented in ``Transform`` objects, through the ``apply()`` method. |
50 |
| -This consistent API makes it convenient to use complex operations in pipelines, and combine them modularly. |
51 |
| -There are several types of Transforms, as defined by their inputs and outputs: |
| 56 | +For example, let's take a look at how :class:`~pathml.preprocessing.transforms.BoxBlur` is implemented: |
52 | 57 |
|
53 |
| -================== ========== =========== |
54 |
| -Transform type Input Output |
55 |
| -================== ========== =========== |
56 |
| -ImageTransform image image |
57 |
| -Segmentation image mask |
58 |
| -MaskTransform mask mask |
59 |
| -================== ========== =========== |
| 58 | +.. code-block:: python |
60 | 59 |
|
61 |
| -Some things to consider when implementing a custom pipeline: |
62 |
| - |
63 |
| -- Use existing Transforms when possible! This will save time compared to implementing the entire pipeline from scratch. |
64 |
| -- If implementing a new transformation or pipeline operation, consider contributing it to ``PathML`` so that other |
65 |
| - users in the community can benefit from your hard work! See: contributing |
66 |
| -- Be aware of memory and computation requirements of your pipeline. |
67 |
| - |
68 |
| - |
69 |
| -Examples |
70 |
| --------- |
71 |
| - |
72 |
| -In this example we'll define a Pipeline which reads chunks of the input slide, applies a box blur with a given kernel |
73 |
| -size, and then writes the blurred image to disk. |
74 |
| - |
75 |
| -.. code-block:: |
76 |
| -
|
77 |
| - import os |
78 |
| - import cv2 |
79 |
| - from pathml.preprocessing.base import BasePipeline |
80 |
| - from pathml.preprocessing.transforms import BoxBlur |
81 |
| - from pathml.preprocessing.wsi import HESlide |
82 |
| -
|
83 |
| - class ExamplePipeline(BasePipeline): |
84 |
| - def __init__(self, kernel_size): |
| 60 | + class BoxBlur(Transform): |
| 61 | + """Box (average) blur kernel.""" |
| 62 | + def __init__(self, kernel_size=5): |
85 | 63 | self.kernel_size = kernel_size
|
86 | 64 |
|
87 |
| - def run_single(self, slide, output_dir): |
88 |
| - blur = BoxBlur(kernel_size) |
89 |
| - for i, chunk in enumerate(slide.chunks(level = 0, size = 1000)): |
90 |
| - blurred_chunk = blur.apply(chunk) |
91 |
| - fname = os.path.join(output_dir, f"chunk{i}.jpg") |
92 |
| - cv2.imwrite(fname, blurred_chunk) |
93 |
| -
|
94 |
| - # usage |
95 |
| - wsi = HESlide("/path/to/wsi.svs") |
96 |
| - ExamplePipeline(kernel_size = 11).run(wsi) |
97 |
| -
|
98 |
| -
|
99 |
| -In this example, we define a Transform which changes the order of the channels in the input RGB image. |
100 |
| - |
101 |
| -.. code-block:: |
| 65 | + def F(self, image): |
| 66 | + return cv2.boxFilter(image, ksize = (self.kernel_size, self.kernel_size), ddepth = -1) |
102 | 67 |
|
103 |
| - from pathml.preprocessing.base import ImageTransform |
| 68 | + def apply(self, tile): |
| 69 | + tile.image = self.F(tile.image) |
| 70 | +.. |
104 | 71 |
|
105 |
| - class ChannelSwitch(ImageTransform): |
106 |
| - def apply(self, image): |
107 |
| - # make sure that the input image has 3 channels |
108 |
| - assert image.shape[2] == 3 |
109 |
| - out = image |
110 |
| - out[:, :, 0] = image[:, :, 2] |
111 |
| - out[:, :, 1] = image[:, :, 0] |
112 |
| - out[:, :, 2] = image[:, :, 1] |
113 |
| - return out |
| 72 | +That's it! Once you define your custom :class:`~pathml.preprocessing.transforms.Transform`, |
| 73 | +you can use it alongside any of the other :class:`~pathml.preprocessing.transforms.Transform` objects, |
| 74 | +compose it into a :class:`~pathml.preprocessing.pipeline.Pipeline`, and so on. |
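
The pattern the new text describes (a `Transform` exposing `apply()` and an optional functional `F()`, and a `Pipeline` that is itself a `Transform` applying a list of transforms in order) can be sketched without PathML installed. Everything below is a hypothetical stand-in written for illustration: the `Tile`, `Transform`, and `Pipeline` classes are simplified placeholders and not PathML's actual implementations, and `Invert` and `Threshold` are made-up transforms that operate on plain nested lists rather than image arrays.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Hypothetical stand-in for a tile: a small image plus named masks."""
    image: list                                  # rows of 8-bit grayscale values
    masks: dict = field(default_factory=dict)


class Transform:
    """Hypothetical stand-in for the Transform base class."""

    def apply(self, tile):
        raise NotImplementedError


class Invert(Transform):
    """Made-up custom transform: invert 8-bit pixel intensities."""

    def F(self, image):
        # Functional form: builds a new image and leaves the input untouched.
        return [[255 - px for px in row] for row in image]

    def apply(self, tile):
        # Modify the tile in place, as the documentation text describes.
        tile.image = self.F(tile.image)


class Threshold(Transform):
    """Made-up custom transform: store a binary mask of pixels above a cutoff."""

    def __init__(self, mask_name, cutoff=128):
        self.mask_name = mask_name
        self.cutoff = cutoff

    def F(self, image):
        return [[1 if px > self.cutoff else 0 for px in row] for row in image]

    def apply(self, tile):
        # Adds a mask to the tile; the image itself is not changed.
        tile.masks[self.mask_name] = self.F(tile.image)


class Pipeline(Transform):
    """Applies a list of transforms in order; it is itself a Transform."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def apply(self, tile):
        for transform in self.transforms:
            transform.apply(tile)


# Usage: compose two custom transforms and run them on one small tile.
tile = Tile(image=[[0, 100], [200, 255]])
pipeline = Pipeline([Invert(), Threshold(mask_name="bright", cutoff=128)])
pipeline.apply(tile)
```

Because the stand-in `Pipeline` subclasses `Transform`, a pipeline can be nested inside another pipeline, which is one way to read the statement that a pipeline is a special case of a transform.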