Dana-Farber-AIOS
diff --git a/.coveragerc b/.coveragerc
Lines changed: 6 additions & 2 deletions
diff --git a/.github/workflows/python-package-conda.yml b/.github/workflows/tests-conda.yml
rename from .github/workflows/python-package-conda.yml
rename to .github/workflows/tests-conda.yml
Lines changed: 11 additions & 2 deletions
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
Lines changed: 9 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 36 additions & 5 deletions
diff --git a/docs/source/api_reference_full.rst b/docs/source/api_reference_full.rst
Lines changed: 20 additions & 14 deletions
diff --git a/docs/source/conf.py b/docs/source/conf.py
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/custom_pipelines.rst b/docs/source/custom_pipelines.rst
Lines changed: 51 additions & 90 deletions
diff --git a/docs/source/examples.rst b/docs/source/examples.rst
Lines changed: 0 additions & 13 deletions
diff --git a/docs/source/examples/link_advanced_HE_chunks.nblink b/docs/source/examples/link_advanced_HE_chunks.nblink
Lines changed: 0 additions & 3 deletions
diff --git a/docs/source/examples/link_basic_HE.nblink b/docs/source/examples/link_basic_HE.nblink
Lines changed: 0 additions & 3 deletions
@@ -2,9 +2,13 @@
 
 [run]
 source = pathml
-omit =
-    pathml/preprocessing/base_*
 command_line = -m pytest
 
 [html]
 directory = coverage_report_html
+
+[report]
+exclude_lines =
+    pragma: no cover
+    if self.debug:
+    raise NotImplementedError
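
The new `[report]` section asks coverage.py to leave lines matching these patterns out of the coverage report. As a rough illustration of which lines the patterns would hit, here is a minimal sketch that treats each entry as a regular expression and searches a sample source file line by line. The `BaseThing` class and the `excluded_line_numbers` helper are hypothetical and exist only for this example; they are not part of PathML or coverage.py.

```python
import re

# Patterns copied from the exclude_lines setting in the .coveragerc hunk above.
EXCLUDE_PATTERNS = [r"pragma: no cover", r"if self.debug:", r"raise NotImplementedError"]

# Hypothetical source text, used only for illustration.
SAMPLE_SOURCE = '''\
class BaseThing:
    def __init__(self, debug=False):
        self.debug = debug

    def apply(self, x):
        raise NotImplementedError

    def run(self, x):
        if self.debug:
            print("running")
        return x  # pragma: no cover
'''


def excluded_line_numbers(source, patterns):
    """Return the 1-based numbers of lines that match any exclusion regex."""
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if combined.search(line)
    ]


matched = excluded_line_numbers(SAMPLE_SOURCE, EXCLUDE_PATTERNS)
print(matched)
```

This sketch only mimics the line-matching step. As far as I know, coverage.py additionally excludes the whole block introduced by a matching line, so the body of the `if self.debug:` branch would also be dropped from the report.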
@@ -1,14 +1,16 @@
 name: Python Package using Conda
 
-on: [pull_request]
+on: 
+  pull_request:
+    branches: [dev, master]
 
 jobs:
   build-linux:
     runs-on: ubuntu-latest
     strategy:
       max-parallel: 5
       matrix:
-        python-version: [3.7]
+        python-version: [3.8]
 
     steps:
     - uses: actions/checkout@v2
@@ -45,3 +47,10 @@ jobs:
       run: |
         conda install pytest
         python -m pytest
+    - name: Compile docs
+      shell: bash -l {0}
+      run: |
+        sudo apt-get install pandoc
+        pip install ipython sphinx nbsphinx nbsphinx-link sphinx-rtd-theme
+        cd docs
+        make html
@@ -47,14 +47,16 @@ Here's how to contribute code, documentation, etc.
 6. Write new tests as needed to maintain code coverage
 7. Ensure that all tests still pass
 8. Commit your changes and submit a pull request referencing the corresponding issue
+9. Respond to discussion/feedback about the pull request. Make changes as necessary.
 
 Documentation Standards
 =======================
 
 All code should be documented, including docstrings for users AND inline comments for
 other developers whenever possible! Both are crucial for ensuring long-term usability and maintainability.
-Documentation is automatically generated using the Sphinx `autodoc`_ extension from properly formatted docstrings.
-All documentation (including docstrings) are written in `reStructuredText`_ format.
+Documentation is automatically generated using the Sphinx `autodoc`_ and `napoleon`_ extensions from
+properly formatted Google-style docstrings.
+All documentation (including docstrings) is written in `reStructuredText`_ format.
 See this `docstring example`_ to get started.
 
 To build documentation:
@@ -65,10 +67,12 @@ To build documentation:
     cd docs                 # enter docs directory
     make html               # build docs in html format
 
+Open ``docs/build/html/index.html`` in your favorite web browser.
+
 Testing Standards
 =================
 
-All new code should be accompanied by tests, whenever possible, to maintain good code coverage.
+All new code should be accompanied by tests, whenever possible, to maintain good code coverage (target >90%).
 We use the `pytest`_ testing framework.
 All tests should pass for new code, and new tests should be added as necessary when fixing bugs.
 
@@ -91,4 +95,5 @@ Thank you for helping make ``PathML`` better!
 .. _pytest: https://docs.pytest.org/en/stable/
 .. _autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html
 .. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
-.. _docstring example: https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html
+.. _docstring example: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
+.. _napoleon: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html
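
For reference, a Google-style docstring of the kind the updated contributing guide asks for might look like the sketch below. The function `tile_count` is hypothetical and is not part of PathML; it is only here to show the `Args`, `Returns`, and `Raises` sections that the napoleon extension parses.

```python
def tile_count(slide_width, slide_height, tile_size):
    """Count the non-overlapping square tiles that fit inside a slide.

    Args:
        slide_width (int): Slide width in pixels.
        slide_height (int): Slide height in pixels.
        tile_size (int): Side length of each square tile in pixels.

    Returns:
        int: Number of whole tiles that fit. Partial tiles at the edges are not counted.

    Raises:
        ValueError: If ``tile_size`` is not positive.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    # Whole tiles per row times whole tiles per column.
    return (slide_width // tile_size) * (slide_height // tile_size)
```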
@@ -24,14 +24,45 @@ A toolkit for computational pathology and machine learning.
 
 ## Installation
 
+1. Clone repo
+
+````
+git clone https://github.com/Dana-Farber/pathml.git
+cd pathml
+````
+
+2. Set Up Conda Environment
+
+````
+conda create --name pathml
+conda activate pathml
+````
+
+3. Install CUDA. This step only applies if you want to use GPU acceleration for model training or other tasks. This guide should work, but for the most up-to-date instructions, refer to the [official PyTorch installation instructions](https://pytorch.org/get-started/locally/).
+
+    - Check the version of CUDA:
+    
+        ````
+        nvidia-smi
+        ````
+    
+    - Install correct version of `cudatoolkit`:
+
+        ````
+        # update this command with your CUDA version number
+        conda install cudatoolkit=11.0
+        ````
+
+
+4. Install PathML
+
 ````
-git clone https://github.com/Dana-Farber/pathml.git     # clone repo
-cd pathml                               # enter repo directory
-conda env create -f environment.yml     # create conda environment
-conda activate pathml                   # activate conda environment
-pip install -e .                        # install pathml in conda environment
+conda env update -f environment.yml     # install dependencies
+pip install -e .                        # install pathml
 ````
 
+>> to verify PyTorch installation with GPU support: `python -c "import torch; print(torch.cuda.is_available())"`
+
 ## Generate Documentation
 
 This repo is not yet open to the public. Once we open source it, we will host documentation online.
 
@@ -1,30 +1,34 @@
 Full API Reference
 ==================
 
-Preprocessing
--------------
+Core
+----
 
-.. automodule:: pathml.preprocessing.wsi
+.. automodule:: pathml.core.slide_data
     :members:
-.. automodule:: pathml.preprocessing.multiparametricslide
+.. automodule:: pathml.core.slide_classes
     :members:
-.. automodule:: pathml.preprocessing.slide_data
+.. automodule:: pathml.core.slide_backends
     :members:
-.. automodule:: pathml.preprocessing.pipeline
+.. automodule:: pathml.core.tile
     :members:
-.. automodule:: pathml.preprocessing.tiling
+.. automodule:: pathml.core.tiles
     :members:
-.. automodule:: pathml.preprocessing.stains
+.. automodule:: pathml.core.masks
     :members:
-.. automodule:: pathml.preprocessing.transforms
+.. automodule:: pathml.core.h5managers
     :members:
-.. automodule:: pathml.preprocessing.transforms_HandE
+
+Preprocessing
+-------------
+
+.. automodule:: pathml.preprocessing.pipeline
     :members:
-.. automodule:: pathml.preprocessing.utils
+.. automodule:: pathml.preprocessing.tiling
     :members:
-.. automodule:: pathml.preprocessing.base
+.. automodule:: pathml.preprocessing.transforms
     :members:
-
+    :undoc-members:
 
 Datasets
 --------
@@ -39,5 +43,7 @@ Datasets
 ML
 --
 
-.. automodule:: pathml.ml
+.. automodule:: pathml.ml.hovernet
+    :members:
+.. automodule:: pathml.ml.utils
     :members:
@@ -18,7 +18,7 @@
 # -- Project information -----------------------------------------------------
 
 project = 'PathML'
-copyright = '2020, DFCI'
+copyright = '2021, Dana-Farber Cancer Institute'
 author = 'Jacob Rosenthal'
 
 version = '0.0.1'
 
@@ -1,113 +1,74 @@
 Custom Preprocessing Pipelines
 ==============================
 
-``PathML`` comes with a set of pre-made pipelines ready to use out of the box.
-However, it may also be necessary in many cases to create custom preprocessing pipelines tailored to the specific
-application at hand.
+``PathML`` makes designing preprocessing pipelines easy. In this section we will walk through how to define a
+:class:`~pathml.preprocessing.pipeline.Pipeline` object by composing pre-made
+:class:`~pathml.preprocessing.transforms.Transform` objects, and how to implement a
+new custom :class:`~pathml.preprocessing.transforms.Transform`.
 
 Pipeline basics
 ---------------
 
-Preprocessing pipelines are defined in objects that inherit from the BasePipeline abstract class.
-The preprocessing logic for a single slide is defined in the ``run_single()`` method.
-Then, when the ``run()`` method is called, the input is checked to see whether it is a single slide or a dataset of
-slides. The ``run_single()`` method is then called as appropriate, and multiprocessing is automatically handled in the
-case of processing an entire dataset.
+Preprocessing pipelines are defined in :class:`~pathml.preprocessing.pipeline.Pipeline` objects.
+When :meth:`~pathml.core.slide_data.SlideData.run`
+is called, tiles are lazily extracted from the slide by
+:meth:`~pathml.core.slide_data.SlideData.generate_tiles` and passed to the
+:class:`~pathml.preprocessing.pipeline.Pipeline`, which modifies the :class:`~pathml.core.tile.Tile` object in place.
+Finally, the processed tile is saved.
+This design facilitates preprocessing of gigapixel-scale whole-slide images, because :class:`~pathml.core.tile.Tile`
+objects are small enough to fit in memory.
 
-To define a new pipeline, all that is necessary is to define the ``run_single()`` method.
-The method should take a ``BaseSlide`` object as input (or a specific type of slide inheriting from the ``BaseSlide``
-class), and should write the processed output to disk. Because the ``run()`` method is just a wrapper around the
-``run_single()`` method, there is no need to override the default ``run()``.
+Composing a Pipeline
+--------------------
 
-A ``SlideData`` object can be used to hold intermediate outputs, so that a preprocessing step can have access to
-outputs from earlier steps.
+In many cases, a preprocessing pipeline can be thought of as a sequence of transformations.
+:class:`~pathml.preprocessing.pipeline.Pipeline` objects can be created by composing
+a list of :class:`~pathml.preprocessing.transforms.Transform` objects:
 
-Interacting with slides
-------------------------
+.. code-block:: python
 
-Pipelines must take ``BaseSlide`` objects as input.
-This interaction between Pipelines and Slides is very important - design choices here can affect pipeline execution
-times by orders of magnitude!
-This is because whole-slide images can be very large, even exceeding the amount of available memory in most machines!
+    pipeline = Pipeline([
+        BoxBlur(kernel_size=15),
+        TissueDetectionHE(mask_name = "tissue", min_region_size=500,
+                          threshold=30, outer_contours_only=True)
+    ])
+..
 
-.. note::
+In this example, the preprocessing pipeline will first apply a box blur kernel, and then apply tissue detection.
+It is that easy to compose pipelines by mixing and matching :class:`~pathml.preprocessing.transforms.Transform` objects!
 
-    Naively loading an entire WSI into memory at high-resolution should therefore be avoided in most cases!
 
-Consider these best-practices when designing custom pipelines:
+Custom Transforms
+-----------------
 
-- Make use of the ``BaseSlide.chunks()`` method to process the WSI in smaller chunks
-- Perform operations on lower-resolution image levels, when possible (i.e. when the slide has multiple resolutions
-  available and the operation will not suffer from decreased resolution)
-- Be cognizant of memory requirements at each step in the pipeline
-- Avoid loading entire slides into memory at high-resolution!
+A :class:`~pathml.preprocessing.pipeline.Pipeline` is a special case of
+a :class:`~pathml.preprocessing.transforms.Transform` which makes it easy
+to compose several :class:`~pathml.preprocessing.transforms.Transform` objects sequentially.
+However, in some cases, you may want to implement a :class:`~pathml.preprocessing.transforms.Transform` from scratch.
+For example, you may want to apply a transformation which is not already implemented in ``PathML``.
+Or, perhaps you want to apply a preprocessing pipeline which is not perfectly sequential.
 
-Using Transforms
--------------------
+To define a new custom :class:`~pathml.preprocessing.transforms.Transform`,
+all you need to do is create a class which inherits from :class:`~pathml.preprocessing.transforms.Transform` and
+implements an ``apply()`` method which takes a :class:`~pathml.core.tile.Tile` as an argument and modifies it in place.
+You may also implement a functional method ``F()``, although that is not strictly required.
 
-``PathML`` provides a set of modular Transformation objects to make it easier to define custom preprocessing pipelines.
-Individual low-level operations are implemented in ``Transform`` objects, through the ``apply()`` method.
-This consistent API makes it convenient to use complex operations in pipelines, and combine them modularly.
-There are several types of Transforms, as defined by their inputs and outputs:
+For example, let's take a look at how :class:`~pathml.preprocessing.transforms.BoxBlur` is implemented:
 
-================== ========== ===========
-Transform type     Input      Output
-================== ========== ===========
-ImageTransform     image      image
-Segmentation       image      mask
-MaskTransform      mask       mask
-================== ========== ===========
+.. code-block:: python
 
-Some things to consider when implementing a custom pipeline:
-
-- Use existing Transforms when possible! This will save time compared to implementing the entire pipeline from scratch.
-- If implementing a new transformation or pipeline operation, consider contributing it to ``PathML`` so that other
-  users in the community can benefit from your hard work! See: contributing
-- Be aware of memory and computation requirements of your pipeline.
-
-
-Examples
---------
-
-In this example we'll define a Pipeline which reads chunks of the input slide, applies a box blur with a given kernel
-size, and then writes the blurred image to disk.
-
-.. code-block::
-
-    import os
-    import cv2
-    from pathml.preprocessing.base import BasePipeline
-    from pathml.preprocessing.transforms import BoxBlur
-    from pathml.preprocessing.wsi import HESlide
-
-    class ExamplePipeline(BasePipeline):
-        def __init__(self, kernel_size):
+    class BoxBlur(Transform):
+        """Box (average) blur kernel."""
+        def __init__(self, kernel_size=5):
             self.kernel_size = kernel_size
 
-        def run_single(self, slide, output_dir):
-            blur = BoxBlur(kernel_size)
-            for i, chunk in enumerate(slide.chunks(level = 0, size = 1000)):
-                blurred_chunk = blur.apply(chunk)
-                fname = os.path.join(output_dir, f"chunk{i}.jpg")
-                cv2.imwrite(fname, blurred_chunk)
-
-    # usage
-    wsi = HESlide("/path/to/wsi.svs")
-    ExamplePipeline(kernel_size = 11).run(wsi)
-
-
-In this example, we define a Transform which changes the order of the channels in the input RGB image.
-
-.. code-block::
+        def F(self, image):
+            return cv2.boxFilter(image, ksize = (self.kernel_size, self.kernel_size), ddepth = -1)
 
-    from pathml.preprocessing.base import ImageTransform
+        def apply(self, tile):
+            tile.image = self.F(tile.image)
+..
 
-    class ChannelSwitch(ImageTransform):
-        def apply(self, image):
-            # make sure that the input image has 3 channels
-            assert image.shape[2] == 3
-            out = image
-            out[:, :, 0] = image[:, :, 2]
-            out[:, :, 1] = image[:, :, 0]
-            out[:, :, 2] = image[:, :, 1]
-            return out
+That's it! Once you define your custom :class:`~pathml.preprocessing.transforms.Transform`,
+you can combine it with any of the other :class:`~pathml.preprocessing.transforms.Transform` objects,
+compose it into a :class:`~pathml.preprocessing.pipeline.Pipeline`, etc.
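
To make the pattern in the new docs concrete without depending on PathML or OpenCV, here is a minimal self-contained sketch of a custom transform plugged into a pipeline. `Tile`, `Transform`, and `Pipeline` below are hypothetical stand-ins written for this example, and `Invert` is an invented transform; the real classes in `pathml.core` and `pathml.preprocessing` may differ in their constructors and behavior.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Hypothetical stand-in for pathml.core.tile.Tile: holds one image."""
    image: list  # rows of 8-bit grayscale pixel values


class Transform:
    """Hypothetical stand-in for the Transform base class."""

    def apply(self, tile):
        raise NotImplementedError


class Invert(Transform):
    """Custom transform (invented for this sketch): invert 8-bit intensities."""

    def F(self, image):
        # Functional form: returns a new image and leaves the input untouched.
        return [[255 - px for px in row] for row in image]

    def apply(self, tile):
        # Modify the tile in place, mirroring the BoxBlur example above.
        tile.image = self.F(tile.image)


class Pipeline(Transform):
    """Hypothetical stand-in: applies a list of transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def apply(self, tile):
        for transform in self.transforms:
            transform.apply(tile)


tile = Tile(image=[[0, 128], [255, 10]])
Pipeline([Invert()]).apply(tile)
print(tile.image)
```

Because the stand-in `Pipeline` is itself a `Transform`, it can be nested inside another pipeline, which matches the "special case of a Transform" description in the docs above.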