Author: Brendan Crabb brendancrabb@gmail.com
Created August 1, 2017
Welcome to SlideSeg, a Python module that segments whole slide images into usable image chips for deep learning. Image masks for each chip are generated from the associated markup and annotation files.
If you use this code for research purposes, please cite the following in your paper:
Brendan Crabb, Niels Olson, "SlideSeg: a Python module for the creation of annotated image repositories from whole slide images", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 105811C (6 March 2018); doi: 10.1117/12.2300262; https://doi.org/10.1117/12.2300262
For a version of SlideSeg compatible with Python 3, please see https://github.com/abcsFrederick/SlideSeg3. That version also supports multiprocessing, which drastically decreases processing time.
- Dependencies
- Anaconda Environment
  2.1 Creating Environment from .yml File
  2.2 Installing C Libraries (Windows)
  2.3 Installing C Libraries (Mac OS X)
  2.4 Launching Jupyter Notebook
  2.5 Change Jupyter Notebook startup folder (Windows)
  2.6 Change Jupyter Notebook startup folder (OS X)
  2.7 Jupyter Kernel Selection
- Setup
  3.1 Supported Formats
  3.2 Parameters
  3.3 Annotation Key
- Output
  5.1 Image_Chips
  5.2 Image_Masks
  5.3 Text Files
- Run
SlideSeg runs on Python 2.7 and depends on the following libraries:
- openslide 1.1.1
- tqdm 4.15.0
- cv2 3.2.0
- numpy
- pexif 0.15
The libraries can be installed using:
pip install slideseg
If pip isn't installed, you may have to enter the following before installing slideseg (OS X):
sudo easy_install pip
If you are using the preconfigured SlideSeg Anaconda environment, these dependencies will already be installed. SlideSeg also depends on several C libraries; see section 2.2 (Windows) and section 2.3 (Mac OS X) for installation instructions.
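To confirm that the dependencies listed above are importable in the active environment, a small check along these lines may help. This is a sketch, not part of SlideSeg: `check_modules` is a hypothetical helper, and the import names are assumed to match the library names in the list. Any exception during import is treated as "missing", since a Python binding can fail for reasons other than `ImportError` when its C library is absent.

```python
import importlib


def check_modules(module_names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Any failure (missing package, missing C library) counts as unavailable.
            status[name] = False
    return status


if __name__ == "__main__":
    # Import names assumed for the libraries listed in this section.
    names = ["openslide", "tqdm", "cv2", "numpy", "pexif"]
    for name, ok in sorted(check_modules(names).items()):
        print("{:<10} {}".format(name, "OK" if ok else "MISSING"))
```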
Make sure Anaconda is installed. The SlideSeg environment has an IPython kernel with all of the necessary packages already installed; however, conda support for Jupyter notebooks is needed to switch kernels. This support is available through conda itself and can be enabled by issuing the following command:
conda install nb_conda
Copy the environment_slideseg.yml file to the anaconda directory, .../anaconda/scripts/. In the same directory, issue the following command to create the anaconda environment from the file:
conda env create -f environment_slideseg.yml
Creating the environment might take a few minutes. Once finished, issue the following command to activate the environment:
- Windows:
activate SlideSeg
- macOS and Linux:
source activate SlideSeg
If the environment was activated successfully, you should see (SlideSeg) at the beginning of the command prompt. This will set the SlideSeg kernel as your default kernel when running Jupyter.
OpenSlide and OpenCV are C libraries; as a result, they have to be installed separately from the conda environment, which contains all of the python dependencies.
The Windows Binaries for OpenSlide can be found at 'openslide.org/download/'. Download the appropriate binaries for your system (either 32-bit or 64-bit) and unzip the file.
Copy the .dll files in ../bin/ to .../Anaconda/envs/SlideSeg/Library/bin/.
Copy the .h files to .../Anaconda/envs/SlideSeg/include/.
Finally, copy the .lib file to .../Anaconda/envs/SlideSeg/libs/.
OpenSlide has now been installed.
Use the following tutorial to download OpenCV, either from prebuilt binaries or from source:
http://docs.opencv.org/3.2.0/d5/de5/tutorial_py_setup_in_windows.html
OpenSlide and OpenCV are C libraries; as a result, they have to be installed separately from the conda environment, which contains all of the python dependencies.
If you are using Homebrew, enter the following in the terminal:
brew install opencv
brew install openslide
OpenSlide and OpenCV should now be installed in your anaconda environment.
The Jupyter Notebook App can be launched by clicking on the Jupyter Notebook icon installed by Anaconda in the start menu (Windows) or by typing in the terminal (cmd on Windows):
jupyter notebook
This will launch a new browser window showing the Notebook Dashboard. When started, the Jupyter Notebook app can only access files within its start-up folder. If you stored the SlideSeg notebook documents in a subfolder of your user folder, no configuration is necessary. Otherwise, you need to change your Jupyter Notebook App start-up folder.
- Copy the Jupyter Notebook launcher from the menu to the desktop.
- Right-click the new launcher, select Properties, and in the Target field replace %USERPROFILE% with the full path of the folder that will contain all the notebooks.
- Double-click the Jupyter Notebook desktop launcher (icon shows [IPy]) to start the Jupyter Notebook App, which will open in a new browser window (or tab). A secondary terminal window (used only for error logging and shutdown) will also open. If only the terminal starts, try opening this address in your browser: http://localhost:8888/.
To launch Jupyter Notebook App:
- Click on spotlight, type terminal to open a terminal window.
- Enter the startup folder by typing cd /some_folder_name.
- Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).
After launching the Jupyter Notebook App, navigate to the SlideSeg notebook and click on its name to open it in a new browser tab. In the upper right corner, you should see Python [conda env:SlideSeg]. If not, click on Kernel > Change Kernel and select Python [conda env:SlideSeg].
Create a folder called 'images/' in the main directory and copy all of the slide images into this folder. Copy the markup and annotation files (in .xml format) into the xml folder in the main project directory. It is important that the annotation files have the same file name as the slide they are associated with.
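Because each annotation file must share its file name with its slide, it can be useful to check the pairing before a long run. The following is a minimal sketch under that assumption only; `pair_slides_with_xml` is a hypothetical helper and not a function of the slideseg module.

```python
import os


def pair_slides_with_xml(slide_dir, xml_dir):
    """Match each slide to the .xml file sharing its base filename.

    Returns (pairs, unmatched): pairs maps slide filename -> xml filename,
    unmatched lists slides that have no annotation file.
    """
    xml_by_stem = {}
    for name in os.listdir(xml_dir):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".xml":
            xml_by_stem[stem] = name

    pairs, unmatched = {}, []
    for name in sorted(os.listdir(slide_dir)):
        stem, _ = os.path.splitext(name)
        if stem in xml_by_stem:
            pairs[name] = xml_by_stem[stem]
        else:
            unmatched.append(name)
    return pairs, unmatched
```

Calling `pair_slides_with_xml("images/", "xml/")` from the main directory would list any slide that is still missing its annotation file.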
SlideSeg can read virtual slides in the following formats:
- Aperio (.svs, .tif)
- Hamamatsu (.ndpi, .vms, .vmu)
- Leica (.scn)
- MIRAX (.mrxs)
- Philips (.tiff)
- Sakura (.svslide)
- Trestle (.tif)
- Ventana (.bif, .tif)
- Generic tile TIFF (.tif)
SlideSeg can read annotations in the following formats:
- XML (.xml)
SlideSeg depends on the following parameters:
slide_path: Path to the folder of slide images
xml_path: Path to the folder of xml files
output_dir: Path to the output folder where image_chips, image_masks, and text_files will be saved
format: Output format of the image_chips and image_masks (png or jpg only)
quality: Output quality: JPEG compression level if the output format is 'jpg' (100 recommended; JPEG compression artifacts will distort the image segmentation)
size: Size of image_chips and image_masks in pixels
overlap: Pixel overlap between image chips
key: The text file containing annotation keys and color codes
save_all: True saves every image_chip, False only saves chips containing an annotated pixel
save_ratio: Ratio of image_chips containing annotations to image_chips not containing annotations (use 'inf' if only annotated chips are desired; only applicable if save_all == False)
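The constraints described in the parameter list can be checked up front. The sketch below is illustrative only: `validate_parameters` is a hypothetical helper, the 0 to 100 quality range is an assumption based on the usual JPEG quality scale, and the example values for `size` and `overlap` are placeholders rather than recommended settings.

```python
def validate_parameters(params):
    """Return a normalized copy of params, or raise ValueError."""
    checked = dict(params)

    fmt = str(checked["format"]).lower().lstrip(".")
    if fmt not in ("png", "jpg"):
        raise ValueError("format must be 'png' or 'jpg', got %r" % checked["format"])
    checked["format"] = fmt

    quality = int(checked["quality"])
    if not 0 <= quality <= 100:  # assumed JPEG quality range
        raise ValueError("quality must be between 0 and 100")
    checked["quality"] = quality

    size, overlap = int(checked["size"]), int(checked["overlap"])
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    checked["size"], checked["overlap"] = size, overlap

    # 'inf' means "annotated chips only"; float() parses the string directly.
    checked["save_ratio"] = float(checked["save_ratio"])
    return checked


# Placeholder values for illustration.
example = {
    "slide_path": "images/",
    "xml_path": "xml/",
    "output_dir": "output/",
    "format": "png",
    "quality": 100,
    "size": 256,
    "overlap": 0,
    "key": "Annotation_Key.txt",
    "save_all": False,
    "save_ratio": "inf",
}
```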
The main directory should already contain an Annotation_Key.txt file. If no Annotation_Key file is present, one will be generated automatically from the annotation files in the xml folder.
The Annotation_Key file contains every annotation key with its associated color code. In all image masks, annotations with that key will have the specified pixel value. If an unknown key is encountered, it will be given a pixel value and added to the Annotation_Key automatically.
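The behavior for unknown keys can be pictured with an in-memory dictionary. This is a sketch, not the module's implementation: `assign_color_code` is hypothetical, it assumes 8-bit masks in which 0 means "no annotation", and it picks the lowest unused value, which may differ from how SlideSeg actually chooses color codes.

```python
def assign_color_code(annotations, key):
    """Return the pixel value for key, adding it to annotations if it is new."""
    if key in annotations:
        return annotations[key]
    used = set(annotations.values())
    for value in range(1, 256):  # 0 is assumed to mean "no annotation"
        if value not in used:
            annotations[key] = value
            return value
    raise ValueError("no unused 8-bit pixel values left")
```

For example, starting from `{"tumor": 1}`, a call with the new key `"stroma"` would add it with the next free value, while a call with `"tumor"` would return the existing one unchanged.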
The following functions are defined within the slideseg module and used to generate, edit, and read the annotation key:
<code>def loadkeys(annotation_key):
    """
    Opens annotation_key file and loads keys and color codes
    :param annotation_key: the filename of the annotation key
    :return: color codes
    """

def addkeys(annotation_key, key):
    """
    Adds new key and color_code to annotation key
    :param annotation_key: the filename of the annotation key
    :param key: The annotation to be added
    :return: updated annotation key file
    """

def writeannotations(annotation_key, annotations):
    """
    Writes annotation keys and color codes to annotation key text file
    :param annotation_key: filename of annotation key
    :param annotations: Dictionary of annotation keys and color codes
    :return: .txt file with annotation keys
    """

def generatekey(annotation_key, path):
    """
    Generates annotation_key from folder of xml files
    :param annotation_key: the name of the annotation key file
    :param path: Directory containing xml files
    :return: annotation_key file
    """
Every generated image chip will be saved in the output/image_chips folder. The chips are saved with the naming convention of slide filename_level number_row_column.format. If the chip contains an area that was annotated and the tags are enabled, it will have an associated tag (under the Subject category) with the annotation key. If the image chip does not contain annotations, the 'NONE' tag will be added. To view these tags, switch to details view and click display 'Subject' in the explorer. The files can be sorted according to their tags. Unfortunately, these tags will only be available if the output format is .jpg.
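How `size` and `overlap` translate into chip positions and file names can be sketched as follows. Both helpers are hypothetical. The sketch assumes that chips advance by `size - overlap` pixels, that partial chips at the image edge are dropped, that the row and column in the name are grid indices, and that the slide's extension is removed from the name; SlideSeg itself may handle any of these differently.

```python
import os


def chip_origins(length, size, overlap):
    """Start offsets of chips along one axis, stepping by size - overlap."""
    stride = size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than size")
    if length < size:
        return []  # the axis is too short to hold one full chip
    return list(range(0, length - size + 1, stride))


def chip_filename(slide_path, level, row, col, fmt):
    """Build '<slide filename>_<level>_<row>_<column>.<format>'."""
    stem = os.path.splitext(os.path.basename(slide_path))[0]
    return "{}_{}_{}_{}.{}".format(stem, level, row, col, fmt)
```

With a 1000-pixel axis, 256-pixel chips, and an overlap of 56, the chips would start every 200 pixels, so neighboring chips share a 56-pixel band.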
The following functions are defined in the slideseg module and are used to save both the image chips and image masks, as well as attaching exif metadata to the images:
def ensuredirectory(dest):
    """
    Ensures the existence of a directory
    :param dest: Directory to ensure.
    :return: new directory if it did not previously exist.
    """

def attachtags(path, keys):
    """
    Attaches image tags to metadata of chips and masks
    :param path: file to attach tags to.
    :param keys: keys to attach as tags
    :return: JPG with metadata tags
    """

def savechip(chip, path, quality, keys):
    """
    Saves the image chip
    :param chip: the slide image chip to save
    :param path: the full path to the chip
    :param quality: the output quality
    :param keys: keys associated with the chip
    :return:
    """

def savemask(mask, path, keys):
    """
    Saves the image masks
    :param mask: the image mask to save
    :param path: the complete path for the mask
    :param keys: keys associated with the chip
    :return:
    """

def checksave(save_all, pix_list, save_ratio, save_count_annotated, save_count_blank):
    """
    Checks whether or not an image chip should be saved
    :param save_all: (bool) saves all chips if true
    :param pix_list: list of pixel values in image mask
    :param save_ratio: ratio of annotated chips to unannotated chips
    :param save_count_annotated: total annotated chips saved
    :param save_count_blank: total blank chips saved
    :return: bool
    """

def formatcheck(format):
    """
    Ensures the format parameter was defined correctly
    :param format: the output format parameter
    :return: format
    :return: suffix
    """
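One way to read the interaction between `save_all` and `save_ratio` is sketched below. `should_save` is a hypothetical stand-in that mirrors the arguments of `checksave`; it is not the module's code. It assumes that a pixel value of 0 means "not annotated" and that a blank chip is kept only while the ratio of annotated to blank chips would stay at or above `save_ratio`.

```python
def should_save(save_all, pix_list, save_ratio, saved_annotated, saved_blank):
    """Decide whether a chip should be written to disk."""
    if save_all:
        return True
    if any(value != 0 for value in pix_list):
        return True  # the chip contains at least one annotated pixel
    # Blank chip: keep it only if annotated / blank stays >= save_ratio
    # after counting this chip as one more blank chip.
    return saved_annotated >= save_ratio * (saved_blank + 1)
```

Under these assumptions, `save_ratio = float("inf")` never keeps a blank chip, which matches the "annotated chips only" behavior described for the `'inf'` setting.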
An image mask for each image chip is saved in the output/image_masks folder. The mask has the same name as the image chip it is associated with. Furthermore, these masks will have the same tags, allowing you to sort by annotation type.
The following function handles the generation of an annotation mask from xml files:
def makemask(annotation_key, size, xml_path):
    """
    Reads xml file and makes annotation mask for entire slide image
    :param annotation_key: name of the annotation key file
    :param size: size of the whole slide image
    :param xml_path: path to the xml file
    :return: annotation mask
    :return: dictionary of annotation keys and color codes
    """
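The first step of building a mask, reading annotated regions out of an XML file, might look like the sketch below. The XML layout in `SAMPLE_XML` (element names `Annotation`, `Region`, `Vertex` and the attributes `Name`, `X`, `Y`) is an assumption made for illustration; this README does not specify the schema, and `read_regions` is a hypothetical helper rather than part of `makemask`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layout, used only for this example.
SAMPLE_XML = """
<Annotations>
  <Annotation Name="tumor">
    <Regions>
      <Region>
        <Vertices>
          <Vertex X="10" Y="20"/>
          <Vertex X="30" Y="20"/>
          <Vertex X="30" Y="40"/>
        </Vertices>
      </Region>
    </Regions>
  </Annotation>
</Annotations>
"""


def read_regions(xml_text):
    """Return a list of (key, [(x, y), ...]) tuples, one per region."""
    root = ET.fromstring(xml_text.strip())
    regions = []
    for annotation in root.iter("Annotation"):
        key = annotation.get("Name", "NONE")
        for region in annotation.iter("Region"):
            points = [(int(float(v.get("X"))), int(float(v.get("Y"))))
                      for v in region.iter("Vertex")]
            regions.append((key, points))
    return regions
```

Each polygon could then be filled into a slide-sized array with the pixel value of its key, for example with OpenCV's `cv2.fillPoly`, to produce the annotation mask.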
A text file with details about annotations and image chips will also be saved to output/textfiles. For each slide image, this text file will contain a list of all annotation keys present in the image. For each annotation key, a list of every image chip/mask containing that specific key is also recorded in this file.
The following functions generate these .txt files:
def writekeys(filename, annotations):
    """
    Writes each annotation key to the output text file
    :param filename: filename of image chip
    :param annotations: dictionary of annotation keys
    :return: updated text file
    """

def writeimagelist(filename, image_dictionary):
    """
    Writes list of images containing each annotation key
    :param filename: the name of the slide image
    :param image_dictionary: dictionary of images with each key
    :return: text
    """
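The per-key chip lists described for the text files can be pictured as an inverted index. In this sketch both helpers are hypothetical, and the indented layout written by `write_summary` is an assumption; the actual files produced by `writekeys` and `writeimagelist` may be formatted differently.

```python
import io
from collections import defaultdict


def build_image_dictionary(chip_keys):
    """Invert {chip filename: [keys]} into {key: [chip filenames]}."""
    image_dictionary = defaultdict(list)
    for chip, keys in sorted(chip_keys.items()):
        for key in keys:
            image_dictionary[key].append(chip)
    return dict(image_dictionary)


def write_summary(stream, image_dictionary):
    """Write each key followed by an indented list of the chips containing it."""
    for key in sorted(image_dictionary):
        stream.write(u"{}\n".format(key))
        for chip in image_dictionary[key]:
            stream.write(u"    {}\n".format(chip))


if __name__ == "__main__":
    chips = {"s_0_0_0.png": ["tumor", "stroma"], "s_0_0_1.png": ["tumor"]}
    buffer = io.StringIO()
    write_summary(buffer, build_image_dictionary(chips))
    print(buffer.getvalue())
```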
To execute SlideSeg, open the Jupyter notebook and run the cells. Alternatively, you can run the Python script 'main.py'. Make sure the parameters are defined before running. If the Python script is used, the parameters are specified in the Parameters.txt file.