
Use this to download all elements of the BCSS dataset described in: Amgad M, Elfandy H, ..., Gutman DA, Cooper LAD. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics. 2019. doi: 10.1093/bioinformatics/btz083


xiaochengcike/BCSS

 
 


Breast Cancer Semantic Segmentation (BCSS) dataset

This repo contains the information and instructions needed to download the dataset associated with the paper:

Amgad M, Elfandy H, ..., Gutman DA, Cooper LAD. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics. 2019. doi: 10.1093/bioinformatics/btz083

This data can be visualized in a public instance of the Digital Slide Archive (DSA) at this link. Clicking the “eye” icon in the Annotations panel on the right side of the screen shows the results of the collaborative annotation.


Usage

  • Each mask is a .png image in which pixel values encode region class membership. The meaning of each ground truth code is listed in the file ./meta/gtruth_codes.tsv.

  • The name of each mask encodes all the information needed to extract the corresponding RGB image from the TCGA slides. For convenience, the RGBs are also downloaded by the code provided here.

  • [CRITICAL] - Please be aware that zero pixels represent regions outside the region of interest (“don’t care” class) and should be assigned zero-weight during model training; they do NOT represent an “other” class.

  • The RGBs and corresponding masks will be at the set MPP resolution. If MPP is set to None, they will be at MAG magnification. If both are set to None, they will be at the base (scan) magnification.
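As an illustration of the zero-weight rule above, here is a minimal sketch of how the “don’t care” pixels might be excluded from a training loss. The function names `pixel_weights` and `masked_mean_loss` are hypothetical and not part of this repo; the only thing taken from the dataset is that a mask value of zero means “outside the region of interest”.

```python
import numpy as np

# Per the usage notes: zero pixels lie outside the region of interest.
DONT_CARE = 0


def pixel_weights(mask: np.ndarray) -> np.ndarray:
    """Weight map: 0.0 where the mask is zero (don't care), 1.0 elsewhere."""
    return (mask != DONT_CARE).astype(np.float32)


def masked_mean_loss(per_pixel_loss: np.ndarray, mask: np.ndarray) -> float:
    """Average a per-pixel loss over annotated pixels only (hypothetical helper)."""
    weights = pixel_weights(mask)
    total = float(weights.sum())
    if total == 0.0:
        # No annotated pixels in this tile, so nothing contributes to the loss.
        return 0.0
    return float((per_pixel_loss * weights).sum() / total)


# Toy 2x2 example: two pixels are "don't care", two are annotated.
mask = np.array([[0, 1], [2, 0]], dtype=np.uint8)
loss = np.array([[5.0, 1.0], [3.0, 7.0]])
print(masked_mean_loss(loss, mask))  # 2.0, the mean of 1.0 and 3.0
```

Most deep learning frameworks offer an equivalent mechanism (a per-pixel weight or an ignored label index); the point is only that zero pixels should not be trained on as an “other” class.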


Download (single link - convenient)

You can use this link to download the dataset at 0.25 MPP resolution.


Download (command line - flexible)

Use this to download all elements of the dataset using the command line.

This script will download any or all of the following:

  • annotation JSON files (coordinates relative to WSI base resolution)
  • masks
  • RGB images

Steps are as follows:

Step 0: Clone this repo

$ git clone https://github.com/CancerDataScience/CrowdsourcingDataset-Amgadetal2019
$ cd CrowdsourcingDataset-Amgadetal2019

Step 1: Install requirements

pip install girder-client pillow numpy scikit-image imageio

Step 2 (optional): Edit configs.py

If you like, you may edit various download configurations. Of note:

  • SAVEPATH - where everything will be saved
  • MPP - microns per pixel for RGBs and masks (preferred; default is 0.25)
  • MAG - magnification (overridden by MPP if MPP is set; default is None)
  • PIPELINE - which elements to download
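The precedence between MPP and MAG described in this README can be sketched as follows. This is not the code in configs.py or the download script; `resolve_resolution` is a hypothetical helper that only restates the documented rule (MPP wins if set, then MAG, otherwise the base scan magnification).

```python
from typing import Optional, Tuple


def resolve_resolution(
    mpp: Optional[float], mag: Optional[float]
) -> Tuple[str, Optional[float]]:
    """Return which setting applies and its value (hypothetical helper)."""
    if mpp is not None:
        # MPP is preferred and overrides MAG whenever it is set.
        return ("mpp", mpp)
    if mag is not None:
        return ("mag", mag)
    # Both None: fall back to the base (scan) magnification.
    return ("base", None)


print(resolve_resolution(0.25, None))  # ('mpp', 0.25)
print(resolve_resolution(None, 20.0))  # ('mag', 20.0)
print(resolve_resolution(None, None))  # ('base', None)
```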

Step 3: Run the download script

python download_crowdsource_dataset.py

The script will create the following sub-directories in SAVEPATH:

|_ annotations : where JSON annotations will be saved for each slide
|_ masks : where the ground truth masks to use for training and validation are saved
|_ images : where RGB images corresponding to masks are saved
|_ wsis (legacy) : Ignore this. No longer supported.
|_ logs : in case anything goes wrong
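
Once the download finishes, the images and masks sub-directories can be paired up for training. The sketch below assumes that each RGB image is a .png sharing its file name with the corresponding mask; that naming convention is an assumption, not something stated here, so check it against your downloaded files. `pair_images_and_masks` is a hypothetical helper, not part of the download script.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images_and_masks(savepath: str) -> List[Tuple[Path, Path]]:
    """Match files in SAVEPATH/images to files in SAVEPATH/masks by file name.

    Assumption: an image and its mask share the same .png file name.
    """
    root = Path(savepath)
    masks = {p.name: p for p in (root / "masks").glob("*.png")}
    pairs = []
    for image in sorted((root / "images").glob("*.png")):
        mask = masks.get(image.name)
        if mask is not None:
            # Images without a matching mask are skipped rather than guessed at.
            pairs.append((image, mask))
    return pairs
```

For example, `pair_images_and_masks(SAVEPATH)` returns a list of `(image_path, mask_path)` tuples that can be fed to a data loader.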

Licensing

This dataset is licensed under a CC BY 4.0 license. Please cite our paper if you use the data.
