This is an image data repository. It complements the augmentation and derma repositories. The augmentation repository augments the images of this repository, whilst the derma repository holds models that use the augmentations.
The data is courtesy of the International Skin Imaging Collaboration (ISIC). It is a set of dermoscopic images of skin lesions: specifically, the images of the ISIC 2019 Challenge, i.e.,
file | description | size |
---|---|---|
ISIC_2019_Training_Input.zip | 25,331 JPEG images of skin lesions | ~9GB |
ISIC_2019_Training_Metadata.csv | 25,331 metadata entries of age, sex, general anatomic site, and common lesion identifier | 1.15MB |
ISIC_2019_Training_GroundTruth.csv | 25,331 entries of gold standard lesion diagnoses | 1.23MB |
The images are either the same as those hosted by the ISIC Archive API or down-sampled versions. The data set outlined below might be used if the ground truths are released.
- ISIC_2019_Test_Input.zip: 8,238 JPEG images of skin lesions
- ISIC_2019_Test_Metadata.csv: 8,238 metadata entries of age, sex, and general anatomic site
To ensure availability, the contents of ISIC_2019_Training_Input.zip are in the directory data/images, whilst copies of the ISIC_2019_Training_Metadata.csv & ISIC_2019_Training_GroundTruth.csv files are stored in data.
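The two CSV files describe the same images, so they can be joined on the image identifier. The following is a minimal sketch that uses only the Python standard library. It assumes that both files share an `image` column and that the ground truth file is one-hot encoded, i.e., one diagnosis column per row is set to 1; neither assumption is stated in this repository, so check the file headers first. The function names are illustrative and are not part of any package mentioned here.

```python
import csv
from pathlib import Path

# Assumption: both CSV files identify images via a column named "image".
IMAGE_KEY = "image"


def read_rows(handle):
    """Return a dict that maps each image identifier to its row (a dict)."""
    return {row[IMAGE_KEY]: row for row in csv.DictReader(handle)}


def diagnosis(truth_row):
    """Return the name of the single column set to 1 in a ground truth row."""
    # Assumption: the diagnosis columns are one-hot encoded (e.g. "1.0"/"0.0").
    labels = [name for name, value in truth_row.items()
              if name != IMAGE_KEY and float(value) == 1.0]
    if len(labels) != 1:
        raise ValueError(f"expected exactly one label, found {labels}")
    return labels[0]


def merge(metadata, truth):
    """Join the two tables on the image identifier; fail if the keys differ."""
    if metadata.keys() != truth.keys():
        raise ValueError("metadata and ground truth list different images")
    return {key: {**metadata[key], "diagnosis": diagnosis(truth[key])}
            for key in metadata}


def load(directory="data"):
    """Read and merge the two training CSV files stored in `directory`."""
    directory = Path(directory)
    with open(directory / "ISIC_2019_Training_Metadata.csv", newline="") as m, \
         open(directory / "ISIC_2019_Training_GroundTruth.csv", newline="") as t:
        return merge(read_rows(m), read_rows(t))
```

Calling `load()` from the repository root should return one merged record per image. The key comparison in `merge` is a cheap guard against the two files drifting apart.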
Augmented versions of the images in ISIC_2019_Training_Input.zip are created via the augmentation package. The package
- ensures that all images are of the same size; the size is determined by the models
- creates rotated forms of most images
The augmentations are stored in augmentations/images. The images are zipped, and their metadata is summarised in augmentations/inventory.csv.
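Because the augmented images are zipped, it can be useful to inspect an archive without extracting it. The sketch below counts the entries of a zip archive by file extension, using only the Python standard library. The archive names within augmentations/images are not listed here, so the path is left as a parameter, and the function name is illustrative.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(archive_path):
    """Count the files in a zip archive by lower-cased file extension."""
    with zipfile.ZipFile(archive_path) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix.lower()
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        )
```

The resulting counts can then be compared with the number of rows that augmentations/inventory.csv holds for the same archive, assuming the inventory records one row per image.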
Details: https://challenge2019.isic-archive.com/data.html
The images and metadata of the "ISIC 2019: Training" data used herein are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC). The copyright holders are:
- BCN_20000 Dataset: © Department of Dermatology, Hospital Clínic de Barcelona, https://arxiv.org/abs/1908.02288 [4]
- HAM10000 Dataset: © ViDIR Group, Department of Dermatology, Medical University of Vienna, https://www.nature.com/articles/sdata2018161 [1]
- MSK Dataset: © Anonymous; https://arxiv.org/abs/1710.05006, https://arxiv.org/abs/1902.03368 [2, 3]
References
1. P. Tschandl, C. Rosendahl, H. Kittler: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Scientific Data, Volume 5, Article Number: 180161, 2018, doi:10.1038/sdata.2018.161
2. Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern: Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC), 2018, arXiv:1710.05006
3. Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael A. Marchetti, Harald Kittler, Allan Halpern: Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC), 2019, arXiv:1902.03368
4. Marc Combalia, Noel C. F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro, Allan C. Halpern, Susana Puig, Josep Malvehy: BCN20000: Dermoscopic Lesions in the Wild, 2019, arXiv:1908.02288