[Project] Medical semantic seg dataset: Kvasir seg #2677

Merged
merged 17 commits into from
Jun 25, 2023
updated scripts for project of medical dataset kvasir_seg (modality=endoscopy).
Masaaki-75 committed Mar 28, 2023
commit ae68e3a68c569948977a697d27ca7e09815ec54a
43 changes: 27 additions & 16 deletions projects/medical/2d_image/endoscopy/kvasir_seg/README.md
@@ -2,7 +2,7 @@

## Description

This project support **`Kvasir-Sessile Dataset (Kvasir SEG) `**, and the dataset used in this project can be downloaded from [here](https://opendatalab.com/Kvasir-Sessile_dataset).
This project supports the **`Kvasir-Sessile Dataset (Kvasir SEG)`**, which can be downloaded from [here](https://opendatalab.com/Kvasir-Sessile_dataset).

## Dataset Overview

@@ -23,32 +23,45 @@ The Kvasir-SEG dataset contains polyp images and their corresponding ground trut

Note:

- `pct` means percentage of pixels in this category in all pixels.
- `Pct` is the percentage of all pixels that belong to this category.
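
As a rough illustration of how such a percentage could be computed, the sketch below counts label values in a small mask given as nested lists. The helper name `pixel_percentages` and the toy mask are hypothetical; the figures reported in this README were not necessarily produced this way.

```python
from collections import Counter


def pixel_percentages(mask_rows):
    """Return {label: percentage of all pixels} for a 2D mask (list of rows)."""
    # Count how often each label value occurs across every row.
    counts = Counter(value for row in mask_rows for value in row)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy 2x4 mask: five background pixels (0) and three polyp pixels (1).
toy_mask = [[0, 0, 0, 1],
            [0, 0, 1, 1]]
print(pixel_percentages(toy_mask))  # {0: 62.5, 1: 37.5}
```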

### Visualization

![kvasir-seg](https://raw.githubusercontent.com/uni-medical/medical-datasets-visualization/main/2d/semantic_seg/endoscopy_images/kvasir_seg/kvasir_seg_dataset.png?raw=true)

### Dataset Citation

```
@inproceedings{jha2020kvasir,
title={Kvasir-seg: A segmented polyp dataset},
author={Jha, Debesh and Smedsrud, Pia H and Riegler, Michael A and Halvorsen, P{\aa}l and Lange, Thomas de and Johansen, Dag and Johansen, H{\aa}vard D},
booktitle={International Conference on Multimedia Modeling},
pages={451--462},
year={2020},
organization={Springer}
}
```

### Prerequisites

- Python 3.8
- PyTorch 1.10.0
- pillow(PIL)
- scikit-learn(sklearn)
- Python v3.8
- PyTorch v1.10.0
- pillow(PIL) v9.3.0
- scikit-learn(sklearn) v1.2.0
- [MIM](https://github.com/open-mmlab/mim) v0.3.4
- [MMCV](https://github.com/open-mmlab/mmcv) v2.0.0rc4
- [MMEngine](https://github.com/open-mmlab/mmengine) v0.2.0 or higher
- [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) v1.0.0rc5

All the commands below rely on the correct configuration of PYTHONPATH, which should point to the project's directory so that Python can locate the module files. In kvasir_seg/ root directory, run the following line to add the current directory to PYTHONPATH:
All the commands below rely on `PYTHONPATH` being configured correctly: it should point to the project's directory so that Python can locate the module files. In the `kvasir_seg/` root directory, run the following line to add the current directory to `PYTHONPATH`:

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```

### Dataset preparing
### Dataset Preparing

- download dataset from [here](https://opendatalab.com/Kvasir-Sessile_dataset) and decompression data to path 'data/'.
- Download the dataset from [here](https://opendatalab.com/Kvasir-Sessile_dataset) and decompress it into `data/`.
- Run `python tools/prepare_dataset.py` to format the data and arrange the folder structure as below.
- Run `python ../../tools/split_seg_dataset.py` to split the dataset and generate `train.txt`, `val.txt` and `test.txt`. If the labels of the official validation and test sets cannot be obtained, `train.txt` and `val.txt` are generated randomly from the training set.
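
The random fallback split described in the last step can be sketched roughly as follows. This is only an illustration and not the actual `split_seg_dataset.py`, which is not shown here; the function names `split_names` and `write_list`, the 20% validation ratio and the fixed seed are all assumptions.

```python
import random


def split_names(names, val_ratio=0.2, seed=0):
    """Shuffle sample names deterministically and return (train, val)."""
    names = sorted(names)  # sort first so the result does not depend on input order
    rng = random.Random(seed)
    rng.shuffle(names)
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]


def write_list(path, names):
    """Write one sample name per line, e.g. to train.txt or val.txt."""
    with open(path, 'w') as f:
        f.writelines(name + '\n' for name in names)


# Hypothetical sample names; real names would come from data/images/train/.
train, val = split_names([f'img_{i:03d}' for i in range(10)])
print(len(train), len(val))
```

Fixing the seed keeps the split reproducible across runs, which matters when results from different configs are compared on the same `val.txt`.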

@@ -89,20 +102,18 @@ export PYTHONPATH=`pwd`:$PYTHONPATH

### Training commands

```shell
mim train mmseg .configs/${CONFIG_PATH}
```

To train on multiple GPUs, e.g. 8 GPUs, run the following command:
To train models on a single server with one GPU (default):

```shell
mim train mmseg ./configs/${CONFIG_PATH} --launcher pytorch --gpus 8
mim train mmseg ./configs/${CONFIG_FILE}
```

### Testing commands

To test models on a single server with one GPU (default):

```shell
mim test mmseg ./configs/${CONFIG_PATH} --checkpoint ${CHECKPOINT_PATH}
mim test mmseg ./configs/${CONFIG_FILE} --checkpoint ${CHECKPOINT_PATH}
```

<!-- List the results as usually done in other model's README. [Example](https://github.com/open-mmlab/mmsegmentation/tree/dev-1.x/configs/fcn#results-and-models)
@@ -1,36 +1,84 @@
import glob
import os
import shutil

import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

root_path = 'data/kvasir-seg/'
img_suffix = '.png'
seg_map_suffix = '.png'
root_path = 'data/'
img_suffix = '.jpg'
seg_map_suffix = '.jpg'
save_img_suffix = '.png'
save_seg_map_suffix = '.png'
tgt_img_dir = os.path.join(root_path, 'images/train/')
tgt_mask_dir = os.path.join(root_path, 'masks/train/')
os.system('mkdir -p ' + tgt_img_dir)
os.system('mkdir -p ' + tgt_mask_dir)

all_imgs = glob.glob('data/kvasir-seg/images/*' + img_suffix)
x_train, x_test = train_test_split(all_imgs, test_size=0.2, random_state=0)

print(len(x_train), len(x_test))
os.system('mkdir -p ' + root_path + 'images/train/')
os.system('mkdir -p ' + root_path + 'images/val/')
os.system('mkdir -p ' + root_path + 'masks/train/')
os.system('mkdir -p ' + root_path + 'masks/val/')

part_dir_dict = {0: 'train/', 1: 'val/'}
for ith, part in enumerate([x_train, x_test]):
    part_dir = part_dir_dict[ith]
    for img in part:
        basename = os.path.basename(img)
        img_save_path = os.path.join(root_path, 'images', part_dir,
                                     basename.split('.')[0] + save_img_suffix)
        shutil.copy(img, img_save_path)
        mask_path = 'data/kvasir-seg/masks/' + basename
        mask = Image.open(mask_path).convert('L')
        mask_save_path = os.path.join(
            root_path, 'masks', part_dir,
            basename.split('.')[0] + save_seg_map_suffix)
        mask.save(mask_save_path)

def filter_suffix_recursive(src_dir, suffix):
    # filter out file names and paths in source directory
    suffix = '.' + suffix if '.' not in suffix else suffix
    file_paths = glob.glob(
        os.path.join(src_dir, '**', '*' + suffix), recursive=True)
    file_names = [_.split('/')[-1] for _ in file_paths]
    return sorted(file_paths), sorted(file_names)


def convert_label(img, convert_dict):
    arr = np.zeros_like(img, dtype=np.uint8)
    for c, i in convert_dict.items():
        arr[img == c] = i
    return arr


def convert_pics_into_pngs(src_dir, tgt_dir, suffix, convert='RGB'):
    if not os.path.exists(tgt_dir):
        os.makedirs(tgt_dir)

    src_paths, src_names = filter_suffix_recursive(src_dir, suffix=suffix)
    for i, (src_name, src_path) in enumerate(zip(src_names, src_paths)):
        tgt_name = src_name.replace(suffix, save_img_suffix)
        tgt_path = os.path.join(tgt_dir, tgt_name)
        num = len(src_paths)
        img = np.array(Image.open(src_path))
        if len(img.shape) == 2:
            pil = Image.fromarray(img).convert(convert)
        elif len(img.shape) == 3:
            pil = Image.fromarray(img)
        else:
            raise ValueError('Input image not 2D/3D: ', img.shape)

        pil.save(tgt_path)
        print(f'processed {i+1}/{num}.')


def convert_label_pics_into_pngs(src_dir,
                                 tgt_dir,
                                 suffix,
                                 convert_dict={
                                     0: 0,
                                     255: 1
                                 }):
    if not os.path.exists(tgt_dir):
        os.makedirs(tgt_dir)

    src_paths, src_names = filter_suffix_recursive(src_dir, suffix=suffix)
    num = len(src_paths)
    for i, (src_name, src_path) in enumerate(zip(src_names, src_paths)):
        tgt_name = src_name.replace(suffix, save_seg_map_suffix)
        tgt_path = os.path.join(tgt_dir, tgt_name)

        img = np.array(Image.open(src_path))
        img = convert_label(img, convert_dict)
        Image.fromarray(img).save(tgt_path)
        print(f'processed {i+1}/{num}.')

convert_pics_into_pngs(
    os.path.join(root_path, 'Kvasir-SEG/kvasir-sessile/images'),
    tgt_img_dir,
    suffix=img_suffix)

convert_label_pics_into_pngs(
    os.path.join(root_path, 'Kvasir-SEG/kvasir-sessile/masks'),
    tgt_mask_dir,
    suffix=seg_map_suffix)
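
One point that may be worth checking in the label conversion: the default `convert_dict` maps only the exact values 0 and 255, and `convert_label` leaves every other value at 0. Because the source masks are read as `.jpg`, lossy compression may leave pixels near the polyp boundary with intermediate values, which would then be labelled as background. A threshold-based alternative could look like the sketch below; the function name `binarize_mask`, the threshold of 128 and the choice to keep only the first channel of a colour mask are assumptions, not part of this PR.

```python
import numpy as np


def binarize_mask(mask, threshold=128):
    """Map pixels >= threshold to class 1 and all other pixels to class 0."""
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Assumption: a colour mask has near-identical channels, so keep the first.
        mask = mask[..., 0]
    return (mask >= threshold).astype(np.uint8)


# Toy mask with JPEG-like near-black and near-white values.
toy = np.array([[0, 3, 250, 255]], dtype=np.uint8)
print(binarize_mask(toy))
```

With the exact-match mapping, the value 250 in this toy mask would end up as background, whereas the threshold assigns it to the polyp class.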