added hkh dataset datamodule and download script
isaaccorley committed Aug 29, 2021
1 parent 4c5b9fb commit e4b14f7
Showing 9 changed files with 173 additions and 42 deletions.
110 changes: 74 additions & 36 deletions README.md
@@ -34,13 +34,15 @@ pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'

* [PROBA-V Multi-Image Super Resolution](https://github.com/isaaccorley/torchrs#proba-v-super-resolution)
* [ETCI 2021 Flood Detection](https://github.com/isaaccorley/torchrs#etci-2021-flood-detection)
* [HKH Glacier Mapping](https://github.com/isaaccorley/torchrs#hkh-glacier-mapping)
* [ZueriCrop - Time-Series Instance Segmentation](https://github.com/isaaccorley/torchrs#zuericrop)
* [FAIR1M - Fine-grained Object Recognition](https://github.com/isaaccorley/torchrs#fair1m---fine-grained-object-recognition)
* [ADVANCE - Audiovisual Aerial Scene Recognition](https://github.com/isaaccorley/torchrs#advance---audiovisual-aerial-scene-recognition)
* [OSCD - Onera Satellite Change Detection](https://github.com/isaaccorley/torchrs#onera-satellite-change-detection-oscd)
* [S2Looking - Satellite Side-Looking Change Detection](https://github.com/isaaccorley/torchrs#satellite-side-looking-s2looking-change-detection)
* [LEVIR-CD+ - LEVIR Change Detection+](https://github.com/isaaccorley/torchrs#levir-change-detection-levir-cd)
* [HRSCD - High Resolution Semantic Change Detection](https://github.com/isaaccorley/torchrs#high-resolution-semantic-change-detection-hrscd)
* [S2MTCP - Sentinel-2 Multitemporal Cities Pairs Change Detection](https://github.com/isaaccorley/torchrs#sentinel-2-multitemporal-cities-pairs-s2mtcp)
* [S2MTCP - Sentinel-2 Multitemporal Cities Pairs](https://github.com/isaaccorley/torchrs#sentinel-2-multitemporal-cities-pairs-s2mtcp)
* [RSVQA LR - Remote Sensing Visual Question Answering Low Resolution](https://github.com/isaaccorley/torchrs#remote-sensing-visual-question-answering-rsvqa-low-resolution-lr)
* [RSVQAxBEN - Remote Sensing Visual Question Answering BigEarthNet](https://github.com/isaaccorley/torchrs#remote-sensing-visual-question-answering-bigearthnet-rsvqaxben)
* [RSICD - Remote Sensing Image Captioning Dataset](https://github.com/isaaccorley/torchrs#remote-sensing-image-captioning-dataset-rsicd)
@@ -53,7 +55,6 @@ pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'
* [Inria Aerial Image Labeling - Building Semantic Segmentation](https://github.com/isaaccorley/torchrs#inria-aerial-image-labeling)
* [Dubai - Semantic Segmentation](https://github.com/isaaccorley/torchrs#dubai-segmentation)
* [GID-15 - Semantic Segmentation](https://github.com/isaaccorley/torchrs#gid-15)
* [ZueriCrop - Time-Series Instance Segmentation](https://github.com/isaaccorley/torchrs#zuericrop)
* [TiSeLaC - Time-Series Land Cover Classification](https://github.com/isaaccorley/torchrs#tiselac)

### PROBA-V Super Resolution
@@ -121,6 +122,77 @@ x: dict(
"""
```

### HKH Glacier Mapping

<img src="./assets/hkh_glacier.png" width="400px"></img>

The [Hindu Kush Himalayas (HKH) Glacier Mapping](https://lila.science/datasets/hkh-glacier-mapping) dataset is a semantic segmentation dataset of 7,095 512x512 multispectral images taken by the [USGS Landsat 7 satellite](https://landsat.gsfc.nasa.gov/landsat-7). The dataset contains imagery from 2002-2008 of the HKH region (spanning 8 countries) along with separate masks of clean-iced and debris-covered glaciers. Each image has 15 bands: 10 Landsat 7 bands, 3 precomputed NDVI/NDSI/NDWI indices, and 2 digital elevation and slope maps from the [SRTM 90m DEM Digital Elevation Database](https://srtm.csi.cgiar.org/).
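
The three index bands are all normalized differences of two spectral bands. As a rough illustration only (this is not the dataset's preprocessing code), the sketch below computes them for a single pixel. The NDVI and NDSI band pairings are the usual Landsat 7 ones; the NDWI pairing (green vs. near infrared) is an assumption, since the source does not say which NDWI variant was used. `normalized_difference` is a hypothetical helper name and the reflectance values are made up.

```python
def normalized_difference(a: float, b: float, eps: float = 1e-12) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is close to zero."""
    denom = a + b
    if abs(denom) < eps:
        return 0.0
    return (a - b) / denom


# Made-up per-pixel reflectances keyed by Landsat 7 band.
pixel = {"B2_green": 0.60, "B3_red": 0.55, "B4_nir": 0.50, "B5_swir1": 0.10}

ndvi = normalized_difference(pixel["B4_nir"], pixel["B3_red"])      # vegetation index
ndsi = normalized_difference(pixel["B2_green"], pixel["B5_swir1"])  # snow index
ndwi = normalized_difference(pixel["B2_green"], pixel["B4_nir"])    # water index (assumed variant)

print(f"NDVI={ndvi:.3f} NDSI={ndsi:.3f} NDWI={ndwi:.3f}")
```

For non-negative reflectances each index falls in [-1, 1], which is one reason such bands are convenient as extra input channels.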

The dataset can be downloaded (18GB compressed, 109GB uncompressed) using `scripts/download_hkh_glacier.sh` and instantiated as shown below:

```python
from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import HKHGlacierMapping

transform = Compose([ToTensor()])

dataset = HKHGlacierMapping(
root="path/to/dataset/",
transform=transform
)

x = dataset[0]
"""
x: dict(
x: (15, 512, 512)
clean_ice_mask: (1, 512, 512)
debris_covered_mask: (1, 512, 512)
)
"""

dataset.bands
"""
['LE7 B1 (blue)', 'LE7 B2 (green)', 'LE7 B3 (red)', 'LE7 B4 (near infrared)', 'LE7 B5 (shortwave infrared 1)',
'LE7 B6_VCID_1 (low-gain thermal infrared)', 'LE7 B6_VCID_2 (high-gain thermal infrared)',
'LE7 B7 (shortwave infrared 2)', 'LE7 B8 (panchromatic)', 'LE7 BQA (quality bitmask)', 'NDVI (vegetation index)',
'NDSI (snow index)', 'NDWI (water index)', 'SRTM 90 elevation', 'SRTM 90 slope']
"""
```
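
Because the band order is exposed as a plain list of names, channel indices can be looked up by name rather than hard-coded. The following is a minimal sketch under that assumption; `band_indices` is a hypothetical helper and not part of torchrs, and the list is copied from the `dataset.bands` output above.

```python
# Band names copied from the `dataset.bands` listing.
bands = [
    "LE7 B1 (blue)", "LE7 B2 (green)", "LE7 B3 (red)", "LE7 B4 (near infrared)",
    "LE7 B5 (shortwave infrared 1)", "LE7 B6_VCID_1 (low-gain thermal infrared)",
    "LE7 B6_VCID_2 (high-gain thermal infrared)", "LE7 B7 (shortwave infrared 2)",
    "LE7 B8 (panchromatic)", "LE7 BQA (quality bitmask)", "NDVI (vegetation index)",
    "NDSI (snow index)", "NDWI (water index)", "SRTM 90 elevation", "SRTM 90 slope",
]


def band_indices(band_names, keywords):
    """Map each keyword to the index of the first band name that contains it."""
    indices = []
    for keyword in keywords:
        matches = [i for i, name in enumerate(band_names) if keyword in name]
        if not matches:
            raise ValueError(f"no band matches {keyword!r}")
        indices.append(matches[0])
    return indices


rgb = band_indices(bands, ["(red)", "(green)", "(blue)"])
print(rgb)  # [2, 1, 0]
```

With a `(15, 512, 512)` array or tensor `x`, indexing with `x[rgb]` should then yield a `(3, 512, 512)` RGB composite.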

### ZueriCrop

<img src="./assets/zuericrop.png" width="650px"></img>

The [ZueriCrop](https://github.com/0zgur0/ms-convSTAR) dataset is a time-series instance segmentation dataset proposed in ["Crop mapping from image time series: deep learning with multi-scale label hierarchies", Turkoglu et al.](https://arxiv.org/abs/2102.08820). It contains 116k medium-resolution (10m) 24x24 multispectral 9-band image time series of Zurich and Thurgau, Switzerland, taken by the [ESA Sentinel-2 satellite](https://sentinel.esa.int/web/sentinel/missions/sentinel-2), with pixel-level semantic and instance annotations for 48 fine-grained, hierarchical categories of crop types. Note that there is only a single ground-truth semantic and instance mask per time series.

The dataset can be downloaded (39GB) using `scripts/download_zuericrop.sh` and instantiated below:

```python
from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ZueriCrop

transform = Compose([ToTensor()])

dataset = ZueriCrop(
root="path/to/dataset/",
transform=transform
)

x = dataset[0]
"""
x: dict(
x: (142, 9, 24, 24) (t, c, h, w)
mask: (1, 24, 24)
instance_mask: (1, 24, 24)
)
"""

[cls.label for cls in dataset.classes]
"""
['Unknown', 'SummerBarley', 'WinterBarley', 'Oat', 'Wheat', 'Grain', ...]
"""
```

### FAIR1M - Fine-grained Object Recognition

<img src="./assets/fair1m.jpg" width="550px"></img>
@@ -747,40 +819,6 @@ dataset.classes
"""
```

### ZueriCrop

<img src="./assets/zuericrop.png" width="750px"></img>

The [ZueriCrop](https://github.com/0zgur0/ms-convSTAR) dataset is a time-series instance segmentation dataset proposed in ["Crop mapping from image time series: deep learning with multi-scale label hierarchies", Turkoglu et al.](https://arxiv.org/abs/2102.08820) of 116k medium resolution (10m) 24x24 multispectral 9-band imagery of Zurich and Thurgau, Switzerland taken by the [ESA Sentinel-2 satellite](https://sentinel.esa.int/web/sentinel/missions/sentinel-2) and contains pixel level semantic and instance annotations for 48 fine-grained, hierarchical categories of crop types. Note that there is only a single ground truth semantic & instance mask per time-series.

The dataset can be downloaded (39GB) using `scripts/download_zuericrop.sh` and instantiated below:

```python
from torchrs.transforms import Compose, ToTensor
from torchrs.datasets import ZueriCrop

transform = Compose([ToTensor()])

dataset = ZueriCrop(
root="path/to/dataset/",
transform=transform
)

x = dataset[0]
"""
x: dict(
x: (142, 9, 24, 24) (t, c, h, w)
mask: (1, 24, 24)
instance_mask: (1, 24, 24)
)
"""

[cls.label for cls in ds.classes]
"""
['Unknown', 'SummerBarley', 'WinterBarley', 'Oat', 'Wheat', 'Grain', ...]
"""
```

### TiSeLaC

<img src="./assets/tiselac.png" width="900px"></img>
Binary file added assets/hkh_glacier.png
5 changes: 5 additions & 0 deletions scripts/download_hkh_glacier.sh
@@ -0,0 +1,5 @@
pip install gdown
mkdir -p .data
gdown --id 1cvjfe_MZJI9HXwkRgoQbSCH55qQd6Esm
unzip hkh_glacier_mapping.zip -d .data/
rm hkh_glacier_mapping.zip
4 changes: 3 additions & 1 deletion torchrs/datasets/__init__.py
@@ -21,11 +21,13 @@
from .zuericrop import ZueriCrop
from .aid import AID
from .dubai_segmentation import DubaiSegmentation
from .hkh_glacier import HKHGlacierMapping


__all__ = [
"PROBAV", "ETCI2021", "RSVQALR", "RSVQAxBEN", "EuroSATRGB", "EuroSATMS",
"RESISC45", "RSICD", "OSCD", "S2Looking", "LEVIRCDPlus", "FAIR1M",
"SydneyCaptions", "UCMCaptions", "S2MTCP", "ADVANCE", "SAT4", "SAT6",
"HRSCD", "InriaAIL", "Tiselac", "GID15", "ZueriCrop", "AID", "DubaiSegmentation"
"HRSCD", "InriaAIL", "Tiselac", "GID15", "ZueriCrop", "AID", "DubaiSegmentation",
"HKHGlacierMapping"
]
5 changes: 2 additions & 3 deletions torchrs/datasets/dubai_segmentation.py
@@ -10,9 +10,8 @@


class DubaiSegmentation(torch.utils.data.Dataset):
""" Inria Aerial Image Labeling dataset from 'Can semantic labeling methods
generalize to any city? the inria aerial image labeling benchmark', Maggiori et al. (2017)
https://ieeexplore.ieee.org/document/8127684
""" Semantic segmentation dataset of Dubai imagery taken by MBRSC satellites
https://humansintheloop.org/resources/datasets/semantic-segmentation-dataset/
"""
classes = {
61 changes: 61 additions & 0 deletions torchrs/datasets/hkh_glacier.py
@@ -0,0 +1,61 @@
import os
from glob import glob
from typing import List, Dict

import torch
import numpy as np

from torchrs.transforms import Compose, ToTensor


class HKHGlacierMapping(torch.utils.data.Dataset):
    """ Hindu Kush Himalayas (HKH) Glacier Mapping dataset
    https://lila.science/datasets/hkh-glacier-mapping
    'We also provide 14190 numpy patches. The numpy patches are all of size 512x512x15 and
    corresponding 512x512x2 pixel-wise mask labels; the two channels in the pixel-wise masks
    correspond to clean-iced and debris-covered glaciers. Patches' geolocation information,
    time stamps, source Landsat IDs, and glacier density are available in a geojson metadata file.'
    """
    bands = [
        "LE7 B1 (blue)",
        "LE7 B2 (green)",
        "LE7 B3 (red)",
        "LE7 B4 (near infrared)",
        "LE7 B5 (shortwave infrared 1)",
        "LE7 B6_VCID_1 (low-gain thermal infrared)",
        "LE7 B6_VCID_2 (high-gain thermal infrared)",
        "LE7 B7 (shortwave infrared 2)",
        "LE7 B8 (panchromatic)",
        "LE7 BQA (quality bitmask)",
        "NDVI (vegetation index)",
        "NDSI (snow index)",
        "NDWI (water index)",
        "SRTM 90 elevation",
        "SRTM 90 slope"
    ]

    def __init__(
        self,
        root: str = ".data/hkh_glacier_mapping",
        transform: Compose = Compose([ToTensor()]),
    ):
        self.transform = transform
        self.images = self.load_images(root)

    @staticmethod
    def load_images(path: str) -> List[Dict]:
        images = sorted(glob(os.path.join(path, "images", "*.npy")))
        masks = sorted(glob(os.path.join(path, "masks", "*.npy")))
        return [dict(image=image, mask=mask) for image, mask in zip(images, masks)]

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        image_path, target_path = self.images[idx]["image"], self.images[idx]["mask"]
        x, y = np.load(image_path), np.load(target_path)
        y0, y1 = y[..., 0], y[..., 1]
        x, y0, y1 = self.transform([x, y0, y1])
        return dict(x=x, clean_ice_mask=y0, debris_covered_mask=y1)
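
One property of `load_images` worth noting: it pairs two independently sorted file lists with `zip`, which silently truncates to the shorter list if an image or mask is missing. A possible guard is sketched below. It is not part of this commit; `load_pairs_checked` is a hypothetical name, and the `images`/`masks` folder names are taken from `load_images`. The demonstration file names are made up.

```python
import os
import tempfile
from glob import glob
from typing import Dict, List


def load_pairs_checked(path: str) -> List[Dict[str, str]]:
    """Pair image and mask files, failing loudly if the counts differ."""
    images = sorted(glob(os.path.join(path, "images", "*.npy")))
    masks = sorted(glob(os.path.join(path, "masks", "*.npy")))
    if len(images) != len(masks):
        raise ValueError(
            f"found {len(images)} images but {len(masks)} masks in {path!r}"
        )
    return [dict(image=image, mask=mask) for image, mask in zip(images, masks)]


# Demonstration on a throwaway directory with one mask missing.
with tempfile.TemporaryDirectory() as root:
    for sub in ("images", "masks"):
        os.makedirs(os.path.join(root, sub))
    for name in ("a.npy", "b.npy"):
        open(os.path.join(root, "images", name), "w").close()
    open(os.path.join(root, "masks", "a.npy"), "w").close()

    try:
        load_pairs_checked(root)
    except ValueError as err:
        print(f"caught: {err}")
```

A count check does not prove that each image is matched with the right mask; that would need knowledge of the dataset's file naming scheme, which this commit does not show.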
4 changes: 3 additions & 1 deletion torchrs/train/datamodules/__init__.py
@@ -21,6 +21,7 @@
from .zuericrop import ZueriCropDataModule
from .aid import AIDDataModule
from .dubai_segmentation import DubaiSegmentationDataModule
from .hkh_glacier import HKHGlacierMappingDataModule


__all__ = [
@@ -29,5 +30,6 @@
"RSICDDataModule", "OSCDDataModule", "S2LookingDataModule", "LEVIRCDPlusDataModule",
"FAIR1MDataModule", "SydneyCaptionsDataModule", "UCMCaptionsDataModule", "S2MTCPDataModule",
"ADVANCEDataModule", "SAT4DataModule", "SAT6DataModule", "HRSCDDataModule", "InriaAILDataModule",
"TiselacDataModule", "GID15DataModule", "ZueriCropDataModule", "AIDDataModule", "DubaiSegmentationDataModule"
"TiselacDataModule", "GID15DataModule", "ZueriCropDataModule", "AIDDataModule",
"DubaiSegmentationDataModule", "HKHGlacierMappingDataModule"
]
1 change: 0 additions & 1 deletion torchrs/train/datamodules/dubai_segmentation.py
@@ -23,4 +23,3 @@ def setup(self, stage: Optional[str] = None):
        self.train_dataset, self.val_dataset, self.test_dataset = dataset_split(
            dataset, val_pct=self.val_split, test_pct=self.test_split
        )

25 changes: 25 additions & 0 deletions torchrs/train/datamodules/hkh_glacier.py
@@ -0,0 +1,25 @@
from typing import Optional

from torchrs.transforms import Compose, ToTensor
from torchrs.datasets.utils import dataset_split
from torchrs.train.datamodules import BaseDataModule
from torchrs.datasets import HKHGlacierMapping


class HKHGlacierMappingDataModule(BaseDataModule):

    def __init__(
        self,
        root: str = ".data/hkh_glacier_mapping",
        transform: Compose = Compose([ToTensor()]),
        *args, **kwargs
    ):
        super().__init__(*args, **kwargs)
        self.root = root
        self.transform = transform

    def setup(self, stage: Optional[str] = None):
        dataset = HKHGlacierMapping(root=self.root, transform=self.transform)
        self.train_dataset, self.val_dataset, self.test_dataset = dataset_split(
            dataset, val_pct=self.val_split, test_pct=self.test_split
        )
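
The body of `dataset_split` is not part of this commit, so its rounding behaviour is unknown here. As an illustration of how `val_pct` and `test_pct` could translate into subset sizes, the sketch below uses a hypothetical helper, `split_lengths`, that floors the validation and test sizes and gives the remainder to training. The fractions in the example call are illustrative, not torchrs defaults.

```python
from typing import Tuple


def split_lengths(n: int, val_pct: float, test_pct: float) -> Tuple[int, int, int]:
    """Return (train, val, test) sizes that always sum to n."""
    if val_pct < 0.0 or test_pct < 0.0 or val_pct + test_pct >= 1.0:
        raise ValueError("val_pct and test_pct must be non-negative and sum to less than 1")
    val = int(n * val_pct)    # floor of the validation share
    test = int(n * test_pct)  # floor of the test share
    return n - val - test, val, test


# 7,095 is the number of HKH image patches stated in the README.
train, val, test = split_lengths(7095, val_pct=0.1, test_pct=0.2)
print(train, val, test)
```

Assigning the remainder to the training subset guarantees that no sample is dropped by rounding.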
