update

YseraQin · Jul 26, 2021 · 46a649d
1 parent 6271e9b
commit 46a649d
Showing 9 changed files with 152 additions and 37 deletions.
diff --git a/README.md b/README.md
@@ -1,12 +1,15 @@
-# PyTorch Remote Sensing
+# PyTorch Remote Sensing (torchrs)
+
 (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image-to-Image Translation, etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors.
 
 ## Installation
-```
+
+```bash
 pip install git+https://github.com/isaaccorley/torchrs
 ```
 
 ## Table of Contents
+
 * [Datasets](https://github.com/isaaccorley/torchrs#datasets)
 * [Models](https://github.com/isaaccorley/torchrs#models)
 
@@ -15,12 +18,15 @@ pip install git+https://github.com/isaaccorley/torchrs
 * [PROBA-V Super Resolution](https://github.com/isaaccorley/torchrs#proba-v-super-resolution)
 * [ETCI 2021 Flood Detection](https://github.com/isaaccorley/torchrs#etci-2021-flood-detection)
 * [Remote Sensing Visual Question Answering (RSVQA) Low Resolution (LR)](https://github.com/isaaccorley/torchrs#remote-sensing-visual-question-answering-rsvqa-low-resolution-lr)
+* [Remote Sensing Image Captioning Dataset (RSICD)](https://github.com/isaaccorley/torchrs#remote-sensing-image-captioning-dataset-rsicd)
+* [Remote Sensing Image Scene Classification (RESISC45)](https://github.com/isaaccorley/torchrs#remote-sensing-image-scene-classification-resisc45)
+* [EuroSAT](https://github.com/isaaccorley/torchrs#eurosat)
 
 ### PROBA-V Super Resolution
 
 <img src="./assets/proba-v.jpg" width="500px"></img>
 
-The [PROBA-V Super Resolution Challenge Dataset](https://kelvins.esa.int/proba-v-super-resolution/home/) is a Multi-image Super Resolution (MISR) dataset of images taken by the [ESA PROBA-Vegetation satellite](https://earth.esa.int/eogateway/missions/proba-v). The dataset contains sets of unregistered 300m low resolution (LR) images which can be used to generate single 100m high resolution (HR) images for both Near Infrared (NIR) and Red bands. In addition, Quality Masks (QM) for each LR image and Status Masks (SM) for each HR image are available. The PROBA-V contains sensors which take imagery at 100m and 300m spatial resolutions with 5 and 1 day revisit rates, respectively. Generating high resolution imagery estimates would effectively increase the frequency at which HR imagery is available for vegetation monitoring.
+The [PROBA-V Super Resolution Challenge](https://kelvins.esa.int/proba-v-super-resolution/home/) dataset is a Multi-image Super Resolution (MISR) dataset of images taken by the [ESA PROBA-Vegetation satellite](https://earth.esa.int/eogateway/missions/proba-v). The dataset contains sets of unregistered 300m low resolution (LR) images which can be used to generate single 100m high resolution (HR) images for both Near Infrared (NIR) and Red bands. In addition, Quality Masks (QM) for each LR image and Status Masks (SM) for each HR image are available. The PROBA-V contains sensors which take imagery at 100m and 300m spatial resolutions with 5 and 1 day revisit rates, respectively. Generating high resolution imagery estimates would effectively increase the frequency at which HR imagery is available for vegetation monitoring.
 
 The dataset can be downloaded using the `scripts/download_probav.sh` script and then used as below:
 
@@ -52,7 +58,7 @@ t varies by set of images (minimum of 9)
 
 ### ETCI 2021 Flood Detection
 
-<img src="./assets/etci2021.jpg" width="500px"></img>
+<img src="./assets/etci2021.jpg" width="450px"></img>
 
 The [ETCI 2021 Dataset](https://nasa-impact.github.io/etci2021/) is a Flood Detection segmentation dataset of SAR images taken by the [ESA Sentinel-1 satellite](https://sentinel.esa.int/web/sentinel/missions/sentinel-1). The dataset contains pairs of VV and VH polarization images processed by the Hybrid Pluggable Processing Pipeline (hyp3) along with corresponding binary flood and water body ground truth masks.
 
@@ -90,10 +96,10 @@ The [RSVQA LR](https://rsvqa.sylvainlobry.com/) dataset, proposed in ["RSVQA: Vi
 The dataset can be downloaded using the `scripts/download_rsvqa_lr.sh` script and then used as below:
 
 ```python
-from torchrs.transforms import Compose, ToTensor
+import torchvision.transforms as T
 from torchrs.datasets import RSVQALR
 
-transform = Compose([ToTensor()])
+transform = T.Compose([T.ToTensor()])
 
 dataset = RSVQALR(
     root="path/to/dataset/",
@@ -112,6 +118,109 @@ x: dict(
 """
 ```
 
+### Remote Sensing Image Captioning Dataset (RSICD)
+
+<img src="./assets/rsicd.png" width="500px"></img>
+
+The [RSICD](https://github.com/201528014227051/RSICD_optimal) dataset, proposed in ["Exploring Models and Data for Remote Sensing Image Caption Generation", Lu et al.](https://arxiv.org/abs/1712.07835), is an image captioning dataset with 5 captions per image for 10,921 RGB images extracted using [Google Earth](https://earth.google.com/web/), [Baidu Map](https://map.baidu.com/), [MapABC](https://www.mapabc.com/) and [Tianditu](https://www.tianditu.gov.cn/). While it is one of the larger remote sensing image captioning datasets, its captions use very repetitive language with little detail, and many are duplicated.
+
+The dataset can be downloaded using the `scripts/download_rsicd.sh` script and then used as below:
+
+```python
+import torchvision.transforms as T
+from torchrs.datasets import RSICD
+
+transform = T.Compose([T.ToTensor()])
+
+dataset = RSICD(
+    root="path/to/dataset/",
+    split="train",
+    transform=transform
+)
+
+x = dataset[0]
+"""
+x: dict(
+    x:        (3, 224, 224)
+    captions: List[str]
+)
+"""
+```
+
+### Remote Sensing Image Scene Classification (RESISC45)
+
+<img src="./assets/resisc45.png" width="500px"></img>
+
+The [RESISC45](http://www.escience.cn/people/JunweiHan/NWPU-RESISC45.html) dataset, proposed in ["Remote Sensing Image Scene Classification: Benchmark and State of the Art", Cheng et al.](https://arxiv.org/abs/1703.00121), is an image classification dataset of 31,500 RGB images extracted using [Google Earth Engine](https://earthengine.google.com/). The dataset contains 45 scene classes with 700 images per class from over 100 countries and was selected to optimize for high variability in image conditions (spatial resolution, occlusion, weather, illumination, etc.).
+
+The dataset can be downloaded using the `scripts/download_resisc45.sh` script and then used as below:
+
+```python
+import torchvision.transforms as T
+from torchrs.datasets import RESISC45
+
+transform = T.Compose([T.ToTensor()])
+
+dataset = RESISC45(
+    root="path/to/dataset/",
+    transform=transform
+)
+
+x, y = dataset[0]
+"""
+x: (3, 256, 256)
+y: int
+"""
+
+dataset.classes
+"""
+['airplane', 'airport', 'baseball_diamond', 'basketball_court', 'beach', 'bridge', 'chaparral', 'church', 'circular_farmland', 'cloud', 'commercial_area', 'dense_residential', 'desert', 'forest', 'freeway', 'golf_course', 'ground_track_field', 'harbor', 'industrial_area', 'intersection', 'island', 'lake', 'meadow', 'medium_residential', 'mobile_home_park', 'mountain', 'overpass', 'palace', 'parking_lot', 'railway', 'railway_station', 'rectangular_farmland', 'river', 'roundabout', 'runway', 'sea_ice', 'ship', 'snowberg', 'sparse_residential', 'stadium', 'storage_tank', 'tennis_court', 'terrace', 'thermal_power_station', 'wetland']
+"""
+```
+
+### EuroSAT
+
+<img src="./assets/eurosat.jpg" width="600px"></img>
+
+The [EuroSAT](https://github.com/phelber/eurosat) dataset, proposed in ["EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification", Helber et al.](https://arxiv.org/abs/1709.00029), is a land cover classification dataset of 27,000 images taken by the [ESA Sentinel-2 satellite](https://sentinel.esa.int/web/sentinel/missions/sentinel-2). The dataset contains 10 land cover classes with 2,000-3,000 images per class from over 34 European countries. The dataset is available in the form of RGB only or all [Multispectral (MS) Sentinel-2 bands](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial). The dataset is fairly easy: a ResNet-50 reaches ~98.6% accuracy.
+
+The dataset can be downloaded using the `scripts/download_eurosat_rgb.sh` and `scripts/download_eurosat_ms.sh` scripts and then used as below:
+
+```python
+import torchvision.transforms as T
+from torchrs.transforms import ToTensor
+from torchrs.datasets import EuroSATRGB, EuroSATMS
+
+transform = T.Compose([T.ToTensor()])
+
+dataset = EuroSATRGB(
+    root="path/to/dataset/",
+    transform=transform
+)
+
+x, y = dataset[0]
+"""
+x: (3, 64, 64)
+y: int
+"""
+
+transform = T.Compose([ToTensor()])
+
+dataset = EuroSATMS(
+    root="path/to/dataset/",
+    transform=transform
+)
+
+x, y = dataset[0]
+"""
+x: (13, 64, 64)
+y: int
+"""
+
+dataset.classes
+"""
+['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
+"""
+```
+
 ## Models
 
 * [RAMS](https://github.com/isaaccorley/torchrs#rams)

diff --git a/assets/resisc45.png b/assets/resisc45.png
diff --git a/scripts/download_resisc45.sh b/scripts/download_resisc45.sh
@@ -2,7 +2,6 @@
 # uploaded to gdrive for downloading in a script
 pip install gdown
 apt-get install unrar
-mkdir -p .data/resisc45
 gdown --id 1DnPSU5nVSN7xv95bpZ3XQ0JhKXZOKgIv
 unrar x NWPU-RESISC45.rar .data/
 rm NWPU-RESISC45.rar
diff --git a/torchrs/datasets/__init__.py b/torchrs/datasets/__init__.py
@@ -4,3 +4,5 @@
 from .eurosat import EuroSATRGB, EuroSATMS
 from .resisc45 import RESISC45
 from .rsicd import RSICD
+
+__all__ = ["PROBAV", "ETCI2021", "RSVQALR", "EuroSATRGB", "EuroSATMS", "RESISC45", "RSICD"]
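Note on the `__all__` addition above: `__all__` restricts what `from package import *` exports. The sketch below illustrates that effect with a throwaway module; `fake_datasets` and `_Helper` are hypothetical names used only for illustration and are not part of torchrs.

```python
import sys
import types

# Build a hypothetical stand-in module in memory (not the real torchrs package).
mod = types.ModuleType("fake_datasets")
exec(
    "import os\n"                 # would leak through a star-import without __all__
    "class RSICD: pass\n"
    "class _Helper: pass\n"
    "__all__ = ['RSICD']\n",
    mod.__dict__,
)
sys.modules["fake_datasets"] = mod

# Star-import into a fresh namespace and list what actually arrived.
namespace = {}
exec("from fake_datasets import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['RSICD']
```

Without the `__all__` line, the star-import would also pull in `os`, since only underscore-prefixed names are skipped by default.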
diff --git a/torchrs/datasets/eurosat.py b/torchrs/datasets/eurosat.py
@@ -1,29 +1,34 @@
 import os
 
+import tifffile
 import torchvision.transforms as T
 from torchvision.datasets import ImageFolder
 
+from torchrs.transforms import ToTensor
+
 
 class EuroSATRGB(ImageFolder):
 
     def __init__(
         self,
-        root: str,
-        transforms: T.Compose
+        root: str = ".data/eurosat-rgb",
+        transform: T.Compose = T.Compose([T.ToTensor()])
     ):
         super().__init__(
             root=os.path.join(root, "2750"),
-            transform=transforms
+            transform=transform
         )
 
+
 class EuroSATMS(ImageFolder):
 
     def __init__(
         self,
-        root: str,
-        transforms: T.Compose
+        root: str = ".data/eurosat-ms",
+        transform: T.Compose = T.Compose([ToTensor()])
     ):
         super().__init__(
             root=os.path.join(root, "ds/images/remote_sensing/otherDatasets/sentinel_2/tif"),
-            transform=transforms
+            transform=transform,
+            loader=tifffile.imread
         )
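Note on `EuroSATMS`: the multispectral images are now read with `tifffile.imread`, and the `ToTensor` change in `torchrs/transforms.py` in this commit casts `uint16` arrays to `int32` before calling `torch.from_numpy`. The likely reason, stated here as an assumption, is that the PyTorch releases of that period did not accept `uint16` NumPy arrays. The NumPy-only sketch below shows that the cast is lossless; `widen_uint16` and the synthetic patch are hypothetical and only mirror the check added in the diff.

```python
import numpy as np


def widen_uint16(x: np.ndarray) -> np.ndarray:
    """Cast uint16 arrays to int32 and leave every other dtype untouched."""
    if x.dtype == np.uint16:
        return x.astype(np.int32)
    return x


# Synthetic 64x64 patch with 13 bands, filled with the largest uint16 value.
patch = np.full((64, 64, 13), 65535, dtype=np.uint16)
widened = widen_uint16(patch)

print(widened.dtype, int(widened.max()))  # int32 65535
```

Every `uint16` value fits in `int32`, so no pixel values change; the memory footprint doubles, which is the trade-off of this approach.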
diff --git a/torchrs/datasets/resisc45.py b/torchrs/datasets/resisc45.py
@@ -6,10 +6,10 @@ class RESISC45(ImageFolder):
 
     def __init__(
         self,
-        root: str,
-        transforms: T.Compose
+        root: str = ".data/NWPU-RESISC45",
+        transform: T.Compose = T.Compose([T.ToTensor()])
     ):
         super().__init__(
             root=root,
-            transform=transforms
+            transform=transform
         )
diff --git a/torchrs/datasets/rsicd.py b/torchrs/datasets/rsicd.py
@@ -1,44 +1,38 @@
 import os
 import json
-from typing import List, Dict, Optional
+from typing import List, Dict
 
 import torch
 import torchvision.transforms as T
 from PIL import Image
 
-from torchrs.transforms import ToTensor
-
 
 class RSICD(torch.utils.data.Dataset):
 
     def __init__(
         self,
-        root: str = ".data/rscid",
-        annotations_path: Optional[str] = None,
+        root: str = ".data/rsicd",
         split: str = "train",
-        transforms: T.Compose = T.Compose([ToTensor()])
+        transform: T.Compose = T.Compose([T.ToTensor()])
     ):
         assert split in ["train", "val", "test"]
         self.root = root
-        self.transforms = transforms
-
-
-        self.annotations = self.load_annotations(annotations_path, split)
-        print(f"RSICD {split} dataset loaded with {len(self.annotations)} annotations")
+        self.transform = transform
+        self.captions = self.load_captions(os.path.join(root, "dataset_rsicd.json"), split)
 
-    def load_annotations(self, path: str, split: str) -> List[Dict]:
+    @staticmethod
+    def load_captions(path: str, split: str) -> List[Dict]:
         with open(path) as f:
-            annotations = json.load(f)["images"]
-
-        return [a for a in annotations if a["split"] == split]
+            captions = json.load(f)["images"]
+        return [c for c in captions if c["split"] == split]
 
     def __len__(self) -> int:
-        return len(self.annotations)
+        return len(self.captions)
 
     def __getitem__(self, idx: int) -> Dict:
-        annotation = self.annotations[idx]
-        path = os.path.join(self.root, annotation["filename"])
+        captions = self.captions[idx]
+        path = os.path.join(self.root, "RSICD_images", captions["filename"])
         x = Image.open(path).convert("RGB")
-        x = self.transforms(x)
-        captions = [sentence["raw"] for sentence in annotation["sentences"]]
-        return dict(x=x, captions=captions)
+        x = self.transform(x)
+        sentences = [sentence["raw"] for sentence in captions["sentences"]]
+        return dict(x=x, captions=sentences)
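Note on `RSICD`: `load_captions` reads the `"images"` list from the annotation JSON and keeps the entries whose `"split"` matches, and `__getitem__` returns a dict rather than a tuple. The standard-library sketch below replays that filtering on a made-up annotation file; the two entries and their captions are invented for illustration and are not taken from the real `dataset_rsicd.json`.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_captions(path: str, split: str) -> List[Dict]:
    """Return the annotation entries that belong to the requested split."""
    with open(path) as f:
        images = json.load(f)["images"]
    return [entry for entry in images if entry["split"] == split]


# Invented annotations that follow the key layout used in the diff.
fake = {"images": [
    {"filename": "a.jpg", "split": "train", "sentences": [{"raw": "a river ."}]},
    {"filename": "b.jpg", "split": "val", "sentences": [{"raw": "a bridge ."}]},
]}

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "dataset_rsicd.json")
    with open(path, "w") as f:
        json.dump(fake, f)
    train = load_captions(path, "train")

sample = train[0]
captions = [sentence["raw"] for sentence in sample["sentences"]]
print(len(train), sample["filename"], captions)  # 1 a.jpg ['a river .']
```

Because `__getitem__` returns `dict(x=..., captions=...)`, callers should index the sample by key; tuple-unpacking it would yield the key names instead of the values.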
diff --git a/torchrs/models/__init__.py b/torchrs/models/__init__.py
@@ -1 +1,3 @@
 from .rams import RAMS
+
+__all__ = ["RAMS"]
diff --git a/torchrs/transforms.py b/torchrs/transforms.py
@@ -29,6 +29,10 @@ def __init__(self, permute_dims: bool = True):
         self.permute_dims = permute_dims
 
     def __call__(self, x: np.ndarray) -> torch.Tensor:
+
+        if x.dtype == "uint16":
+            x = x.astype("int32")
+
         x = torch.from_numpy(x)
 
         if x.ndim == 2: