added dataset and updated readme/requirements

YseraQin · Aug 7, 2021 · 1f1b75a
1 parent c1d9c13

Showing 8 changed files with 109 additions and 6 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # PyTorch Remote Sensing (torchrs)
 
-(WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image-to-Image Translation, Image Captioning, etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors.
+(WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Resolution, Land Cover Classification/Segmentation, Image Captioning, Audio-visual Recognition, etc.) for various Optical (Sentinel-2, Landsat, etc.) and Synthetic Aperture Radar (SAR) (Sentinel-1) sensors.
 
 ## Installation
 
@@ -28,7 +28,8 @@ pip install 'git+https://github.com/isaaccorley/torchrs.git#egg=torch-rs[train]'
 
 * [PROBA-V Multi-Image Super Resolution](https://github.com/isaaccorley/torchrs#proba-v-super-resolution)
 * [ETCI 2021 Flood Detection](https://github.com/isaaccorley/torchrs#etci-2021-flood-detection)
-* [FAIR1M Fine-grained Object Recognition](https://github.com/isaaccorley/torchrs#fair1m---fine-grained-object-recognition)
+* [FAIR1M - Fine-grained Object Recognition](https://github.com/isaaccorley/torchrs#fair1m---fine-grained-object-recognition)
+* [ADVANCE - Audiovisual Aerial Scene Recognition](https://github.com/isaaccorley/torchrs#advance---audiovisual-aerial-scene-recognition)
 * [OSCD - Onera Satellite Change Detection](https://github.com/isaaccorley/torchrs#onera-satellite-change-detection-oscd)
 * [S2Looking - Satellite Side-Looking Change Detection](https://github.com/isaaccorley/torchrs#satellite-side-looking-s2looking-change-detection)
 * [LEVIR-CD+ - LEVIR Change Detection+](https://github.com/isaaccorley/torchrs#levir-change-detection-levir-cd)
@@ -110,7 +111,7 @@ x: dict(
 
 <img src="./assets/fair1m.jpg" width="550px"></img>
 
-The [FAIR1M](https://rcdaudt.github.io/oscd/) dataset, proposed in ["FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery", Sun et al.](https://arxiv.org/abs/2103.05569) is a fine-grained object recognition/detection dataset of 15,000 high resolution (0.3-0.8m) RGB images taken by the [Gaogen (GF)](https://earth.esa.int/web/eoportal/satellite-missions/g/gaofen-1) satellites and extracted from [Google Earth](https://earth.google.com/web/). The dataset contains rotated bounding boxes for objects of 5 categories (ships, vehicles, airplanes, courts, and roads) and 37 sub-categories. This dataset is a part of the [ISPRS Benchmark on Object Detection in High-Resolution Satellite Images](http://gaofen-challenge.com/benchmark). Note that so far only a portion of the training dataset has been released for the challenge (1,732/15,000 images).
+The [FAIR1M](http://gaofen-challenge.com/) dataset, proposed in ["FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery", Sun et al.](https://arxiv.org/abs/2103.05569), is a fine-grained object recognition/detection dataset of 15,000 high resolution (0.3-0.8m) RGB images taken by the [Gaofen (GF)](https://earth.esa.int/web/eoportal/satellite-missions/g/gaofen-1) satellites and extracted from [Google Earth](https://earth.google.com/web/). The dataset contains rotated bounding boxes for objects of 5 categories (ships, vehicles, airplanes, courts, and roads) and 37 sub-categories. This dataset is a part of the [ISPRS Benchmark on Object Detection in High-Resolution Satellite Images](http://gaofen-challenge.com/benchmark). Note that so far only a portion of the training dataset has been released for the challenge (1,732/15,000 images).
 
 The dataset can be downloaded (8.7GB) using `scripts/download_fair1m.sh` and instantiated below:
 
@@ -137,6 +138,43 @@ where N is the number of objects in the image
 """
 ```
 
+### ADVANCE - Audiovisual Aerial Scene Recognition
+
+<img src="./assets/advance.png" width="700px"></img>
+
+The [AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE)](https://akchen.github.io/ADVANCE-DATASET/), proposed in ["Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition", Hu et al.](https://arxiv.org/abs/2005.08449), is composed of 5,075 pairs of geotagged audio recordings and 512x512 RGB images extracted from [FreeSound](https://freesound.org/browse/geotags/?c_lat=24&c_lon=20&z=2) and [Google Earth](https://earth.google.com/web/), respectively. The images are labeled into 13 scene categories using [OpenStreetMap](https://www.openstreetmap.org/#map=5/38.007/-95.844).
+
+The dataset can be downloaded (4.5GB) using `scripts/download_advance.sh` and instantiated below:
+
+```python
+import torchvision.transforms as T
+from torchrs.datasets import ADVANCE
+
+image_transform = T.Compose([T.ToTensor()])
+audio_transform = T.Compose([])
+
+dataset = ADVANCE(
+    root="path/to/dataset/",
+    image_transform=image_transform,
+    audio_transform=audio_transform,
+)
+
+x = dataset[0]
+"""
+x: dict(
+    image: (3, 512, 512)
+    audio: (1, 220500)
+    cls: int
+)
+"""
+
+dataset.classes
+"""
+['airport', 'beach', 'bridge', 'farmland', 'forest', 'grassland', 'harbour', 'lake',
+'orchard', 'residential', 'sparse shrub land', 'sports land', 'train station']
+"""
+```
+
 ### Onera Satellite Change Detection (OSCD)
 
 <img src="./assets/oscd.png" width="750px"></img>

diff --git a/assets/advance.png b/assets/advance.png
diff --git a/requirements.txt b/requirements.txt
@@ -1,5 +1,6 @@
 torch
 torchvision
+torchaudio
 einops
 numpy
 pillow

diff --git a/scripts/download_advance.sh b/scripts/download_advance.sh
@@ -0,0 +1,7 @@
+mkdir -p .data/advance
+wget --no-check-certificate "https://zenodo.org/record/3828124/files/ADVANCE_vision.zip?download=1" -O ADVANCE_vision.zip
+wget --no-check-certificate "https://zenodo.org/record/3828124/files/ADVANCE_sound.zip?download=1" -O ADVANCE_sound.zip
+unzip ADVANCE_vision.zip -d .data/advance/
+rm ADVANCE_vision.zip
+unzip ADVANCE_sound.zip -d .data/advance/
+rm ADVANCE_sound.zip
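The unzip-then-delete step of the script above can also be sketched in Python with only the standard library. This is a minimal illustration, not part of the commit: `extract_and_remove` is a hypothetical helper, and the archive layout used in the demo (a top-level `vision/` folder with one sub-folder per class) is an assumption inferred from the paths that `advance.py` globs.

```python
import os
import tempfile
import zipfile


def extract_and_remove(zip_path: str, dest: str) -> None:
    """Hypothetical helper mirroring `unzip X -d dest` followed by `rm X`."""
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    os.remove(zip_path)


# Demo on a tiny stand-in archive (assumed layout: vision/<class>/<name>.jpg).
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "ADVANCE_vision.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("vision/beach/a.jpg", b"")

    dest = os.path.join(tmp, ".data", "advance")
    extract_and_remove(zip_path, dest)

    extracted = os.path.exists(os.path.join(dest, "vision", "beach", "a.jpg"))
    archive_removed = not os.path.exists(zip_path)
```

The download itself is left to `wget` as in the script; only the extraction and cleanup are reproduced here.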
diff --git a/torchrs/datasets/__init__.py b/torchrs/datasets/__init__.py
@@ -11,10 +11,11 @@
 from .sydney_captions import SydneyCaptions
 from .ucm_captions import UCMCaptions
 from .s2mtcp import S2MTCP
+from .advance import ADVANCE
 
 
 __all__ = [
     "PROBAV", "ETCI2021", "RSVQALR", "RSVQAxBEN", "EuroSATRGB", "EuroSATMS",
     "RESISC45", "RSICD", "OSCD", "S2Looking", "LEVIRCDPlus", "FAIR1M",
-    "SydneyCaptions", "UCMCaptions", "S2MTCP"
+    "SydneyCaptions", "UCMCaptions", "S2MTCP", "ADVANCE"
 ]
diff --git a/torchrs/datasets/advance.py b/torchrs/datasets/advance.py
@@ -0,0 +1,55 @@
+import os
+from glob import glob
+from typing import List, Dict
+
+import torch
+import torchaudio
+import numpy as np
+import torchvision.transforms as T
+from PIL import Image
+
+
+class ADVANCE(torch.utils.data.Dataset):
+    """ AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE) from
+    'Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition', Hu et al. (2020)
+    https://arxiv.org/abs/2005.08449
+
+    'We create an annotated dataset consisting of 5075 geotagged aerial image-sound pairs
+    involving 13 scene classes. This dataset covers a large variety of scenes from across
+    the world'
+    """
+    def __init__(
+        self,
+        root: str = ".data/advance",
+        image_transform: T.Compose = T.Compose([T.ToTensor()]),
+        audio_transform: T.Compose = T.Compose([]),
+    ):
+        self.root = root
+        self.image_transform = image_transform
+        self.audio_transform = audio_transform
+        self.files = self.load_files(root)
+        self.classes = sorted(set(f["cls"] for f in self.files))
+
+    @staticmethod
+    def load_files(root: str) -> List[Dict]:
+        images = sorted(glob(os.path.join(root, "vision", "**", "*.jpg")))
+        wavs = sorted(glob(os.path.join(root, "sound", "**", "*.wav")))
+        labels = [image.split(os.sep)[-2] for image in images]
+        files = [dict(image=image, audio=wav, cls=label) for image, wav, label in zip(images, wavs, labels)]
+        return files
+
+    def __len__(self) -> int:
+        return len(self.files)
+
+    def __getitem__(self, idx: int) -> Dict:
+        """ Returns a dict containing image, audio, and class label
+        image: (3, 512, 512)
+        audio: (1, 220500)
+        cls: int
+        """
+        files = self.files[idx]
+        image = np.array(Image.open(files["image"]).convert("RGB"))
+        audio, _ = torchaudio.load(files["audio"])  # sample rate is unused
+        image = self.image_transform(image)
+        audio = self.audio_transform(audio)
+        return dict(image=image, audio=audio, cls=self.classes.index(files["cls"]))
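`load_files` pairs images and recordings by zipping two sorted lists, so a single missing file on either side would silently shift every later pair. A possible alternative, sketched below with the standard library only, pairs files by class folder and file stem instead. `load_paired_files` and the `class_to_idx` mapping are hypothetical names, and the sketch assumes that an image and its recording share the same stem inside same-named class folders; check this against the extracted data before relying on it. The glob uses a single `*` for the class folder, which matches the same paths as `**` does when `recursive=True` is not passed.

```python
import os
import tempfile
from glob import glob
from typing import Dict, List, Tuple


def load_paired_files(root: str) -> List[Dict]:
    """Pair each image with the recording that has the same (class, stem)."""
    images = sorted(glob(os.path.join(root, "vision", "*", "*.jpg")))
    wavs = sorted(glob(os.path.join(root, "sound", "*", "*.wav")))

    def key(path: str) -> Tuple[str, str]:
        cls = os.path.basename(os.path.dirname(path))
        stem = os.path.splitext(os.path.basename(path))[0]
        return cls, stem

    audio_by_key = {key(wav): wav for wav in wavs}
    files = []
    for image in images:
        k = key(image)
        if k in audio_by_key:  # skip images that have no matching recording
            files.append(dict(image=image, audio=audio_by_key[k], cls=k[0]))
    return files


# Demo: two images but only one recording; zip() would pair a.jpg with b.wav.
with tempfile.TemporaryDirectory() as root:
    for sub, ext, names in [("vision", ".jpg", ["a", "b"]), ("sound", ".wav", ["b"])]:
        folder = os.path.join(root, sub, "beach")
        os.makedirs(folder)
        for name in names:
            open(os.path.join(folder, name + ext), "w").close()

    files = load_paired_files(root)
    classes = sorted(set(f["cls"] for f in files))
    class_to_idx = {c: i for i, c in enumerate(classes)}  # class name -> int label
    paired_image = os.path.basename(files[0]["image"])
    paired_audio = os.path.basename(files[0]["audio"])
```

Building `class_to_idx` once also gives a constant-time way to turn the folder name into the integer `cls` that the docstring describes.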
diff --git a/torchrs/datasets/s2mtcp.py b/torchrs/datasets/s2mtcp.py
@@ -32,7 +32,7 @@ def __init__(
         self.files = self.load_files(self.root)
 
     @staticmethod
-    def load_files(root: str) -> List[Dict]: 
+    def load_files(root: str) -> List[Dict]:
         files = glob(os.path.join(root, "*.npy"))
         files = [os.path.basename(f).split("_")[0] for f in files]
         files = sorted(set(files), key=int)

diff --git a/torchrs/transforms.py b/torchrs/transforms.py
@@ -33,7 +33,8 @@ def __call__(self, x: np.ndarray) -> torch.Tensor:
         if x.dtype == "uint16":
             x = x.astype("int32")
 
-        x = torch.from_numpy(x)
+        if isinstance(x, np.ndarray):
+            x = torch.from_numpy(x)
 
         if x.ndim == 2:
             if self.permute_dims: