Merge pull request #105 from developmentseed/deprecate-skynet-data

drewbo · web-flow · commit 811f29843232 · 2018-08-17T11:20:33.000-04:00
Deprecate skynet data
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 # Label Maker
 ## Data Preparation for Satellite Machine Learning
 
-The tool downloads [OpenStreetMap QA Tile]((https://osmlab.github.io/osm-qa-tiles/)) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in Machine Learning training.
+The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in machine learning training.
 
 ![example classification image overlaid over satellite imagery](examples/images/classification.png)
 _satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](https://www.digitalglobe.com/)_
diff --git a/examples/README.md b/examples/README.md
@@ -5,6 +5,7 @@
 - [Creating a Neural Network to Find Populated Areas in Tanzania](walkthrough-classification-aws.md): Build a classifier using Keras on AWS
 - [Creating a building classifier in Vietnam using MXNet and SageMaker](walkthrough-classification-mxnet-sagemaker.md): Build a classifier on AWS SageMaker
 - [A building detector with TensorFlow API](walkthrough-tensorflow-object-detection.md): Use the TensorFlow Object Detection API for detecting buildings in Mexico City.
+- [Preparing data for `skynet-train`](skynet-train-data-prep.md): Convert `label-maker` segmentation labels into the training files `skynet-train` expects
 
 ## Example Nets
 
diff --git a/examples/skynet-train-data-prep.md b/examples/skynet-train-data-prep.md
@@ -0,0 +1,19 @@
+# Using `label-maker` with `skynet-train`
+
+## Background
+
+[`skynet-data`](https://github.com/developmentseed/skynet-data/) is a tool developed specifically to prepare data for [`skynet-train`](https://github.com/developmentseed/skynet-train/), an implementation of [SegNet](http://mi.eng.cam.ac.uk/projects/segnet/). `skynet-data` predates `label-maker` and prepares data in a very similar way: it downloads OpenStreetMap data and satellite imagery tiles for use in machine learning training. `skynet-data` will eventually be deprecated, since most of its functionality can be replicated with `label-maker`.
+
+## Preparing data
+
+`skynet-train` requires a few separate files specific to [`caffe`](https://github.com/BVLC/caffe). To generate them, we've written a [utility script](utils/skynet.py) that connects `label-maker` output to [`skynet-train`](https://github.com/developmentseed/skynet-train/). First, prepare segmentation labels and images with `label-maker` by running `download`, `labels`, and `images` from the command line, following the instructions in the [other examples](README.md) or the [README](../README.md). Then, from inside your data folder (the script reads and writes paths relative to the current working directory), run the script, adjusting the path to point at your copy of `skynet.py`:
+
+```bash
+python utils/skynet.py
+```
+
+This should create `train.txt`, `val.txt`, and `labels/label-stats.csv`, which are needed to run `skynet-train`. The script also writes grayscale label images to `labels/grayscale/` and copies `tiles/` to `images/`.
+
+## Training
+
+Now mount your data folder at `/data` as shown in the [`skynet-train` instructions](https://github.com/developmentseed/skynet-train/#quick-start) (the paths written to `train.txt` and `val.txt` assume that location), and training should begin.
diff --git a/examples/utils/skynet.py b/examples/utils/skynet.py
@@ -0,0 +1,59 @@
+from os import makedirs, path as op
+from shutil import copytree
+from collections import Counter
+import csv
+
+import numpy as np
+from PIL import Image
+
+# create a greyscale folder for class labelled images
+greyscale_folder = op.join('labels', 'grayscale')
+if not op.isdir(greyscale_folder):
+    makedirs(greyscale_folder)
+labels = np.load('labels.npz')
+
+# write our numpy array labels to images
+# skip empty labels because we don't download images for them
+keys = []  # keys of non-empty labels, used for the train/val split below
+class_freq = Counter()  # total pixels per class
+image_freq = Counter()  # number of images containing each class
+for key in labels.files:
+    label = labels[key]
+    if np.sum(label):
+        keys.append(key)
+        label_file = op.join(greyscale_folder, '{}.png'.format(key))
+        img = Image.fromarray(label.astype(np.uint8))
+        print('Writing {}'.format(label_file))
+        img.save(label_file)
+        # get class frequencies
+        unique, counts = np.unique(label, return_counts=True)
+        freq = dict(zip(unique, counts))
+        for k, v in freq.items():
+            class_freq[k] += v
+            image_freq[k] += 1
+
+# copy our tiles to a folder with a different name (copytree fails if it exists)
+if not op.isdir('images'):
+    copytree('tiles', 'images')
+
+# sample the file names and use those to create text files
+np.random.shuffle(keys)
+split_index = int(len(keys) * 0.8)
+
+with open('train.txt', 'w') as train:
+    for key in keys[:split_index]:
+        train.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))
+
+with open('val.txt', 'w') as val:
+    for key in keys[split_index:]:
+        val.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))
+
+# write a csv with class frequencies
+freqs = [dict(label=k, frequency=v, image_count=image_freq[k]) for k, v in class_freq.items()]
+with open(op.join('labels', 'label-stats.csv'), 'w', newline='') as stats:
+    fieldnames = ['label', 'frequency', 'image_count']
+    writer = csv.DictWriter(stats, fieldnames=fieldnames)
+
+    writer.writeheader()
+    for f in freqs:
+        writer.writerow(f)
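The `labels/label-stats.csv` file records, for each class, its total pixel count (`frequency`) and the number of images that contain it (`image_count`). Those two columns are exactly what median frequency balancing, the class-weighting scheme described in the SegNet paper, needs. The sketch below shows one way such weights could be derived from the file. It is not taken from `skynet-train`, whose own handling of the file is not shown here, and it assumes 256x256 label tiles; the function name `median_frequency_weights` is hypothetical.

```python
import csv
import statistics
from os import path as op

# Location written by utils/skynet.py, relative to the data folder.
STATS_PATH = op.join('labels', 'label-stats.csv')


def median_frequency_weights(rows, pixels_per_image=256 * 256):
    """Return {class label: weight} using median frequency balancing.

    rows: an iterable of dicts with 'label', 'frequency' (total pixels of
    the class) and 'image_count' (images containing the class) keys, e.g.
    a csv.DictReader over label-stats.csv.
    pixels_per_image: ASSUMPTION, 256x256 label tiles; change it if your
    tiles are a different size.
    """
    freq = {}
    for row in rows:
        label = int(row['label'])
        pixels = int(row['frequency'])
        images = int(row['image_count'])
        # share of this class among the pixels of the images where it appears
        freq[label] = pixels / (images * pixels_per_image)
    median = statistics.median(freq.values())
    # classes rarer than the median get weights above 1, commoner ones below 1
    return {label: median / f for label, f in freq.items()}


if __name__ == '__main__':
    if op.isfile(STATS_PATH):
        with open(STATS_PATH, newline='') as stats:
            weights = median_frequency_weights(csv.DictReader(stats))
        for label, weight in sorted(weights.items()):
            print('class {}: weight {:.4f}'.format(label, weight))
    else:
        print('{} not found; run utils/skynet.py first'.format(STATS_PATH))
```

Run it from the data folder after `utils/skynet.py`. How (or whether) to feed the printed weights into `skynet-train` depends on its configuration, so treat them as a starting point rather than a required step.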