Commit 811f298

Merge pull request #105 from developmentseed/deprecate-skynet-data
Deprecate skynet data
2 parents e00ad6a + 09923fb commit 811f298

4 files changed: 80 additions and 1 deletion

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Label Maker
 ## Data Preparation for Satellite Machine Learning

-The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in Machine Learning training.
+The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in machine learning training.

 ![example classification image overlaid over satellite imagery](examples/images/classification.png)
 _satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](https://www.digitalglobe.com/)_

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 - [Creating a Neural Network to Find Populated Areas in Tanzania](walkthrough-classification-aws.md): Build a classifier using Keras on AWS
 - [Creating a building classifier in Vietnam using MXNet and SageMaker](walkthrough-classification-mxnet-sagemaker.md): Build a classifier on AWS SageMaker
 - [A building detector with TensorFlow API](walkthrough-tensorflow-object-detection.md): Use the TensorFlow Object Detection API for detecting buildings in Mexico City.
+- [Preparing data for `skynet-train`](skynet-train-data-prep.md)

 ## Example Nets

examples/skynet-train-data-prep.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
# Using `label-maker` with `skynet-train`

## Background

[`skynet-data`](https://github.com/developmentseed/skynet-data/) is a tool developed specifically to prepare data for [`skynet-train`](https://github.com/developmentseed/skynet-train/), an implementation of [SegNet](http://mi.eng.cam.ac.uk/projects/segnet/). `skynet-data` predates `label-maker` and prepares data in a very similar way: it downloads OpenStreetMap data and satellite imagery tiles for use in machine learning training. Eventually, `skynet-data` will be deprecated, as most of its functionality can be replicated using `label-maker`.

## Preparing data

`skynet-train` requires a few separate files specific to [`caffe`](https://github.com/BVLC/caffe). A [utility script](utils/skynet.py) generates these files and connects `label-maker` with [`skynet-train`](https://github.com/developmentseed/skynet-train/). First, prepare segmentation labels and images with `label-maker` by running `download`, `labels`, and `images` from the command line, following instructions from the [other examples](README.md) or the [README](../README.md). Then, in your data folder (the script uses relative paths), run:

```bash
python utils/skynet.py
```

This should create the files `train.txt`, `val.txt`, and `labels/label-stats.csv`, which are needed for running `skynet-train`.

## Training

Now you can mount your data folder as shown in the [`skynet-train` instructions](https://github.com/developmentseed/skynet-train/#quick-start) and training should begin.
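
Before mounting the folder, it can help to confirm that every path listed in `train.txt` and `val.txt` resolves to a real file. The following is a minimal sketch and not part of this commit: the function name `missing_files` and its parameters are hypothetical, and it assumes the data folder is mounted at `/data`, which is the prefix the utility script writes into the list files.

```python
import os


def missing_files(list_file, data_root, mount_prefix='/data'):
    """Return the listed paths that do not exist under data_root.

    Each line of list_file is expected to hold an image path and a label
    path separated by whitespace, both starting with mount_prefix.
    """
    missing = []
    with open(list_file) as f:
        for line in f:
            for listed in line.split():
                # Map the in-container path back to the local data folder
                relative = os.path.relpath(listed, mount_prefix)
                local = os.path.join(data_root, relative)
                if not os.path.isfile(local):
                    missing.append(listed)
    return missing
```

Calling `missing_files('train.txt', '.')` from the data folder returns an empty list when every image and label is in place; any returned path points at a tile or label that would be missing during training.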

examples/utils/skynet.py

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
from os import makedirs, path as op
from shutil import copytree
from collections import Counter
import csv

import numpy as np
from PIL import Image

# create a greyscale folder for class labelled images
greyscale_folder = op.join('labels', 'grayscale')
if not op.isdir(greyscale_folder):
    makedirs(greyscale_folder)
labels = np.load('labels.npz')

# write our numpy array labels to images
# remove empty labels because we don't download images for them
keys = list(labels.keys())  # a real list, so it can be filtered and shuffled in place
class_freq = Counter()
image_freq = Counter()
for key in list(keys):  # loop over a snapshot; removing from the list being iterated skips items
    label = labels[key]
    if np.sum(label):
        label_file = op.join(greyscale_folder, '{}.png'.format(key))
        img = Image.fromarray(label.astype(np.uint8))
        print('Writing {}'.format(label_file))
        img.save(label_file)
        # get class frequencies
        unique, counts = np.unique(label, return_counts=True)
        freq = dict(zip(unique, counts))
        for k, v in freq.items():
            class_freq[k] += v
            image_freq[k] += 1
    else:
        keys.remove(key)

# copy our tiles to a folder with a different name
copytree('tiles', 'images')

# sample the file names and use those to create text files
np.random.shuffle(keys)
split_index = int(len(keys) * 0.8)

with open('train.txt', 'w') as train:
    for key in keys[:split_index]:
        train.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))

with open('val.txt', 'w') as val:
    for key in keys[split_index:]:
        val.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))

# write a csv with class frequencies
freqs = [dict(label=k, frequency=v, image_count=image_freq[k]) for k, v in class_freq.items()]
with open('labels/label-stats.csv', 'w', newline='') as stats:
    fieldnames = list(freqs[0].keys())
    writer = csv.DictWriter(stats, fieldnames=fieldnames)

    writer.writeheader()
    for f in freqs:
        writer.writerow(f)
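
The filter-and-split step of the script can also be expressed as a small, testable function. The sketch below is illustrative and not part of this commit: `split_nonempty_keys`, `train_fraction`, and `seed` are hypothetical names, a plain dict of nested lists stands in for `labels.npz`, and labels are assumed to be 2D. Removing items from a list while looping over it can skip entries, so the sketch builds a new list of kept keys instead.

```python
import random


def split_nonempty_keys(labels, train_fraction=0.8, seed=None):
    """Return (train_keys, val_keys) for tiles with at least one nonzero pixel.

    labels maps a tile key to a 2D label (nested sequences or a 2D array).
    """
    # Keep only tiles whose label contains some nonzero class value
    kept = [key for key in labels
            if any(any(row) for row in labels[key])]

    # Shuffle with a dedicated generator so a seed gives a repeatable split
    rng = random.Random(seed)
    rng.shuffle(kept)

    split_index = int(len(kept) * train_fraction)
    return kept[:split_index], kept[split_index:]
```

Passing a fixed `seed` makes the train/validation split reproducible between runs, which the script's unseeded shuffle does not guarantee.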
