Deprecate skynet data #105

Merged (2 commits) on Aug 17, 2018
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
# Label Maker
## Data Preparation for Satellite Machine Learning

The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in Machine Learning training.
The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in machine learning training.

![example classification image overlaid over satellite imagery](examples/images/classification.png)
_satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](https://www.digitalglobe.com/)_
1 change: 1 addition & 0 deletions examples/README.md
@@ -5,6 +5,7 @@
- [Creating a Neural Network to Find Populated Areas in Tanzania](walkthrough-classification-aws.md): Build a classifier using Keras on AWS
- [Creating a building classifier in Vietnam using MXNet and SageMaker](walkthrough-classification-mxnet-sagemaker.md): Build a classifier on AWS SageMaker
- [A building detector with TensorFlow API](walkthrough-tensorflow-object-detection.md): Use the TensorFlow Object Detection API for detecting buildings in Mexico City.
- [Preparing data for `skynet-train`](skynet-train-data-prep.md): Prepare `label-maker` segmentation data for training SegNet with `skynet-train`

## Example Nets

19 changes: 19 additions & 0 deletions examples/skynet-train-data-prep.md
@@ -0,0 +1,19 @@
# Using `label-maker` with `skynet-train`

## Background

[`skynet-data`](https://github.com/developmentseed/skynet-data/) is a tool developed specifically to prepare data for [`skynet-train`](https://github.com/developmentseed/skynet-train/), an implementation of [SegNet](http://mi.eng.cam.ac.uk/projects/segnet/). `skynet-data` predates `label-maker` and prepares data in a very similar way: it downloads OpenStreetMap data and satellite imagery tiles for use in machine learning training. Eventually, `skynet-data` will be deprecated, as most of its functionality can be replicated with `label-maker`.

## Preparing data

`skynet-train` requires a few separate files specific to [`caffe`](https://github.com/BVLC/caffe). To generate them, we've written a [utility script](utils/skynet.py) that connects `label-maker` with [`skynet-train`](https://github.com/developmentseed/skynet-train/). First, prepare segmentation labels and images with `label-maker` by running `download`, `labels`, and `images` from the command line, following the instructions in the [other examples](README.md) or the [README](../README.md). Then, from your data folder (the script uses relative paths, so adjust the path to `skynet.py` as needed), run:

```bash
python utils/skynet.py
```

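This creates `train.txt` and `val.txt` in the data folder, writes `labels/label-stats.csv`, and copies the tiles into `images/` alongside the greyscale label images in `labels/grayscale/`. These are the files `skynet-train` needs. Note that the copy step fails if an `images/` folder already exists.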
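Before mounting the folder, you may want to confirm that every image/label pair listed in `train.txt` and `val.txt` actually exists. The following is a minimal sketch, not part of `label-maker` or `skynet-train`: `check_split_file` is a hypothetical helper, and it assumes the `/data/` prefix written by the script maps to your local data folder.

```python
import os


def check_split_file(split_path, data_root='.', mount_prefix='/data/'):
    """Return (pair_count, missing_paths) for a train.txt or val.txt file.

    Each non-empty line is expected to hold an image path and a label path
    separated by whitespace, both starting with `mount_prefix` (an assumption
    based on how utils/skynet.py writes them).
    """
    missing = []
    pairs = 0
    with open(split_path) as f:
        for line in f:
            paths = line.split()
            if not paths:
                continue
            pairs += 1
            for container_path in paths:
                # map the container path (e.g. /data/images/x.png) back to the local folder
                if container_path.startswith(mount_prefix):
                    rel = container_path[len(mount_prefix):]
                else:
                    rel = container_path
                local = os.path.join(data_root, rel)
                if not os.path.isfile(local):
                    missing.append(local)
    return pairs, missing
```

Run from the data folder, `check_split_file('train.txt')` returns the number of listed pairs and any local paths that are missing.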

## Training

Now you can mount your data folder at `/data` (the paths written to `train.txt` and `val.txt` assume that location), as shown in the [`skynet-train` instructions](https://github.com/developmentseed/skynet-train/#quick-start), and training should begin.
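The `frequency` and `image_count` columns written to `labels/label-stats.csv` are the two inputs of median frequency balancing, the class-weighting scheme used in the SegNet paper. We haven't checked exactly how `skynet-train` consumes this file, so treat the following as an illustrative sketch for inspecting the class balance of your data: `median_frequency_weights` is a hypothetical helper, and the default of 256×256 pixels per label tile is an assumption you should adjust to your tile size.

```python
import csv
from statistics import median


def median_frequency_weights(stats_path, tile_pixels=256 * 256):
    """Compute per-class weights from a label-stats.csv file.

    freq(c)   = pixels of class c / total pixels of the tiles containing c
    weight(c) = median of all freq values / freq(c)
    tile_pixels assumes 256x256 label tiles (an assumption; adjust as needed).
    """
    freqs = {}
    with open(stats_path, newline='') as f:
        for row in csv.DictReader(f):
            pixels = float(row['frequency'])
            images = float(row['image_count'])
            freqs[row['label']] = pixels / (images * tile_pixels)
    med = median(freqs.values())
    # rarer classes (lower freq) receive weights above 1, common ones below 1
    return {label: med / freq for label, freq in freqs.items()}
```

Labels are returned as the strings read from the CSV, e.g. `'0'` for background.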
59 changes: 59 additions & 0 deletions examples/utils/skynet.py
@@ -0,0 +1,59 @@
from os import makedirs, path as op
from shutil import copytree
from collections import Counter
import csv

import numpy as np
from PIL import Image

# create a greyscale folder for class-labelled images
greyscale_folder = op.join('labels', 'grayscale')
if not op.isdir(greyscale_folder):
    makedirs(greyscale_folder)
labels = np.load('labels.npz')

# write our numpy array labels to images, skipping empty labels
# because we don't download images for them
keys = []
class_freq = Counter()  # total pixel count per class
image_freq = Counter()  # number of label images containing each class
for key in labels.files:
    label = labels[key]
    if not np.sum(label):
        continue
    keys.append(key)
    label_file = op.join(greyscale_folder, '{}.png'.format(key))
    img = Image.fromarray(label.astype(np.uint8))
    print('Writing {}'.format(label_file))
    img.save(label_file)
    # get class frequencies
    unique, counts = np.unique(label, return_counts=True)
    for k, v in zip(unique, counts):
        class_freq[int(k)] += int(v)
        image_freq[int(k)] += 1

# copy our tiles to the folder name used in train.txt/val.txt
# (copytree raises an error if 'images' already exists)
copytree('tiles', 'images')

# shuffle the file names and split them 80/20 into training and validation lists
np.random.shuffle(keys)
split_index = int(len(keys) * 0.8)

with open('train.txt', 'w') as train:
    for key in keys[:split_index]:
        train.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))

with open('val.txt', 'w') as val:
    for key in keys[split_index:]:
        val.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))

# write a csv with per-class pixel counts and image counts
freqs = [dict(label=k, frequency=v, image_count=image_freq[k])
         for k, v in sorted(class_freq.items())]
with open(op.join('labels', 'label-stats.csv'), 'w', newline='') as stats:
    writer = csv.DictWriter(stats, fieldnames=['label', 'frequency', 'image_count'])
    writer.writeheader()
    writer.writerows(freqs)