---
title: "Deep Learning with Big Data: \\newline Alabama Highway Infrastructure"
author: "Erik Johnson and Alexander Hainen"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
beamer_presentation:
keep_tex: true
---
```{r setup, include=FALSE, echo=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=45),tidy=TRUE)
library(rgdal)
library(streetview) # devtools::install_github('erikbjohn/streetview')
library(api.keys) # devtools::install_github('erikbjohn/api.keys')
library(ggplot2)
library(geosphere)
library(rgeos)
library(ggmap)
library(kableExtra)
library(data.table)
# Set location for local package data storage
pkg.data.location <- '~/Dropbox/pkg.data/alabama_roads/'
## API key load
api.key.location <- '~/Dropbox/pkg.data/api.keys/raw/l.pkg.rdata'
load(api.key.location)
api.key <- l.pkg$google
## Auto set paths
source('R/sample.line.R')
source('R/streetview.metadata.R')
source('R/tf_classification.R')
proj.env <- '+proj=longlat +datum=WGS84'
panoids.location <- paste0(pkg.data.location, 'clean/pano_ids.rds')
roads.raw.location <- paste0(pkg.data.location, 'raw/us82erik/us82_dissolve')
roads.clean.location <- paste0(pkg.data.location, 'clean/roads.rds')
road.points.location <- paste0(pkg.data.location, 'clean/roads.lines.rds')
road.points.panoid.location <- paste0(pkg.data.location, 'raw/road.points.panoids/')
snapshots.location <- paste0(pkg.data.location, 'raw/snapshots/')
```
#
```{r, echo=FALSE, eval=TRUE, cache=TRUE}
knitr::include_graphics(xtable::sanitize(paste0(path.expand(pkg.data.location), 'raw/training_data/guardrail/22guardrail.jpg'), type='latex'))
```
#
```{r, echo=FALSE, eval=TRUE}
knitr::include_graphics(xtable::sanitize(paste0(path.expand(pkg.data.location), 'raw/training_data/signals/09signals.jpg'), type='latex'))
```
# What is big data?
* Images and Video are big data
1. Need to explore quickly and accurately
2. Possible efficiency gains (better questions, better answers)
3. In a globally competitive environment, there are huge gains to first movers.
* Big data source examples:
1. Google Streetview: Road infrastructure detection (guardrails, traffic lights, rumble strips, lane information)
2. Drones: Storm Damage assessment (tree mapping, structure changes, power line status)
3. Traffic cameras: Congestion and accident identification.
4. Crowdsourced instagram photos, etc.
# Big data challenges:
* Costly to understand the meaning and information in the pictures.
1. How to classify? (hire students/employees, etc.)
2. Prone to human error and missing information.
3. If a new category is added, all previous images must be reclassified.
4. Need real-time capability (24 hours/day, 7 days/week)
* Solution:
1. Automate using rapidly advancing deep learning algorithms.
# Application
This application focuses on using big data and deep learning to cheaply and efficiently classify infrastructure on Alabama Highway 82 between Tuscaloosa and Montgomery.
This process will illustrate the following:
1. How to build a big data set using free/low-cost data
* Publicly available road shapefiles.
* Google Streetview images
2. How to train and classify images using free and simple algorithms
* Google Tensorflow
* Inception v3 algorithm and retraining
# Start with any freely available road shapefile.
* Here, we use Alabama Highway 82
```{r roads_import, eval=TRUE, echo=FALSE, cache=TRUE, out.width = "300px", warning=FALSE}
if(!(file.exists(roads.clean.location))){
roads <- rgdal::readOGR(dsn = path.expand(roads.raw.location), layer = 'us82d', verbose = FALSE)
roads <- sp::spTransform(roads, CRSobj =CRS(proj.env))
saveRDS(roads, file = roads.clean.location)
} else {
roads <- readRDS(roads.clean.location)
}
# Plot
map.center <- as.numeric(geosphere::centroid(rgeos::gBuffer(roads, width=0.1)))
map <- suppressMessages(ggmap::get_googlemap(center=c(lon=map.center[1], lat=map.center[2]), zoom=8))
roads_fortify <- ggplot2::fortify(roads)
p <- ggmap(map) +
geom_line(data=roads_fortify, aes(x=long, y=lat, group=group))
suppressMessages(ggsave(p, filename = 'Images/highway.pdf'))
knitr::include_graphics('Images/highway.pdf')
```
# Find all google streetview snapshots locations
```{r road.points, eval=TRUE, echo=FALSE, cache=TRUE}
# Find location of all google street view camera snapshots
if (!(file.exists(road.points.location))){
road.points <- suppressWarnings(sample.line(roads, sdist=0.0075))
road.points$long <- road.points@coords[,1]
road.points$lat <- road.points@coords[,2]
saveRDS(road.points, file=road.points.location)
} else {
road.points <- readRDS(road.points.location)
}
#knitr::kable(head(road.points@data))
```
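`sample.line()` is defined in `R/sample.line.R` and is not shown here; as a rough, hypothetical stand-in (not the project's actual sampler), `sp::spsample()` can place regularly spaced points along the line:
```{r, eval=FALSE}
# Hypothetical stand-in for sample.line(): regularly spaced points along the
# road. sdist=0.0075 (degrees, since proj.env is longlat) mirrors the spacing
# used above; gLength returns the total line length in coordinate units.
n.pts <- ceiling(rgeos::gLength(roads) / 0.0075)
road.points.alt <- sp::spsample(roads, n = n.pts, type = 'regular')
```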
```{r dt_pano_ids, eval=TRUE, echo=FALSE, cache=TRUE}
# Convert points to panoids and assign camera bearings
# This can be automated in the future by adjusting the sdist parameter in the
# road.points chunk to optimally collect all panoids while minimizing the
# number of repeat samples, which would speed up the metadata API calls.
# For now we fill in the data with the road points sampled above. Two camera
# bearings are assigned for each panoid, based on aiming the camera at the
# previous and next pano_id. Order is based on the road.points file and
# retained in the field dt_pano_ids$pano_id_order.
if (!(file.exists(panoids.location))){
points.ids.full <- road.points$ID
points.files <- list.files(road.points.panoid.location)
points.ids.done <- sapply(points.files,
function(x) stringr::str_extract(x,
stringr::regex('(?<=id\\:).+(?=\\;)', perl=TRUE)))
points.ids.not.done <- points.ids.full[!(points.ids.full %in% points.ids.done)]
while(length(points.ids.not.done)>0){
if (length(points.ids.not.done)>1){
points.ids <- sample(points.ids.not.done, 10)
} else {
points.ids <- points.ids.not.done
}
cat('Assigning', length(points.ids), 'points to panoids.',
length(points.ids.not.done), 'remain \n')
l.panoids <- lapply(points.ids, function(x) streetview.metadata(data.table::as.data.table(road.points@data[which(road.points@data$ID == x),]),
api_key=api.key,
save.location = road.points.panoid.location))
points.files <- list.files(road.points.panoid.location)
points.ids.done <- sapply(points.files, function(x) stringr::str_extract(x,
stringr::regex('(?<=id\\:).+(?=\\;)', perl=TRUE)))
points.ids.not.done <- points.ids.full[!(points.ids.full %in% points.ids.done)]
}
# Combine all panoid files into a single data.table
files.to.load <- list.files(road.points.panoid.location, full.names = TRUE)
dt_pano_ids <- data.table::rbindlist(lapply(files.to.load, readRDS),
                                     use.names = TRUE, fill = TRUE)
# Unique record for each pano_id, retaining order from road.points
dt_pano_ids$roads.point.id <- as.integer(dt_pano_ids$roads.point.id)
setkey(dt_pano_ids, roads.point.id)
# Find unique panoids
dt_pano_ids[, pano_id_order := .GRP, by = pano_id]
dt_pano_ids <- unique(dt_pano_ids[, .(date, location.lat, location.lng, pano_id, pano_id_order, status)])
dt_pano_ids$location.lng <- as.numeric(dt_pano_ids$location.lng)
dt_pano_ids$location.lat <- as.numeric(dt_pano_ids$location.lat)
setkey(dt_pano_ids, pano_id_order)
# Get bearings for pictures (ahead and behind) (Simply aim at panoids in front and behind.)
dt_pano_ids$lat.lead <- data.table::shift(dt_pano_ids$location.lat, 1, type='lead')
dt_pano_ids$lng.lead <- data.table::shift(dt_pano_ids$location.lng, 1, type='lead')
dt_pano_ids$lat.lag <- data.table::shift(dt_pano_ids$location.lat, 1, type='lag')
dt_pano_ids$lng.lag <- data.table::shift(dt_pano_ids$location.lng, 1, type='lag')
dt_pano_ids <- na.omit(dt_pano_ids)
dt_pano_ids$bearings.lead <- sapply(1:nrow(dt_pano_ids), function(x) geosphere::bearing(c(dt_pano_ids[x,location.lng], dt_pano_ids[x,location.lat]),
c(dt_pano_ids[x,lng.lead], dt_pano_ids[x,lat.lead])))
dt_pano_ids$bearings.lag <- sapply(1:nrow(dt_pano_ids), function(x) geosphere::bearing(c(dt_pano_ids[x,location.lng], dt_pano_ids[x,location.lat]),
c(dt_pano_ids[x,lng.lag], dt_pano_ids[x,lat.lag])))
saveRDS(dt_pano_ids, file=panoids.location)
} else {
dt_pano_ids <- readRDS(panoids.location)
}
```
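`streetview.metadata()` lives in `R/streetview.metadata.R`; as a minimal sketch, the public endpoint it presumably wraps looks like this (hypothetical coordinates):
```{r, eval=FALSE}
# Sketch of the Street View metadata endpoint: one call per sampled point,
# returning the nearest panorama. Coordinates here are hypothetical.
meta.url <- paste0('https://maps.googleapis.com/maps/api/streetview/metadata',
                   '?location=32.93,-87.41', '&key=', api.key)
meta <- jsonlite::fromJSON(meta.url)
# fields include: meta$pano_id, meta$location$lat, meta$location$lng,
# meta$date, meta$status
```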
* A bit of coding, processing, and downloading of Streetview metadata aims the cameras and sets the zoom levels (a bearing sketch follows the table below).
* There are `r prettyNum(nrow(dt_pano_ids))` camera locations, with rear and forward camera bearings set for each.
* Here are the first 6.
```{r print.table, eval=TRUE, echo=FALSE, cache=TRUE}
out.table <- head(dt_pano_ids[,.(snap_date=date, pano_id, order=pano_id_order, lat=location.lat, lng=location.lng, bear.lead=bearings.lead, bear.lag=bearings.lag)])
kable(out.table, 'latex') %>%
kable_styling(font_size=6)
```
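As a minimal illustration of the bearing calculation used above (hypothetical coordinates):
```{r, eval=FALSE}
# geosphere::bearing() takes c(lon, lat) points and returns the initial
# bearing in degrees from north; this is what aims each camera.
p.here <- c(-87.41, 32.93)   # hypothetical current panoid (lon, lat)
p.next <- c(-87.40, 32.93)   # hypothetical next panoid
geosphere::bearing(p.here, p.next)  # ~90: camera points due east
```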
# Take Google Streetview snapshots
```{r snapshot, eval=TRUE, echo=FALSE, cache=TRUE}
pano_samples <- sample(nrow(dt_pano_ids),size = nrow(dt_pano_ids), replace = FALSE)
for (iSample in pano_samples){
# if((iSample %% 100)==0) cat(iSample, '\n')
dt_pano_id <- dt_pano_ids[iSample,]
fDest.lead <- paste0(snapshots.location, 'pano_id:', dt_pano_id$pano_id, '_lead.jpg')
fDest.lag <- paste0(snapshots.location, 'pano_id:', dt_pano_id$pano_id, '_lag.jpg')
g.lat <- dt_pano_id$location.lat
g.lng <- dt_pano_id$location.lng
g.bearings <- unlist(dt_pano_id[, .(bearings.lead, bearings.lag)])
api <- list()
api$head <- 'https://maps.googleapis.com/maps/api/streetview?size=600x400'
api$location <- paste0('&location=', g.lat, ',', g.lng)
api$fov <- paste0('&fov=', 30)
api$heading <- paste0('&heading=', g.bearings[1])
api$pitch <- paste0('&pitch=', 0)
api$api.key <- paste0('&key=',api.key)
api.url <- paste0(unlist(api), collapse = '')
if (!(file.exists(fDest.lead))){
streetShot <- download.file(api.url, fDest.lead, quiet=TRUE)
}
api$heading <- paste0('&heading=', g.bearings[2])
api.url <- paste0(unlist(api), collapse = '')
if (!(file.exists(fDest.lag))){
streetShot <- download.file(api.url, fDest.lag, quiet = TRUE)
}
}
snap.list <- list.files(path = snapshots.location, full.names = TRUE)
snap.samp <- sample(snap.list, 10)
```
* We then feed the location and camera parameters to the Google Streetview API and download all `r prettyNum(2*nrow(dt_pano_ids))` pictures.
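For reference, an assembled request looks like this (hypothetical coordinates and heading; key redacted):
```{r, eval=FALSE}
# One GET request per snapshot; the loop above issues two per panoid
# (lead and lag headings).
'https://maps.googleapis.com/maps/api/streetview?size=600x400&location=32.93,-87.41&fov=30&heading=118.2&pitch=0&key=API_KEY'
```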
```{r, echo=FALSE, eval=TRUE, cache=TRUE}
knitr::include_graphics(xtable::sanitize(paste0(path.expand(pkg.data.location), 'raw/training_data/guardrail/22guardrail.jpg'), type='latex'))
```
#
```{r, echo=FALSE, eval=TRUE}
knitr::include_graphics(xtable::sanitize(paste0(path.expand(pkg.data.location), 'raw/training_data/signals/09signals.jpg'), type='latex'))
```
# Human Classification of streetview photos
```{r clean_names,eval=TRUE, echo=FALSE, cache=TRUE}
# Training data must have unique filenames.
training_dirs <- c('guardrail', 'rumble', 'signals')
for (training_dir in training_dirs){
filez <- list.files(paste0('~/Dropbox/pkg.data/alabama_roads/raw/training_data/', training_dir), full.names = TRUE)
filez <- filez[stringr::str_detect(filez, stringr::regex('.+(?=[0-9]{2,2}\\.jpg)', perl=TRUE))]
sapply(filez,FUN=function(eachPath){
file.rename(from=eachPath,to=stringr::str_replace(eachPath, pattern = '\\.jpg', replacement = paste0(training_dir, '.jpg')))
})
}
```
* To start the classification we must teach the machine to recognize different features.
* Human selects 30-50 photos for each class.
1. 30 photos from streetview downloads with guardrails
2. 30 photos from streetview downloads with traffic lights
3. 30 photos from streetview downloads with rumble strips
* Each set of photos is put into its own folder (guardrail, etc.)
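A quick sanity check on the resulting layout (a sketch, assuming the `pkg.data.location` paths set in the setup chunk):
```{r, eval=FALSE}
# Count training photos per class folder; each class should have 30-50 images.
training_dirs <- c('guardrail', 'rumble', 'signals')
sapply(training_dirs, function(d)
  length(list.files(paste0(pkg.data.location, 'raw/training_data/', d),
                    pattern = '\\.jpg$')))
```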
# Training machine to recognize classes
* We use transfer learning on Google's Inception v3 network for this demonstration
* Inception v3 is a deep neural network that takes a long, long time to train from scratch
* We retrain the last step to recognize the guardrails, traffic lights, rumble strips, etc.
Best analogy: Inception v3 is a toddler. We do not want to waste time and resources raising a toddler.
We just want to teach a toddler to recognize a set of new objects such as a spoon. Show the toddler 30 spoons and they gain the ability to classify spoon objects.
# Training Diagnostics
```{bash, eval=FALSE, echo=FALSE}
cd ~/tensorflow
# First compile Inception for retraining (~25 minutes on a laptop);
# this only needs to run once per machine.
bazel build --config opt tensorflow/examples/image_retraining:retrain
# Retrain (transfer learning) on the roads data (~5-10 minutes);
# rerun this step whenever the model needs retraining.
bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir /home/ebjohnson5/Dropbox/pkg.data/alabama_roads/raw/training_data
# To visualize training output in tensorboard
tensorboard --logdir /tmp/retrain_logs
```
* Training diagnostics are viewed through TensorBoard.
```{r, echo=FALSE, eval=TRUE, out.width = "300px"}
knitr::include_graphics(xtable::sanitize('Images/tensorTraining.png', type='latex'))
```
# Classify all photos
```{r photo.list, echo=FALSE, cache=TRUE}
project_location <- '~/Dropbox/pkg.data/alabama_roads/'
classify.name <- 'roads_example1'
f.dir <- paste0(project_location, 'raw/snapshots/')
f.list.path <- list.files(f.dir, full.names = TRUE)
f.list <- list.files(f.dir)
```
* The next step is to classify each of the `r length(f.list)` photos using the trained model.
* For each photo, the class probabilities sum to one:
$$\sum_{i \in \text{classes}} \Pr(\text{class}_i) = 1$$
```{r Classification, eval=TRUE, echo=FALSE, cache=TRUE}
# This is the main bottleneck in the process right now.
# It can be parallelized in Python using TensorFlow Serving
# (https://github.com/tensorflow/serving). For now, this works but takes a while.
tf_classify <- tf_classification()
```
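A quick check of the sum-to-one property (a sketch, assuming `tf_classify` has the `fName`, `category`, and `score` columns used below):
```{r, eval=FALSE}
# Flag any photo whose class scores do not sum to approximately one.
tf_classify[, .(total = sum(score)), by = fName][abs(total - 1) > 1e-3]
```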
# Classification: Rumble Strips only
```{r, echo=FALSE, eval=TRUE, out.width = "200px",fig.align='center'}
f.name <- 'pano_id:zZ39CumiCqaxS9-1n6erDg_lead.jpg'
f.path <- paste0(path.expand('~/Dropbox/pkg.data/alabama_roads/raw/snapshots/'), f.name)
knitr::include_graphics(f.path)
```
```{r, echo=FALSE, eval=TRUE}
out.table <- tf_classify[fName==f.name, .(fName, category, score)]
kable(out.table, 'latex') %>%
kable_styling(font_size=6)
```
# Classification: Traffic Signal only
```{r, echo=FALSE, eval=TRUE, out.width = "200px",fig.align='center'}
f.name <- 'pano_id:-km7JQTMeLsO9k7uF_cnsg_lag.jpg'
f.path <- paste0(path.expand('~/Dropbox/pkg.data/alabama_roads/raw/snapshots/'), f.name)
knitr::include_graphics(f.path)
```
```{r, echo=FALSE, eval=TRUE}
out.table <- tf_classify[fName==f.name, .(fName, category, score)]
kable(out.table, 'latex') %>%
kable_styling(font_size=6)
```
# Classification: Rumble Strips and Guardrail
```{r, echo=FALSE, eval=TRUE, out.width = "200px",fig.align='center'}
f.name <- 'pano_id:zZq1fNpgOGCGAWylAjnidw_lag.jpg'
f.path <- paste0(path.expand('~/Dropbox/pkg.data/alabama_roads/raw/snapshots/'), f.name)
knitr::include_graphics(f.path)
```
```{r, echo=FALSE, eval=TRUE}
out.table <- tf_classify[fName==f.name, .(fName, category, score)]
kable(out.table, 'latex') %>%
kable_styling(font_size=6)
```
```{r, eval=FALSE, echo=FALSE}
# Guardrail example
#bazel-bin/tensorflow/examples/image_retraining/label_image --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --output_layer=final_result:0 --image=$HOME/Dropbox/pkg.data/alabama_roads/raw/snapshots/pano_id:r2NkANifYabJWFCEi2CXUQ_lag.jpg
#bazel-bin/tensorflow/examples/image_retraining/label_image --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --output_layer=final_result:0 --image=$HOME/Dropbox/pkg.data/alabama_roads/raw/snapshots/pano_id:r0ExJrw5e5qPwp7qcTU9rg_lag.jpg
# Need to add to training
#bazel-bin/tensorflow/examples/image_retraining/label_image --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --output_layer=final_result:0 --image=$HOME/Dropbox/pkg.data/alabama_roads/raw/snapshots/pano_id:r0fW5VldKtkNNetToCeLGQ_lag.jpg
# Traffic signal
#bazel-bin/tensorflow/examples/image_retraining/label_image --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --output_layer=final_result:0 --image=$HOME/Dropbox/pkg.data/alabama_roads/raw/snapshots/pano_id:QZkFVmowjPq0ztNwXXOhdg_lead.jpg
# Null
#bazel-bin/tensorflow/examples/image_retraining/label_image --graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt --output_layer=final_result:0 --image=$HOME/Dropbox/pkg.data/alabama_roads/raw/snapshots/pano_id:QZkFVmowjPq0ztNwXXOhdg_lag.jpg
```
# Results:
Classified images from this example project can be used in a variety of ways:
1. Highway infrastructure inventory accounting.
2. Interactive mapping.
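As a hedged sketch of the interactive-mapping use case (leaflet is not part of this project; assumes `dt_pano_ids` and `tf_classify` from the chunks above):
```{r, eval=FALSE}
library(leaflet)
# Keep each photo's top-scoring class and recover its pano_id from the filename.
top.class <- tf_classify[, .SD[which.max(score)], by = fName]
top.class[, pano_id := gsub('^pano_id:|_(lead|lag)\\.jpg$', '', fName)]
map.dt <- merge(dt_pano_ids, top.class, by = 'pano_id')
# Plot each classified camera location on an interactive basemap.
leaflet(map.dt) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~location.lng, lat = ~location.lat,
                   label = ~paste(category, round(score, 2)))
```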