---
title : "bnn: Recurrent Neural Networks with R"
shorttitle : "Banana Neural Networks"
author:
  - name          : "Andreas M. Brandmaier"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Postal address"
    email         : "brandmaier@mpib-berlin.mpg.de"
  - name          : "Ernst-August Doelle"
    affiliation   : "1,2"

affiliation:
  - id            : "1"
    institution   : "Wilhelm-Wundt-University"
  - id            : "2"
    institution   : "Konstanz Business School"

authornote: |
  Add complete departmental affiliations for each author here. Each new line herein must be indented, like this line.

  Enter author note here.

abstract: |
  One or two sentences providing a **basic introduction** to the field, comprehensible to a scientist in any discipline.
  Two to three sentences of **more detailed background**, comprehensible to scientists in related disciplines.
  One sentence clearly stating the **general problem** being addressed by this particular study.
  One sentence summarizing the main result (with the words "**here we show**" or their equivalent).
  Two or three sentences explaining what the **main result** reveals in direct comparison to what was thought to be the case previously, or how the main result adds to previous knowledge.
  One or two sentences to put the results into a more **general context**.
  Two or three sentences to provide a **broader perspective**, readily comprehensible to a scientist in any discipline.

  <!-- https://tinyurl.com/ybremelq -->
keywords : "keywords"
wordcount : "X"
bibliography : ["r-references.bib"]
floatsintext : no
figurelist : no
tablelist : no
footnotelist : no
linenumbers : yes
mask : no
draft : no
documentclass : "apa6"
classoption : "man"
output : papaja::apa6_pdf
---
```{r setup, include = FALSE}
library("papaja")
library(bnn)
```
```{r analysis-preferences}
# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
```
# Introduction
Neural networks are biologically inspired, general-purpose computational methods that serve various purposes in psychological research. Among other uses, neural networks can serve as black-box models for non-linear regression and classification problems. `bnn` is written in C++ to enable fast computation across all platforms. The R package contains wrappers that were created using SWIG together with additional manually designed wrapper functions.
# Methods
First, we install the package from GitHub and attach it to the current workspace.
```{r eval=FALSE}
devtools::install_github("brandmaier/bnn")
library(bnn)
```
There are seven essential elements in `bnn`: nodes, ensembles, networks, trainers, sequences, sequence sets, and error functions.
## Creating networks
Some standard architectures can be created from simple network constructors, such as a regular feedforward network for non-sequential tasks (here with 3 input nodes, 10 hidden nodes, and 4 output nodes):
```{r}
FeedForwardNetwork(3, 10, 4)
```
Alternatively, a recurrent neural network with the same layout can be created. By default, the recurrent network uses tanh activations in the hidden layer and linear activations in the output layer.
```{r}
RecurrentNetwork(3, 10, 4)
```
There are also constructors for more specialized network architectures, such as the LSTM architecture:
```{r}
LSTMNetwork()
```
`bnn` also includes a factory class\footnote{Factories are classes that are never instantiated themselves but only serve to create objects.} called `NetworkFactory`, which offers methods to generate a variety of standard architectures:
```{r}
```
## Creating customized networks
Customized network architectures can be built by arranging ensembles into a network. Ensembles are sets of nodes (often referred to as layers), although ensembles can also represent complex cell types that are themselves composed of sets of cells (which in turn could be ensembles).
```{r}
input_layer <- FeedforwardEnsemble(TANH_NODE, 2)
hidden_layer <- FeedforwardEnsemble(TANH_NODE, 10)
output_layer <- FeedforwardEnsemble(LINEAR_NODE, 1)
```
The layers can be arranged into a network by using the concatenation operator provided by `bnn`:
```{r}
network <- Network() %>% input_layer %>% hidden_layer %>% output_layer
```
TODO: Washout_time (Network)
## Trainer
In `bnnlib`, various algorithms for fitting neural networks are available, including backpropagation, stochastic gradient descent, adaptive moment estimation (ADAM), resilient backpropagation (RProp), improved resilient propagation (IRProp), and root-mean-square propagation (RMSProp). A training algorithm is instantiated and then attached to an existing network.
```{r}
trainer <- BackpropTrainer()
network %>% trainer
```
SWIG wrapper functions are available to change the default behavior of the trainers. For trainers that use a learning rate, it can be set via
```{r}
Trainer_learning_rate_set(0.0001)
```
The training algorithms each have their own specific parameters, such as the momentum in backpropagation, which can be set as
```{r}
BackpropTrainer_momentum_set(0.01)
```
Or, the $\eta_{+}$ and $\eta_{-}$ values in improved resilient propagation:
```{r}
ImprovedRPropTrainer_eta_plus_set(0.1)
ImprovedRPropTrainer_eta_minus_set(0.1)
```
To switch between batch learning and stochastic gradient descent, one can set the batch-learning option. If it is `TRUE`, the gradients for all sequences are computed and then a single weight change is performed; if it is `FALSE`, the weights are updated after each sequence.
```{r}
Trainer_batch_learning_set(TRUE)
```
Last, all training algorithms can be run with a weight-decay parameter that is applied once the gradients have been computed. TODO: Possibly wrong implementation?
```{r}
Trainer_weight_decay_enabled_set(TRUE)
Trainer_weight_decay_set(0.01)
```
Furthermore, a `Trainer` can have several callbacks. Callbacks are methods that are called after a given number of epochs and perform a specified action, such as saving the network, printing informative output, or testing whether training should be aborted prematurely (for example, because the error on a validation set starts to increase).
Last, a `Trainer` can also have one or more stopping criteria. By default, no stopping criterion is set and training always proceeds until the specified number of epochs has been completed. This may result in over-fitting, and it is common practice to implement some form of early stopping rule. Stopping rules include the `ConvergenceCriterion`, which takes a single number as its argument: if the absolute difference of the error function between two epochs is equal to or smaller than that number, training is stopped. Here are some examples of how to add callbacks and stopping rules:
```{r}
Trainer_add_callback(ProgressDot())
Trainer_add_abort_criterion(ConvergenceCriterion(0.00001))
```
## Error Functions
Error functions determine the error of the output nodes in a neural network as a divergence function of their activations and the target values. They also provide the derivatives of the error that are necessary for error backpropagation. By default, the squared error loss is used, which is appropriate for regression problems with continuous output variables (typically associated with a linear activation function in the output layer); it is instantiated using the constructor `SquaredErrorFunction()`. Other error functions include the `MinkowskiErrorFunction()`, which implements an absolute error (also known as the Manhattan error). For classification, the appropriate error functions are the `CrossEntropyErrorFunction()`, which implements a cross-entropy error suited to 0-1-coded classification tasks, and the `WinnerTakesAllErrorFunction()`, which is appropriate for tasks with one-hot coding in which the output layer represents a discrete probability distribution. The `CombinedErrorFunction()` is a meta error function that takes two error functions and combines them by addition.
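For illustration, the following sketch instantiates some of the error functions named above; note that the two-argument form of the `CombinedErrorFunction()` constructor is an assumption based on the description above rather than documented API.
```{r eval=FALSE}
# Squared error loss for regression-type outputs
squared_error <- SquaredErrorFunction()

# Cross-entropy error for 0-1-coded classification tasks
cross_entropy <- CrossEntropyErrorFunction()

# Combine two error functions by addition
# (the two-argument constructor form is an assumption, not verified API)
combined_error <- CombinedErrorFunction(squared_error, cross_entropy)
```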
To evaluate the error of a given Network `network` for a SequenceSet called `sequenceset`, one may simply call
```{r}
Network_evaluate_error(network,sequenceset)
```
which evaluates the network's error function on the given sequence set. Alternatively, one can pass a specific error function to the network:
```{r}
Network_evaluate_error(network,sequenceset, errorfunction)
```
If sequences are marked with training, validation, and/or test set markers to assess generalization performance within sequences, one can separately evaluate the error estimates on the training, validation, and test parts, either with the network's default error function or with a custom error function:
```{r}
Network_evaluate_training_error(network,sequenceset)
Network_evaluate_validation_error(network,sequenceset)
Network_evaluate_test_error(network,sequenceset)
Network_evaluate_training_error(network,sequenceset, errorfunction)
Network_evaluate_validation_error(network,sequenceset, errorfunction)
Network_evaluate_test_error(network,sequenceset, errorfunction)
```
## Data formatting
`bnnlib` was designed for investigating neural networks that solve sequential tasks; thus, the most basic data structure is a `Sequence`. A `Sequence` stores a sequence of multivariate inputs and multivariate targets. Multiple sequences can be arranged in a `SequenceSet`, which stores one or more sequences and is the basic unit for estimating the parameters of a network.
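As a rough illustration of the data layout a `Sequence` represents, the sketch below sets up input and target matrices for a task with three input variables and one target variable observed over 20 time steps; how such matrices are passed to the `Sequence` constructor is not shown in this section.
```{r eval=FALSE}
# Shape of the data a single Sequence represents: each row is one time step,
# with three input variables and one target variable in this example.
n_steps <- 20
inputs  <- matrix(rnorm(n_steps * 3), nrow = n_steps, ncol = 3)
targets <- matrix(rnorm(n_steps * 1), nrow = n_steps, ncol = 1)
```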
## Plotting facilities
The package has a particular focus on plotting network activations over time. In general, networks are regarded as black boxes that *somehow* perform a task by specializing their generic architecture to reproduce a given input-output mapping. In small-scale networks, plotting the activations over time is one way of introspecting a network. Particularly with specialized cell types, such as LSTM cells, we can learn something about the task and about how a given network architecture goes about solving it.
`bnn` has some basic plotting facilities, for example, for plotting sequences, that is, input and target activations over time. The training error across epochs can also be plotted:
```{r eval=FALSE}
plotTrainingerror(trainer)
```
## Demonstrations
### Frequencies Task
### Grammar Task
### Mean Spike Distance Task
### Comparing Training Algorithms
### Replicability
To replicate experiments with neural networks, it is imperative to document all steps, from loading and preprocessing the data to all choices of parameters and algorithms along the way, including the random seeds:
```{r}
set.seed(234)
randomize_seed(32435)
```
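Beyond fixing the random seeds, a generic way to document the computational environment in R (not specific to `bnn`) is to record the session information, for example at the end of the analysis:
```{r eval=FALSE}
# Record the R version, platform, and attached package versions
sessionInfo()
```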
# Discussion
Do we really need yet another neural networks library? Sure.
\newpage
# References
```{r create_r-references}
r_refs(file = "r-references.bib")
```
\begingroup
\setlength{\parindent}{-0.5in}
\setlength{\leftskip}{0.5in}
<div id = "refs"></div>
\endgroup