---
title : "bnn: Recurrent Neural Networks with R"
shorttitle : "Banana Neural Networks"
author:
  - name          : "Andreas M. Brandmaier"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Postal address"
    email         : "brandmaier@mpib-berlin.mpg.de"
  - name          : "Ernst-August Doelle"
    affiliation   : "1,2"

affiliation:
  - id            : "1"
    institution   : "Wilhelm-Wundt-University"
  - id            : "2"
    institution   : "Konstanz Business School"

authornote: |
  Add complete departmental affiliations for each author here. Each new line herein must be indented, like this line.

  Enter author note here.

abstract: |
  One or two sentences providing a **basic introduction** to the field, comprehensible to a scientist in any discipline.
  Two to three sentences of **more detailed background**, comprehensible to scientists in related disciplines.
  One sentence clearly stating the **general problem** being addressed by this particular study.
  One sentence summarizing the main result (with the words "**here we show**" or their equivalent).
  Two or three sentences explaining what the **main result** reveals in direct comparison to what was thought to be the case previously, or how the main result adds to previous knowledge.
  One or two sentences to put the results into a more **general context**.
  Two or three sentences to provide a **broader perspective**, readily comprehensible to a scientist in any discipline.

  <!-- https://tinyurl.com/ybremelq -->
keywords : "keywords"
wordcount : "X"
bibliography : ["r-references.bib"]
floatsintext : no
figurelist : no
tablelist : no
footnotelist : no
linenumbers : yes
mask : no
draft : no
documentclass : "apa6"
classoption : "man"
output : papaja::apa6_pdf
---
```{r setup, include = FALSE}
library("papaja")
library(bnn)
```
```{r analysis-preferences}
# Seed for random number generation
set.seed(42)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
```
# Introduction
Neural networks are biologically inspired, general-purpose computational methods that serve various purposes in psychological research. Among other uses, neural networks can serve as black-box models for non-linear regression and classification problems. `bnn` is written in C++ to enable fast computation across all platforms. The R package contains wrappers that were created using SWIG together with additional manually designed wrapper functions.
# Methods
First, we install the package from GitHub and attach it to the current workspace.
```{r eval=FALSE}
devtools::install_github("brandmaier/bnn")
library(bnn)
```
There are seven essential elements in `bnn`: nodes, ensembles, networks, trainers, sequences, sequence sets, and error functions.
## Creating networks
Some standard architectures can be created from simple network constructors, such as a regular feedforward network for non-sequential tasks (here with 3 input nodes, 10 hidden nodes, and 4 output nodes):
```{r}
FeedForwardNetwork(3, 10, 4)
```
Alternatively, a recurrent neural network with the same layout can be created. By default, the recurrent network uses tanh activations in the hidden layer and linear activations in the output layer.
```{r}
RecurrentNetwork(3, 10, 4)
```
There are also constructors for more specialized network architectures, such as the LSTM architecture:
```{r}
LSTMNetwork()
```
`bnn` also includes a factory class\footnote{Factories are classes that are never instantiated themselves but only serve to create objects.} called `NetworkFactory`, which offers methods to generate a variety of standard architectures:
```{r}
```
## Creating customized networks
Customized network architectures can be built by arranging ensembles into a network. Ensembles are sets of nodes (often referred to as layers), although ensembles can also represent complex cell types that are themselves composed of sets of cells (which in turn could be ensembles).
```{r}
input_layer <- FeedforwardEnsemble(TANH_NODE, 2)
hidden_layer <- FeedforwardEnsemble(TANH_NODE, 10)
output_layer <- FeedforwardEnsemble(LINEAR_NODE, 1)
```
The layers can be arranged into a network by using the concatenation operator provided by `bnn`:
```{r}
network <- Network() %>% input_layer %>% hidden_layer %>% output_layer
```
TODO: Washout_time (Network)
## Trainer
In `bnnlib`, various algorithms for fitting neural networks are available, including backpropagation, stochastic gradient descent, adaptive moment estimation (ADAM), resilient backpropagation (RProp), improved resilient propagation (IRProp), and root-mean-square propagation (RMSProp). A training algorithm is instantiated and then attached to an existing network.
```{r}
trainer <- BackpropTrainer()
network %>% trainer
```
SWIG wrapper functions are available to change the default behavior of the trainers. For trainers that use a learning rate, it can be set via
```{r}
Trainer_learning_rate_set(0.0001)
```
The training algorithms each have their own specific parameters, such as the momentum in backpropagation, which can be set as
```{r}
BackpropTrainer_momentum_set(0.01)
```
Or, the $\eta_{+}$ and $\eta_{-}$ values in improved resilient propagation:
```{r}
ImprovedRPropTrainer_eta_plus_set(0.1)
ImprovedRPropTrainer_eta_minus_set(0.1)
```
To switch between batch learning and stochastic gradient descent, one can set the batch-learning option. If it is `TRUE`, the gradients for all sequences are computed and then a single weight change is performed; if it is `FALSE`, the weights are updated after each sequence.
```{r}
Trainer_batch_learning_set(TRUE)
```
Last, all training algorithms can be run with a weight-decay parameter that is applied once the gradients have been computed. TODO: Possibly wrong implementation?
```{r}
Trainer_weight_decay_enabled_set(TRUE)
Trainer_weight_decay_set(0.01)
```
Furthermore, a `Trainer` can have several callbacks. Callbacks are methods that are called after a given number of epochs and perform a specified action, such as saving the network, printing informative output, or testing whether training should be aborted prematurely (for example, because the error on a validation set starts to increase).
Last, a `Trainer` can also have one or more stopping criteria. By default, no stopping criterion is set and training always proceeds until the specified number of epochs has been completed. This may result in over-fitting, and it is common practice to implement some form of early stopping rule. Stopping rules include the `ConvergenceCriterion`, which takes a single number as its argument: if the absolute difference of the error function between two epochs is equal to or smaller than that number, training is stopped. Here are some examples of how to add callbacks and stopping rules:
```{r}
Trainer_add_callback(ProgressDot())
Trainer_add_abort_criterion(ConvergenceCriterion(0.00001))
```
## Error Functions
Error functions determine the error of the output nodes in a neural network as a divergence function of their activations and the target values. They also provide the derivatives of the error that are necessary for error backpropagation. By default, the squared error loss is used, which is appropriate for regression problems with continuous output variables (typically associated with a linear activation function in the output layer); it is instantiated using the constructor `SquaredErrorFunction()`. Other error functions include the `MinkowskiErrorFunction()`, which implements an absolute error (also known as the Manhattan error). For classification, the appropriate error functions are the `CrossEntropyErrorFunction()`, which implements a cross-entropy error suited to 0-1-coded classification tasks, and the `WinnerTakesAllErrorFunction()`, which is appropriate for tasks with one-hot coding in which the output layer represents a discrete probability distribution. The `CombinedErrorFunction()` is a meta error function that takes two error functions and combines them by addition.
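For illustration, the following sketch instantiates some of the error functions named above; note that the two-argument form of the `CombinedErrorFunction()` constructor is an assumption based on the description above rather than documented API.
```{r eval=FALSE}
# Squared error loss for regression-type outputs
squared_error <- SquaredErrorFunction()

# Cross-entropy error for 0-1-coded classification tasks
cross_entropy <- CrossEntropyErrorFunction()

# Combine two error functions by addition
# (the two-argument constructor form is an assumption, not verified API)
combined_error <- CombinedErrorFunction(squared_error, cross_entropy)
```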
To evaluate the error of a given Network `network` for a SequenceSet called `sequenceset`, one may simply call
```{r}
Network_evaluate_error(network,sequenceset)
```
which evaluates the network's error function on the given sequence set. Alternatively, one can pass a specific error function to the network:
```{r}
Network_evaluate_error(network,sequenceset, errorfunction)
```
If sequences are marked with training, validation, and/or test set markers to assess generalization performance within sequences, one can separately evaluate the error estimates on the training, validation, and test parts, either with the network's default error function or with a custom error function:
```{r}
Network_evaluate_training_error(network,sequenceset)
Network_evaluate_validation_error(network,sequenceset)
Network_evaluate_test_error(network,sequenceset)
Network_evaluate_training_error(network,sequenceset, errorfunction)
Network_evaluate_validation_error(network,sequenceset, errorfunction)
Network_evaluate_test_error(network,sequenceset, errorfunction)
```
## Data formatting
`bnnlib` was designed for investigating neural networks that solve sequential tasks; thus, the most basic data structure is a `Sequence`. A `Sequence` stores a sequence of multivariate inputs and multivariate targets. Multiple sequences can be arranged in a `SequenceSet`, which stores one or more sequences and is the basic unit for estimating the parameters of a network.
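As a rough illustration of the data layout a `Sequence` represents, the sketch below sets up input and target matrices for a task with three input variables and one target variable observed over 20 time steps; how such matrices are passed to the `Sequence` constructor is not shown in this section.
```{r eval=FALSE}
# Shape of the data a single Sequence represents: each row is one time step,
# with three input variables and one target variable in this example.
n_steps <- 20
inputs  <- matrix(rnorm(n_steps * 3), nrow = n_steps, ncol = 3)
targets <- matrix(rnorm(n_steps * 1), nrow = n_steps, ncol = 1)
```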
## Plotting facilities
The package has a particular focus on plotting network activations over time. In general, networks are regarded as black boxes that *somehow* perform a task by specializing their generic architecture to reproduce a given input-output mapping. In small-scale networks, plotting the activations over time is one way of introspecting a network. Particularly with specialized cell types, such as LSTM cells, we can learn something about the task and about how a given network architecture goes about solving it.
`bnn` has some basic plotting facilities, for example, for plotting sequences, that is, input and target activations over time. The training error across epochs can also be plotted:
```{r eval=FALSE}
plotTrainingerror(trainer)
```
## Demonstrations
### Frequencies Task
### Grammar Task
### Mean Spike Distance Task
### Comparing Training Algorithms
### Replicability
To replicate experiments with neural networks, it is imperative to document all steps, from loading and preprocessing the data to all choices of parameters and algorithms along the way, including the random seeds:
```{r}
set.seed(234)
randomize_seed(32435)
```
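Beyond fixing the random seeds, a generic way to document the computational environment in R (not specific to `bnn`) is to record the session information, for example at the end of the analysis:
```{r eval=FALSE}
# Record the R version, platform, and attached package versions
sessionInfo()
```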
# Discussion
Do we really need yet another neural networks library? Sure.
\newpage
# References
```{r create_r-references}
r_refs(file = "r-references.bib")
```
\begingroup
\setlength{\parindent}{-0.5in}
\setlength{\leftskip}{0.5in}
<div id = "refs"></div>
\endgroup