diff --git a/.Rbuildignore b/.Rbuildignore
index e8db4ca60..00df434de 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -5,6 +5,7 @@
 ^.*\.hdf5$
 ^README.R?md$
 ^docs$
+^website$
 ^pkgdown$
 ^dev$
 ^runs$
diff --git a/website/LICENSE.html b/website/LICENSE.html
new file mode 100644
index 000000000..2300cf017
--- /dev/null
+++ b/website/LICENSE.html
@@ -0,0 +1,152 @@
+YEAR: 2017
+COPYRIGHT HOLDER: RStudio, Inc; Google, Inc; François Chollet; Yuan Tang
Keras layers are the fundamental building block of keras models. Layers are created using a wide variety of layer_
functions and are typically composed together by stacking calls to them using the pipe %>%
operator. For example:
model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, input_shape = c(784)) %>%
+ layer_activation('relu') %>%
+ layer_dense(units = 10) %>%
+ layer_activation('softmax')
A wide variety of layers are available, including core layers (such as dense, activation, and dropout), convolutional layers, pooling layers, recurrent layers, and embedding layers.
All layers share the following properties (a brief example of inspecting them follows this list):
+layer$name
— String, must be unique within a model.
layer$input_spec
— List of input specifications. Each entry describes one required input: (ndim, dtype). A layer with n
input tensors must have an input_spec
of length n
.
layer$trainable
— Boolean, whether the layer weights will be updated during training.
layer$uses_learning_phase
– Whether any operation of the layer uses K.in_training_phase()
or K.in_test_phase()
.
layer$input_shape
— Input shape. Provided for convenience, but note that there may be cases in which this attribute is ill-defined (e.g. a shared layer with multiple input shapes), in which case requesting input_shape
will result in an error. Prefer using get_input_shape_at(layer, node_index)
.
layer$output_shape
— Output shape. See above.
layer$inbound_nodes
— List of nodes.
layer$outbound_nodes
— List of nodes.
layer$input
, layer$output
— Input/output tensor(s). Note that if the layer is used more than once (shared layer), this is ill-defined and will result in an error. In such cases, use get_input_at(layer, node_index)
.
layer$input_mask
, layer$output_mask
— Same as above, for masks.
layer$trainable_weights
— List of variables.
layer$non_trainable_weights
— List of variables.
layer$weights
— The concatenation of the lists trainable_weights and non_trainable_weights (in this order).
layer$constraints
— Mapping of weights to constraints.
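As a brief illustration (a minimal sketch; the layer name "my_dense" and the shapes are arbitrary), the snippet below builds a one-layer model and inspects some of these properties:
library(keras)

# build a small model so the layer is connected to an input
model <- keras_model_sequential()
model %>%
  layer_dense(units = 16, input_shape = c(8), name = "my_dense")

# retrieve the layer by name and inspect a few properties
layer <- get_layer(model, "my_dense")
layer$name                       # "my_dense"
layer$trainable                  # TRUE by default
layer$input_shape                # shape of the input tensor
layer$output_shape               # shape of the output tensor
length(layer$trainable_weights)  # 2: the kernel and the bias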
The following functions are available for interacting with layers:
+
+get_config() from_config()
+ |
+
+ +Layer/Model configuration + + |
+
+get_weights() set_weights()
+ |
+
+ +Layer/Model weights as R arrays + + |
+
+count_params()
+ |
+
+ +Count the total number of scalars composing the weights. + + |
+
+get_input_at() get_output_at() get_input_shape_at() get_output_shape_at() get_input_mask_at() get_output_mask_at()
+ |
+
+ +Retrieve tensors for layers with multiple nodes + + |
+
+reset_states()
+ |
+
+ +Reset the states for a layer + + |
+
There are two types of models available in Keras: sequential models and models created with the functional API.
+Sequential models are created using the keras_model_sequential()
function and are composed of a set of linear layers:
model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, input_shape = c(784)) %>%
+ layer_activation('relu') %>%
+ layer_dense(units = 10) %>%
+ layer_activation('softmax')
Note that Keras objects are modified in place, which is why it’s not necessary to assign the result back to model after the layers are added.
Learn more by reading the Guide to the Sequential Model.
+The functional API enables you to define more complex models, such as multi-output models, directed acyclic graphs, or models with shared layers. To create a model with the functional API compose a set of input and output layers then pass them to the keras_model()
function:
tweet_a <- layer_input(shape = c(140, 256))
+tweet_b <- layer_input(shape = c(140, 256))
+
+# This layer can take as input a matrix and will return a vector of size 64
+shared_lstm <- layer_lstm(units = 64)
+
+# When we reuse the same layer instance multiple times, the weights of the layer are also
+# being reused (it is effectively *the same* layer)
+encoded_a <- tweet_a %>% shared_lstm
+encoded_b <- tweet_b %>% shared_lstm
+
+# We can then concatenate the two vectors and add a logistic regression on top
+predictions <- layer_concatenate(c(encoded_a, encoded_b), axis=-1) %>%
+ layer_dense(units = 1, activation = 'sigmoid')
+
+# We define a trainable model linking the tweet inputs to the predictions
+model <- keras_model(inputs = c(tweet_a, tweet_b), outputs = predictions)
Learn more by reading the Guide to the Functional API.
All models share the following properties (a brief inspection example follows this list):
+model$layers
— A flattened list of the layers comprising the model graph.
model$inputs
— List of input tensors.
model$outputs
— List of output tensors.
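For example, with the sequential model defined above you can inspect these properties directly (a minimal sketch):
# number and names of layers in the model graph
length(model$layers)
sapply(model$layers, function(layer) layer$name)

# input and output tensors
model$inputs
model$outputs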
These functions enable you to create, train, evaluate, persist, and generate predictions with models (a minimal end-to-end sketch follows the table):
+
+keras_model()
+ |
+
+ +Keras Model + + |
+
+keras_model_sequential()
+ |
+
+ +Keras Model composed of a linear stack of layers + + |
+
+compile()
+ |
+
+ +Configure a Keras model for training + + |
+
+fit()
+ |
+
+ +Train a Keras model + + |
+
+evaluate()
+ |
+
+ +Evaluate a Keras model + + |
+
+predict()
+ |
+
+ +Predict Method for Keras Models + + |
+
+summary()
+ |
+
+ +Print a summary of a model + + |
+
+save_model_hdf5() load_model_hdf5()
+ |
+
+ +Save/Load models using HDF5 files + + |
+
+get_layer()
+ |
+
+ +Retrieves a layer based on either its name (unique) or index. + + |
+
+pop_layer()
+ |
+
+ +Remove the last layer in a model + + |
+
+save_model_weights_hdf5() load_model_weights_hdf5()
+ |
+
+ +Save/Load model weights using HDF5 files + + |
+
+get_weights() set_weights()
+ |
+
+ +Layer/Model weights as R arrays + + |
+
+get_config() from_config()
+ |
+
+ +Layer/Model configuration + + |
+
+model_to_json() model_from_json()
+ |
+
+ +Model configuration as JSON + + |
+
+model_to_yaml() model_from_yaml()
+ |
+
+ +Model configuration as YAML + + |
+
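Putting several of these functions together, a minimal end-to-end sketch might look like the following (hypothetical data: x_train, y_train, x_test, and y_test are assumed to already exist as appropriately shaped R arrays):
library(keras)

# define and compile a small classification model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 10, activation = 'softmax')

model %>% compile(
  optimizer = 'rmsprop',
  loss = 'categorical_crossentropy',
  metrics = 'accuracy'
)

# train, evaluate, and persist the model
model %>% fit(x_train, y_train, epochs = 10, batch_size = 128)
model %>% evaluate(x_test, y_test)
save_model_hdf5(model, 'my_model.h5')

# later: reload the model and generate predictions
model <- load_model_hdf5('my_model.h5')
preds <- model %>% predict(x_test)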
Keras Applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.
+Weights are downloaded automatically when instantiating a model. They are stored at ~/.keras/models/
.
The following image classification models (with weights trained on ImageNet) are available:
+ +# instantiate the model
+model <- application_resnet50(weights = 'imagenet')
+
+# load the image
+img_path <- "elephant.jpg"
+img <- image_load(img_path, target_size = c(224,224))
+x <- image_to_array(img)
+
+# ensure we have a 4d tensor with a single element in the batch dimension,
+# then preprocess the input for prediction using resnet50
+dim(x) <- c(1, dim(x))
+x <- imagenet_preprocess_input(x)
+
+# make predictions then decode and print them
+preds <- model %>% predict(x)
+imagenet_decode_predictions(preds, top = 3)[[1]]
class_name class_description score
+1 n02504013 Indian_elephant 0.90117526
+2 n01871265 tusker 0.08774310
+3 n02504458 African_elephant 0.01046011
+model <- application_vgg16(weights = 'imagenet', include_top = FALSE)
+
+img_path <- "elephant.jpg"
+img <- image_load(img_path, target_size = c(224,224))
+x <- image_to_array(img)
+dim(x) <- c(1, dim(x))
+x <- imagenet_preprocess_input(x)
+
+features <- model %>% predict(x)
base_model <- application_vgg19(weights = 'imagenet')
+model <- keras_model(inputs = base_model$input,
+ outputs = get_layer(base_model, 'block4_pool')$output)
+
+img_path <- "elephant.jpg"
+img <- image_load(img_path, target_size = c(224,224))
+x <- image_to_array(img)
+dim(x) <- c(1, dim(x))
+x <- imagenet_preprocess_input(x)
+
+block4_pool_features <- model %>% predict(x)
# create the base pre-trained model
+base_model <- application_inception_v3(weights = 'imagenet', include_top = FALSE)
+
+# add our custom layers
+predictions <- base_model$output %>%
+ layer_global_average_pooling_2d() %>%
+ layer_dense(units = 1024, activation = 'relu') %>%
+ layer_dense(units = 200, activation = 'softmax')
+
+# this is the model we will train
+model <- keras_model(inputs = base_model$input, outputs = predictions)
+
+# first: train only the top layers (which were randomly initialized)
+# i.e. freeze all convolutional InceptionV3 layers
+for (layer in base_model$layers)
+ layer$trainable <- FALSE
+
+# compile the model (should be done *after* setting layers to non-trainable)
+model %>% compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')
+
+# train the model on the new data for a few epochs
+model %>% fit_generator(...)
+
+# at this point, the top layers are well trained and we can start fine-tuning
+# convolutional layers from inception V3. We will freeze the bottom N layers
+# and train the remaining top layers.
+
+# let's visualize layer names and layer indices to see how many layers
+# we should freeze:
+layers <- base_model$layers
+for (i in 1:length(layers))
+ cat(i, layers[[i]]$name, "\n")
+
+# we chose to train the top 2 inception blocks, i.e. we will freeze
+# the first 172 layers and unfreeze the rest:
+for (i in 1:172)
+ layers[[i]]$trainable <- FALSE
+for (i in 173:length(layers))
+ layers[[i]]$trainable <- TRUE
+
+# we need to recompile the model for these modifications to take effect
+# we use SGD with a low learning rate
+model %>% compile(
+ optimizer = optimizer_sgd(lr = 0.0001, momentum = 0.9),
+ loss = 'categorical_crossentropy'
+)
+
+# we train our model again (this time fine-tuning the top 2 inception blocks
+# alongside the top Dense layers)
+model %>% fit_generator(...)
# this could also be the output of a different Keras model or layer
+input_tensor <- layer_input(shape = c(224, 224, 3))
+
+model <- application_inception_v3(input_tensor = input_tensor,
+ weights='imagenet',
+ include_top = TRUE)
The VGG16 model is the basis for the Deep dream Keras example script.
Keras is a model-level library, providing high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products, convolutions, and so on. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the “backend engine” of Keras.
The R interface to Keras uses TensorFlow™ as its default tensor backend engine; however, it’s possible to use other backends if desired. At this time, Keras has three backend implementations available:
+TensorFlow is an open-source symbolic tensor manipulation framework developed by Google, Inc.
Theano is an open-source symbolic tensor manipulation framework developed by LISA/MILA Lab at Université de Montréal.
CNTK is an open-source, commercial-grade toolkit for deep learning developed by Microsoft.
Keras uses the TensorFlow backend by default. If you want to switch to Theano set the KERAS_BACKEND environment variable before loading the Keras package as follows:
+Sys.setenv(KERAS_BACKEND = "theano")
+library(keras)
If you want to use the CNTK backend then you should follow the installation instructions for CNTK and then set the KERAS_BACKEND
environment variable before loading the keras R package as follows:
Sys.setenv(KERAS_BACKEND = "cntk")
+library(keras)
If you want to use a backend provided by the keras Python package you typically need only to install the package and the backend, then set the KERAS_BACKEND
environment variable as described above.
If you need to customize things further, there are several environment variables that affect the version of Keras used (see the example after the table):
+Variable | +Description | +
---|---|
KERAS_IMPLEMENTATION |
+Keras specifies an API that can be implemented by multiple providers. By default, the Keras R package uses the implementation provided by the Keras Python package (“keras”). TensorFlow also provides an integrated implementation of Keras which you can use by specifying “tensorflow” as the implementation. | +
KERAS_BACKEND |
+The “keras” implementation supports the “tensorflow”, “keras”, and “cntk” backends. Note that the “tensorflow” implementation supports only the “tensorflow” backend. | +
KERAS_PYTHON |
+The Keras R package will automatically scan installed versions of Python (and virtual/conda environments) to find the one that includes the selected implementation of Keras. If this scanning doesn’t find the right version or you want to override its behavior, you can set the KERAS_PYTHON environment variable to the location of the Python binary you want to use. |
+
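For instance, to use the implementation of Keras integrated into TensorFlow, you could set the variables before loading the package (an illustrative sketch; the Python path shown is hypothetical):
# select the TensorFlow-integrated implementation of Keras
Sys.setenv(KERAS_IMPLEMENTATION = "tensorflow")

# optionally point at a specific Python binary (path is illustrative)
# Sys.setenv(KERAS_PYTHON = "/usr/local/bin/python")

library(keras)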
Note that if you want to use TensorFlow as the backend engine, you don’t need to set any of these variables, as it is used automatically by default.
+If you want the Keras modules you write to be compatible with all available backends, you have to write them via the abstract Keras backend API. You can obtain a reference to the TensorFlow backend by calling the backend()
function:
library(keras)
+K <- backend()
The code below instantiates an input placeholder. It’s equivalent to tf$placeholder()
:
input <- K$placeholder(shape = list(2L, 4L, 5L))
+# also works:
+input <- K$placeholder(shape = list(NULL, 4L, 5L))
+# also works:
+input <- K$placeholder(ndim = 3L)
The code below instantiates a shared variable. It’s equivalent to tf$Variable()
:
val <- array(runif(60), dim = c(3L, 4L, 5L))
+var <- K$variable(value = val)
+
+# all-zeros variable:
+var <- K$zeros(shape = list(3L, 4L, 5L))
+# all-ones:
+var <- K$ones(shape = list(3L, 4L, 5L))
Note that the examples above all pass integer values explicitly (e.g. 5L). This is because, unlike the high-level R functions in the Keras package, the backend APIs are strongly typed (i.e. float values are not automatically converted to integers). A short example of calling backend functions directly follows the table below.
Name | +Description | +
---|---|
abs | +Element-wise absolute value. | +
all | +Bitwise reduction (logical AND). | +
any | +Bitwise reduction (logical OR). | +
arange | +Creates a 1D tensor containing a sequence of integers. | +
argmax | +Returns the index of the maximum value along an axis. | +
argmin | +Returns the index of the minimum value along an axis. | +
backend | +Publicly accessible method for determining the current backend. | +
batch_dot | +Batchwise dot product. | +
batch_flatten | +Turn a nD tensor into a 2D tensor with same 0th dimension. | +
batch_get_value | +Returns the value of more than one tensor variable. | +
batch_normalization | +Applies batch normalization on x given mean, var, beta and gamma. | +
batch_set_value | +Sets the values of many tensor variables at once. | +
bias_add | +Adds a bias vector to a tensor. | +
binary_crossentropy | +Binary crossentropy between an output tensor and a target tensor. | +
cast | +Casts a tensor to a different dtype and returns it. | +
cast_to_floatx | +Cast a Numpy array to the default Keras float type. | +
categorical_crossentropy | +Categorical crossentropy between an output tensor and a target tensor. | +
clear_session | +Destroys the current TF graph and creates a new one. | +
clip | +Element-wise value clipping. | +
concatenate | +Concatenates a list of tensors alongside the specified axis. | +
constant | +Creates a constant tensor. | +
conv1d | +1D convolution. | +
conv2d | +2D convolution. | +
conv2d_transpose | +2D deconvolution (i.e. | +
conv3d | +3D convolution. | +
cos | +Computes cos of x element-wise. | +
count_params | +Returns the number of scalars in a Keras variable. | +
ctc_batch_cost | +Runs CTC loss algorithm on each batch element. | +
ctc_decode | +Decodes the output of a softmax. | +
ctc_label_dense_to_sparse | +Converts CTC labels from dense to sparse. | +
cumprod | +Cumulative product of the values in a tensor, alongside the specified axis. | +
cumsum | +Cumulative sum of the values in a tensor, alongside the specified axis. | +
dot | +Multiplies 2 tensors (and/or variables) and returns a tensor. | +
dropout | +Sets entries in x to zero at random, while scaling the entire tensor. |
+
dtype | +Returns the dtype of a Keras tensor or variable, as a string. | +
elu | +Exponential linear unit. | +
epsilon | +Returns the value of the fuzz factor used in numeric expressions. | +
equal | +Element-wise equality between two tensors. | +
eval | +Evaluates the value of a variable. | +
exp | +Element-wise exponential. | +
expand_dims | +Adds a 1-sized dimension at index “axis”. | +
eye | +Instantiate an identity matrix and returns it. | +
flatten | +Flatten a tensor. | +
floatx | +Returns the default float type, as a string. | +
foldl | +Reduce elems using fn to combine them from left to right. | +
foldr | +Reduce elems using fn to combine them from right to left. | +
gather | +Retrieves the elements of indices indices in the tensor reference . |
+
get_session | +Returns the TF session to be used by the backend. | +
get_uid | +Associates a string prefix with an integer counter in a TensorFlow graph. | +
get_value | +Returns the value of a variable. | +
gradients | +Returns the gradients of variables w.r.t. loss . |
+
greater | +Element-wise truth value of (x > y). | +
greater_equal | +Element-wise truth value of (x >= y). | +
hard_sigmoid | +Segment-wise linear approximation of sigmoid. | +
identity | +Returns a tensor with the same content as the input tensor. | +
image_data_format | +Returns the default image data format convention. | +
in_test_phase | +Selects x in test phase, and alt otherwise. |
+
in_top_k | +Returns whether the targets are in the top k predictions . |
+
in_train_phase | +Selects x in train phase, and alt otherwise. |
+
int_shape | +Returns the shape tensor or variable as a list of int or NULL entries. | +
is_sparse | +Returns whether a tensor is a sparse tensor. | +
l2_normalize | +Normalizes a tensor wrt the L2 norm alongside the specified axis. | +
learning_phase | +Returns the learning phase flag. | +
less | +Element-wise truth value of (x < y). | +
less_equal | +Element-wise truth value of (x <= y). | +
local_conv1d | +Apply 1D conv with un-shared weights. | +
local_conv2d | +Apply 2D conv with un-shared weights. | +
log | +Element-wise log. | +
logsumexp | +Computes log(sum(exp(elements across dimensions of a tensor))). | +
manual_variable_initialization | +Sets the manual variable initialization flag. | +
map_fn | +Map the function fn over the elements elems and return the outputs. | +
max | +Maximum value in a tensor. | +
maximum | +Element-wise maximum of two tensors. | +
mean | +Mean of a tensor, alongside the specified axis. | +
min | +Minimum value in a tensor. | +
minimum | +Element-wise minimum of two tensors. | +
moving_average_update | +Compute the moving average of a variable. | +
name_scope | +Returns a context manager for use when defining a Python op. | +
ndim | +Returns the number of axes in a tensor, as an integer. | +
normalize_batch_in_training | +Computes mean and std for batch then apply batch_normalization on batch. | +
not_equal | +Element-wise inequality between two tensors. | +
one_hot | +Computes the one-hot representation of an integer tensor. | +
ones | +Instantiates an all-ones tensor variable and returns it. | +
ones_like | +Instantiates an all-ones variable of the same shape as another tensor. | +
permute_dimensions | +Permutes axes in a tensor. | +
placeholder | +Instantiates a placeholder tensor and returns it. | +
pool2d | +2D Pooling. | +
pool3d | +3D Pooling. | +
pow | +Element-wise exponentiation. | +
print_tensor | +Prints message and the tensor value when evaluated. |
+
prod | +Multiplies the values in a tensor, alongside the specified axis. | +
py_all | +all(iterable) -> bool | +
py_sum | +sum(sequence[, start]) -> value | +
random_binomial | +Returns a tensor with random binomial distribution of values. | +
random_normal | +Returns a tensor with normal distribution of values. | +
random_normal_variable | +Instantiates a variable with values drawn from a normal distribution. | +
random_uniform | +Returns a tensor with uniform distribution of values. | +
random_uniform_variable | +Instantiates a variable with values drawn from a uniform distribution. | +
relu | +Rectified linear unit. | +
repeat_elements | +Repeats the elements of a tensor along an axis, like np.repeat . |
+
reset_uids | ++ |
reshape | +Reshapes a tensor to the specified shape. | +
resize_images | +Resizes the images contained in a 4D tensor. | +
resize_volumes | +Resizes the volume contained in a 5D tensor. | +
reverse | +Reverse a tensor along the specified axes. | +
rnn | +Iterates over the time dimension of a tensor. | +
round | +Element-wise rounding to the closest integer. | +
separable_conv2d | +2D convolution with separable filters. | +
set_epsilon | +Sets the value of the fuzz factor used in numeric expressions. | +
set_floatx | +Sets the default float type. | +
set_image_data_format | +Sets the value of the image data format convention. | +
set_learning_phase | +Sets the learning phase to a fixed value. | +
set_session | +Sets the global TensorFlow session. | +
set_value | +Sets the value of a variable, from a Numpy array. | +
shape | +Returns the symbolic shape of a tensor or variable. | +
sigmoid | +Element-wise sigmoid. | +
sign | +Element-wise sign. | +
sin | +Computes sin of x element-wise. | +
softmax | +Softmax of a tensor. | +
softplus | +Softplus of a tensor. | +
softsign | +Softsign of a tensor. | +
sparse_categorical_crossentropy | +Categorical crossentropy with integer targets. | +
spatial_2d_padding | +Pads the 2nd and 3rd dimensions of a 4D tensor. | +
spatial_3d_padding | +Pads 5D tensor with zeros along the depth, height, width dimensions. | +
sqrt | +Element-wise square root. | +
square | +Element-wise square. | +
squeeze | +Removes a 1-dimension from the tensor at index “axis”. | +
stack | +Stacks a list of rank R tensors into a rank R+1 tensor. |
+
std | +Standard deviation of a tensor, alongside the specified axis. | +
stop_gradient | +Returns variables but with zero gradient w.r.t. every other variable. |
+
sum | +Sum of the values in a tensor, alongside the specified axis. | +
switch | +Switches between two operations depending on a scalar value. | +
tanh | +Element-wise tanh. | +
temporal_padding | +Pads the middle dimension of a 3D tensor. | +
tile | +Creates a tensor by tiling x by n . |
+
to_dense | +Converts a sparse tensor into a dense tensor and returns it. | +
transpose | +Transposes a tensor and returns it. | +
truncated_normal | +Returns a tensor with truncated random normal distribution of values. | +
update | ++ |
update_add | +Update the value of x by adding increment . |
+
update_sub | +Update the value of x by subtracting decrement . |
+
var | +Variance of a tensor, alongside the specified axis. | +
variable | +Instantiates a variable and returns it. | +
zeros | +Instantiates an all-zeros variable and returns it. | +
zeros_like | +Instantiates an all-zeros variable of the same shape as another tensor. | +
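As a small illustration of calling backend functions directly (a minimal sketch), note again that integer arguments such as shapes and axes are passed explicitly as integers:
library(keras)
K <- backend()

# create a backend variable from an R matrix
x <- K$variable(value = matrix(runif(12), nrow = 3L, ncol = 4L))

# apply a couple of element-wise and reduction operations
y <- K$square(x)
s <- K$sum(y, axis = 1L)   # axes are 0-based in the backend API

# evaluate the result back into an R array
K$eval(s)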
If the existing Keras layers don’t meet your requirements you can create a custom layer. For simple, stateless custom operations, you are probably better off using layer_lambda()
layers. But for any custom operation that has trainable weights, you should implement your own layer.
The example below illustrates the skeleton of a Keras custom layer. The mnist_antirectifier example includes another demonstration of creating a custom layer.
+To create a custom Keras layer, you create an R6 class derived from KerasLayer
. There are three methods to implement (only one of which, call()
, is required for all types of layer):
build(input_shape): This is where you will define your weights. Note that if your layer doesn’t define trainable weights then you need not implement this method.
call(x): This is where the layer’s logic lives. Unless you want your layer to support masking, you only have to care about the first argument passed to call: the input tensor.
compute_output_shape(input_shape): In case your layer modifies the shape of its input, you should specify here the shape transformation logic. This allows Keras to do automatic shape inference. If you don’t modify the shape of the input then you need not implement this method.
library(keras)
+
+K <- backend()
+
+CustomLayer <- R6::R6Class("KerasLayer",
+
+ inherit = KerasLayer,
+
+ public = list(
+
+ output_dim = NULL,
+
+ kernel = NULL,
+
+ initialize = function(output_dim) {
+ self$output_dim <- output_dim
+ },
+
+ build = function(input_shape) {
+ self$kernel <- self$add_weight(
+ name = 'kernel',
+ shape = list(input_shape[[2]], self$output_dim),
+ initializer = initializer_random_normal(),
+ trainable = TRUE
+ )
+ },
+
+ call = function(x, mask = NULL) {
+ K$dot(x, self$kernel)
+ },
+
+ compute_output_shape = function(input_shape) {
+ list(input_shape[[1]], self$output_dim)
+ }
+ )
+)
Note that tensor operations are executed using the Keras backend()
. See the Keras Backend article for details on the various functions available from Keras backends.
In order to use the custom layer within a Keras model you also need to create a wrapper function which instantiates the layer using the create_layer()
function. For example:
# define layer wrapper function
+layer_custom <- function(object, output_dim, name = NULL, trainable = TRUE) {
+ create_layer(CustomLayer, object, list(
+ output_dim = as.integer(output_dim),
+ name = name,
+ trainable = trainable
+ ))
+}
+
+# use it in a model
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, input_shape = c(32,32)) %>%
+ layer_custom(output_dim = 32)
Some important things to note about the layer wrapper function:
+It accepts object
as its first parameter (the object will either be a Keras sequential model or another Keras layer). The object
parameter enables the layer to be composed with other layers using the magrittr pipe (%>%
) operator.
It converts its output_dim to integer using the as.integer() function. This is done as a convenience to the user because Keras variables are strongly typed (you can’t pass a float if an integer is expected). This enables users of the function to write output_dim = 32 rather than output_dim = 32L.
Some additional parameters not used by the layer (name
and trainable
) are in the function signature. Custom layer functions can include any of the core layer function arguments (input_shape
, batch_input_shape
, batch_size
, dtype
, name
, trainable
, and weights
) and they will be automatically forwarded to the Layer base class.
See the mnist_antirectifier example for another demonstration of creating a custom layer.
+An implementation of sequence to sequence learning for performing addition
+Input: “535+61”
+Output: “596”
Padding is handled by using a repeated sentinel character (space)
+Input may optionally be inverted, shown to increase performance in many tasks in: “Learning to Execute” http://arxiv.org/abs/1410.4615 and “Sequence to Sequence Learning with Neural Networks” http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf Theoretically it introduces shorter term dependencies between source and target.
+Two digits inverted: One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs
+Three digits inverted: One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs
+Four digits inverted: One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs
+Five digits inverted: One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs
+library(keras)
+library(stringi)
+
+# Function Definitions ----------------------------------------------------
+
+# Creates the char table
+# Just sorts them..
+learn_encoding <- function(chars){
+ sort(chars)
+}
+
+# Encode a character sequence as a one-hot
+# integer representation.
+# > encode("22+22", char_table)
+# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
+# 2 0 0 0 0 1 0 0 0 0 0 0 0
+# 2 0 0 0 0 1 0 0 0 0 0 0 0
+# + 0 1 0 0 0 0 0 0 0 0 0 0
+# 2 0 0 0 0 1 0 0 0 0 0 0 0
+# 2 0 0 0 0 1 0 0 0 0 0 0 0
+encode <- function(char, char_table){
+ strsplit(char, "") %>%
+ unlist() %>%
+ sapply(function(x){
+ as.numeric(x == char_table)
+ }) %>%
+ t()
+}
+
+# Decode the one hot representation/probabilities representation
+# to their character output.
+decode <- function(x, char_table){
+ apply(x,1, function(y){
+ char_table[which.max(y)]
+ }) %>% paste0(collapse = "")
+}
+
+# Returns a list of questions and expected answers.
+generate_data <- function(size, digits, invert = TRUE){
+
+ max_num <- as.integer(paste0(rep(9, digits), collapse = ""))
+
+ # generate integers for both sides of question
+ x <- sample(1:max_num, size = size, replace = TRUE)
+ y <- sample(1:max_num, size = size, replace = TRUE)
+
+  # make left side always smaller than right side
+ left_side <- ifelse(x <= y, x, y)
+ right_side <- ifelse(x >= y, x, y)
+
+ results <- left_side + right_side
+
+ # pad with spaces on the right
+ questions <- paste0(left_side, "+", right_side)
+ questions <- stri_pad(questions, width = 2*digits+1,
+ side = "right", pad = " ")
+ if(invert){
+ questions <- stri_reverse(questions)
+ }
+ # pad with spaces on the left
+ results <- stri_pad(results, width = digits + 1,
+ side = "left", pad = " ")
+
+ list(
+ questions = questions,
+ results = results
+ )
+}
+
+# Parameters --------------------------------------------------------------
+
+# Parameters for the model and dataset.
+TRAINING_SIZE <- 50000
+DIGITS <- 2
+
+# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
+# int is DIGITS.
+MAXLEN <- DIGITS + 1 + DIGITS
+
+# All the numbers, plus sign and space for padding.
+charset <- c(0:9, "+", " ")
+char_table <- learn_encoding(charset)
+
+
+# Data Preparation --------------------------------------------------------
+
+# Generate Data
+
+examples <- generate_data(size = TRAINING_SIZE, digits = DIGITS)
+
+# Vectorization
+
+x <- array(0, dim = c(length(examples$questions), MAXLEN, length(char_table)))
+y <- array(0, dim = c(length(examples$questions), DIGITS + 1, length(char_table)))
+
+for(i in 1:TRAINING_SIZE){
+ x[i,,] <- encode(examples$questions[i], char_table)
+ y[i,,] <- encode(examples$results[i], char_table)
+}
+
+# Shuffle
+
+indices <- sample(1:TRAINING_SIZE, size = TRAINING_SIZE)
+x <- x[indices,,]
+y <- y[indices,,]
+
+
+# Explicitly set apart 10% for validation data that we never train over.
+
+split_at <- trunc(TRAINING_SIZE/10)
+x_val <- x[1:split_at,,]
+y_val <- y[1:split_at,,]
+x_train <- x[(split_at + 1):TRAINING_SIZE,,]
+y_train <- y[(split_at + 1):TRAINING_SIZE,,]
+
+print('Training Data:')
+print(dim(x_train))
+print(dim(y_train))
+
+print('Validation Data:')
+print(dim(x_val))
+print(dim(y_val))
+
+
+# Training ----------------------------------------------------------------
+
+HIDDEN_SIZE <- 128
+BATCH_SIZE <- 128
+LAYERS <- 1
+
+# Initialize sequential model
+model <- keras_model_sequential()
+
+model %>%
+ # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
+ # Note: In a situation where your input sequences have a variable length,
+ # use input_shape=(None, num_feature).
+ layer_lstm(HIDDEN_SIZE, input_shape=c(MAXLEN, length(char_table))) %>%
+  # As the decoder RNN's input, repeatedly provide the last hidden state of
+  # the RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
+  # length of output, e.g., when DIGITS=3, max output is 999+999=1998.
+ layer_repeat_vector(DIGITS + 1)
+
+# The decoder RNN could be multiple layers stacked or a single layer.
+# By setting return_sequences to True, return not only the last output but
+# all the outputs so far in the form of (num_samples, timesteps,
+# output_dim). This is necessary as TimeDistributed in the below expects
+# the first dimension to be the timesteps.
+for(i in 1:LAYERS)
+ model %>% layer_lstm(HIDDEN_SIZE, return_sequences = TRUE)
+
+model %>%
+  # Apply a dense layer to every temporal slice of the input. For each step
+  # of the output sequence, decide which character should be chosen.
+ time_distributed(layer_dense(units = length(char_table))) %>%
+ layer_activation("softmax")
+
+# Compiling the model
+model %>% compile(
+ loss = "categorical_crossentropy",
+ optimizer = "adam",
+ metrics = "accuracy"
+)
+
+# Get the model summary
+summary(model)
+
+# Fitting loop
+model %>% fit(
+ x = x_train,
+ y = y_train,
+ batch_size = BATCH_SIZE,
+ epochs = 70,
+ validation_data = list(x_val, y_val)
+)
+
+# Predict for a new obs
+new_obs <- encode("55+22", char_table) %>%
+ array(dim = c(1,5,12))
+result <- predict(model, new_obs)
+result <- result[1,,]
+decode(result, char_table)
Trains a memory network on the bAbI dataset.
+References:
+Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush, “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks”, http://arxiv.org/abs/1502.05698
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, “End-To-End Memory Networks”, http://arxiv.org/abs/1503.08895
Reaches 98.6% accuracy on task ‘single_supporting_fact_10k’ after 120 epochs. Time per epoch: 3s on CPU (core i7).
+library(keras)
+library(readr)
+library(stringr)
+library(purrr)
+library(tibble)
+library(dplyr)
+
+
+# Function definition -----------------------------------------------------
+
+tokenize_words <- function(x){
+ x <- x %>%
+ str_replace_all('([[:punct:]]+)', ' \\1') %>%
+ str_split(' ') %>%
+ unlist()
+ x[x != ""]
+}
+
+parse_stories <- function(lines, only_supporting = FALSE){
+ lines <- lines %>%
+ str_split(" ", n = 2) %>%
+ map_df(~tibble(nid = as.integer(.x[[1]]), line = .x[[2]]))
+
+ lines <- lines %>%
+ mutate(
+ split = map(line, ~str_split(.x, "\t")[[1]]),
+ q = map_chr(split, ~.x[1]),
+ a = map_chr(split, ~.x[2]),
+ supporting = map(split, ~.x[3] %>% str_split(" ") %>% unlist() %>% as.integer()),
+ story_id = c(0, cumsum(nid[-nrow(.)] > nid[-1]))
+ ) %>%
+ select(-split)
+
+ stories <- lines %>%
+ filter(is.na(a)) %>%
+ select(nid_story = nid, story_id, story = q)
+
+ questions <- lines %>%
+ filter(!is.na(a)) %>%
+ select(-line) %>%
+ left_join(stories, by = "story_id") %>%
+ filter(nid_story < nid)
+
+ if(only_supporting){
+ questions <- questions %>%
+ filter(map2_lgl(nid_story, supporting, ~.x %in% .y))
+ }
+
+ questions %>%
+ group_by(story_id, nid, question = q, answer = a) %>%
+ summarise(story = paste(story, collapse = " ")) %>%
+ ungroup() %>%
+ mutate(
+ question = map(question, ~tokenize_words(.x)),
+ story = map(story, ~tokenize_words(.x)),
+ id = row_number()
+ ) %>%
+ select(id, question, answer, story)
+}
+
+vectorize_stories <- function(data, vocab, story_maxlen, query_maxlen){
+
+ questions <- map(data$question, function(x){
+ map_int(x, ~which(.x == vocab))
+ })
+
+ stories <- map(data$story, function(x){
+ map_int(x, ~which(.x == vocab))
+ })
+
+ # "" represents padding
+ answers <- sapply(c("", vocab), function(x){
+ as.integer(x == data$answer)
+ })
+
+
+ list(
+ questions = pad_sequences(questions, maxlen = query_maxlen),
+ stories = pad_sequences(stories, maxlen = story_maxlen),
+ answers = answers
+ )
+}
+
+
+# Parameters --------------------------------------------------------------
+
+challenges <- list(
+ # QA1 with 10,000 samples
+ single_supporting_fact_10k = "%stasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_%s.txt",
+ # QA2 with 10,000 samples
+ two_supporting_facts_10k = "%stasks_1-20_v1-2/en-10k/qa2_two-supporting-facts_%s.txt"
+)
+
+challenge_type <- "single_supporting_fact_10k"
+challenge <- challenges[[challenge_type]]
+max_length <- 999999
+
+# Data Preparation --------------------------------------------------------
+
+# Download data
+path <- get_file(
+ fname = "babi-tasks-v1-2.tar.gz",
+ origin = "https://s3.amazonaws.com/text-datasets/babi_tasks_1-20_v1-2.tar.gz"
+)
+untar(path, exdir = str_replace(path, fixed(".tar.gz"), "/"))
+path <- str_replace(path, fixed(".tar.gz"), "/")
+
+# Reading training and test data
+train <- read_lines(sprintf(challenge, path, "train")) %>%
+ parse_stories() %>%
+ filter(map_int(story, ~length(.x)) <= max_length)
+
+test <- read_lines(sprintf(challenge, path, "test")) %>%
+ parse_stories() %>%
+ filter(map_int(story, ~length(.x)) <= max_length)
+
+# extract the vocabulary
+all_data <- bind_rows(train, test)
+vocab <- c(unlist(all_data$question), all_data$answer,
+ unlist(all_data$story)) %>%
+ unique() %>%
+ sort()
+
+# Reserve 0 for masking via pad_sequences
+vocab_size <- length(vocab) + 1
+story_maxlen <- map_int(all_data$story, ~length(.x)) %>% max()
+query_maxlen <- map_int(all_data$question, ~length(.x)) %>% max()
+
+# vectorized versions of training and test sets
+train_vec <- vectorize_stories(train, vocab, story_maxlen, query_maxlen)
+test_vec <- vectorize_stories(test, vocab, story_maxlen, query_maxlen)
+
+# Defining the model ------------------------------------------------------
+
+# placeholders
+sequence <- layer_input(shape = c(story_maxlen))
+question <- layer_input(shape = c(query_maxlen))
+
+# encoders
+# embed the input sequence into a sequence of vectors
+sequence_encoder_m <- keras_model_sequential()
+sequence_encoder_m %>%
+ layer_embedding(input_dim = vocab_size, output_dim = 64) %>%
+ layer_dropout(rate = 0.3)
+# output: (samples, story_maxlen, embedding_dim)
+
+# embed the input into a sequence of vectors of size query_maxlen
+sequence_encoder_c <- keras_model_sequential()
+sequence_encoder_c %>%
+  layer_embedding(input_dim = vocab_size, output_dim = query_maxlen) %>%
+ layer_dropout(rate = 0.3)
+# output: (samples, story_maxlen, query_maxlen)
+
+# embed the question into a sequence of vectors
+question_encoder <- keras_model_sequential()
+question_encoder %>%
+ layer_embedding(input_dim = vocab_size, output_dim = 64,
+ input_length = query_maxlen) %>%
+ layer_dropout(rate = 0.3)
+# output: (samples, query_maxlen, embedding_dim)
+
+# encode input sequence and questions (which are indices)
+# to sequences of dense vectors
+sequence_encoded_m <- sequence_encoder_m(sequence)
+sequence_encoded_c <- sequence_encoder_c(sequence)
+question_encoded <- question_encoder(question)
+
+# compute a 'match' between the first input vector sequence
+# and the question vector sequence
+# shape: `(samples, story_maxlen, query_maxlen)`
+match <- list(sequence_encoded_m, question_encoded) %>%
+ layer_dot(axes = c(2,2)) %>%
+ layer_activation("softmax")
+
+# add the match matrix with the second input vector sequence
+response <- list(match, sequence_encoded_c) %>%
+ layer_add() %>%
+ layer_permute(c(2,1))
+
+# concatenate the match matrix with the question vector sequence
+answer <- list(response, question_encoded) %>%
+ layer_concatenate() %>%
+ # the original paper uses a matrix multiplication for this reduction step.
+ # we choose to use a RNN instead.
+ layer_lstm(32) %>%
+ # one regularization layer -- more would probably be needed.
+ layer_dropout(rate = 0.3) %>%
+ layer_dense(vocab_size) %>%
+ # we output a probability distribution over the vocabulary
+ layer_activation("softmax")
+
+# build the final model
+model <- keras_model(inputs = list(sequence, question), answer)
+model %>% compile(
+ optimizer = "rmsprop",
+ loss = "categorical_crossentropy",
+ metrics = "accuracy"
+)
+
+
+# Training ----------------------------------------------------------------
+
+model %>% fit(
+ x = list(train_vec$stories, train_vec$questions),
+ y = train_vec$answers,
+ batch_size = 32,
+ epochs = 120,
+ validation_data = list(list(test_vec$stories, test_vec$questions), test_vec$answers)
+)
Trains two recurrent neural networks based upon a story and a question. The resulting merged vector is then queried to answer a range of bAbI tasks.
+The results are comparable to those for an LSTM model provided in Weston et al.: “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” http://arxiv.org/abs/1502.05698
+Task Number | +FB LSTM Baseline | +Keras QA | +
---|---|---|
QA1 - Single Supporting Fact | +50 | +100.0 | +
QA2 - Two Supporting Facts | +20 | +50.0 | +
QA3 - Three Supporting Facts | +20 | +20.5 | +
QA4 - Two Arg. Relations | +61 | +62.9 | +
QA5 - Three Arg. Relations | +70 | +61.9 | +
QA6 - Yes/No Questions | +48 | +50.7 | +
QA7 - Counting | +49 | +78.9 | +
QA8 - Lists/Sets | +45 | +77.2 | +
QA9 - Simple Negation | +64 | +64.0 | +
QA10 - Indefinite Knowledge | +44 | +47.7 | +
QA11 - Basic Coreference | +72 | +74.9 | +
QA12 - Conjunction | +74 | +76.4 | +
QA13 - Compound Coreference | +94 | +94.4 | +
QA14 - Time Reasoning | +27 | +34.8 | +
QA15 - Basic Deduction | +21 | +32.4 | +
QA16 - Basic Induction | +23 | +50.6 | +
QA17 - Positional Reasoning | +51 | +49.1 | +
QA18 - Size Reasoning | +52 | +90.8 | +
QA19 - Path Finding | +8 | +9.0 | +
QA20 - Agent’s Motivations | +91 | +90.7 | +
For the resources related to the bAbI project, refer to: https://research.facebook.com/researchers/1543934539189348
+Notes:
50% test accuracy on QA2 in 20 epochs (16 seconds per epoch on CPU). In comparison, the Facebook paper achieves 50% and 20% for the LSTM baseline.
The task does not traditionally parse the question separately. This likely improves accuracy and is a good example of merging two RNNs.
The word vector embeddings are not shared between the story and question RNNs.
See how the accuracy changes given 10,000 training samples (en-10k) instead of only 1000. 1000 was used in order to be comparable to the original paper.
Experiment with GRU, LSTM, and JZS1-3 as they give subtly different results.
The length and noise (i.e. ‘useless’ story components) impact the ability for LSTMs / GRUs to provide the correct answer. Given only the supporting facts, these RNNs can achieve 100% accuracy on many tasks. Memory networks and neural networks that use attentional processes can efficiently search through this noise to find the relevant statements, improving performance substantially. This becomes especially obvious on QA2 and QA3, both far longer than QA1.
library(keras)
+library(readr)
+library(stringr)
+library(purrr)
+library(tibble)
+library(dplyr)
+
+# Function definition -----------------------------------------------------
+
+tokenize_words <- function(x){
+ x <- x %>%
+ str_replace_all('([[:punct:]]+)', ' \\1') %>%
+ str_split(' ') %>%
+ unlist()
+ x[x != ""]
+}
+
+parse_stories <- function(lines, only_supporting = FALSE){
+ lines <- lines %>%
+ str_split(" ", n = 2) %>%
+ map_df(~tibble(nid = as.integer(.x[[1]]), line = .x[[2]]))
+
+ lines <- lines %>%
+ mutate(
+ split = map(line, ~str_split(.x, "\t")[[1]]),
+ q = map_chr(split, ~.x[1]),
+ a = map_chr(split, ~.x[2]),
+ supporting = map(split, ~.x[3] %>% str_split(" ") %>% unlist() %>% as.integer()),
+ story_id = c(0, cumsum(nid[-nrow(.)] > nid[-1]))
+ ) %>%
+ select(-split)
+
+ stories <- lines %>%
+ filter(is.na(a)) %>%
+ select(nid_story = nid, story_id, story = q)
+
+ questions <- lines %>%
+ filter(!is.na(a)) %>%
+ select(-line) %>%
+ left_join(stories, by = "story_id") %>%
+ filter(nid_story < nid)
+
+ if(only_supporting){
+ questions <- questions %>%
+ filter(map2_lgl(nid_story, supporting, ~.x %in% .y))
+ }
+
+ questions %>%
+ group_by(story_id, nid, question = q, answer = a) %>%
+ summarise(story = paste(story, collapse = " ")) %>%
+ ungroup() %>%
+ mutate(
+ question = map(question, ~tokenize_words(.x)),
+ story = map(story, ~tokenize_words(.x)),
+ id = row_number()
+ ) %>%
+ select(id, question, answer, story)
+}
+
+vectorize_stories <- function(data, vocab, story_maxlen, query_maxlen){
+
+ questions <- map(data$question, function(x){
+ map_int(x, ~which(.x == vocab))
+ })
+
+ stories <- map(data$story, function(x){
+ map_int(x, ~which(.x == vocab))
+ })
+
+ # "" represents padding
+ answers <- sapply(c("", vocab), function(x){
+ as.integer(x == data$answer)
+ })
+
+
+ list(
+ questions = pad_sequences(questions, maxlen = query_maxlen),
+ stories = pad_sequences(stories, maxlen = story_maxlen),
+ answers = answers
+ )
+}
+
+# Parameters --------------------------------------------------------------
+
+max_length <- 99999
+embed_hidden_size <- 50
+batch_size <- 32
+epochs <- 40
+
+# Data Preparation --------------------------------------------------------
+
+path <- get_file(
+ fname = "babi-tasks-v1-2.tar.gz",
+ origin = "https://s3.amazonaws.com/text-datasets/babi_tasks_1-20_v1-2.tar.gz"
+)
+untar(path, exdir = str_replace(path, fixed(".tar.gz"), "/"))
+path <- str_replace(path, fixed(".tar.gz"), "/")
+
+# Default QA1 with 1000 samples
+# challenge = '%stasks_1-20_v1-2/en/qa1_single-supporting-fact_%s.txt'
+# QA1 with 10,000 samples
+# challenge = '%stasks_1-20_v1-2/en-10k/qa1_single-supporting-fact_%s.txt'
+# QA2 with 1000 samples
+challenge <- "%stasks_1-20_v1-2/en/qa2_two-supporting-facts_%s.txt"
+# QA2 with 10,000 samples
+# challenge = '%stasks_1-20_v1-2/en-10k/qa2_two-supporting-facts_%s.txt'
+
+train <- read_lines(sprintf(challenge, path, "train")) %>%
+ parse_stories() %>%
+ filter(map_int(story, ~length(.x)) <= max_length)
+
+test <- read_lines(sprintf(challenge, path, "test")) %>%
+ parse_stories() %>%
+ filter(map_int(story, ~length(.x)) <= max_length)
+
+# extract the vocabulary
+all_data <- bind_rows(train, test)
+vocab <- c(unlist(all_data$question), all_data$answer,
+ unlist(all_data$story)) %>%
+ unique() %>%
+ sort()
+
+# Reserve 0 for masking via pad_sequences
+vocab_size <- length(vocab) + 1
+story_maxlen <- map_int(all_data$story, ~length(.x)) %>% max()
+query_maxlen <- map_int(all_data$question, ~length(.x)) %>% max()
+
+# vectorized versions of training and test sets
+train_vec <- vectorize_stories(train, vocab, story_maxlen, query_maxlen)
+test_vec <- vectorize_stories(test, vocab, story_maxlen, query_maxlen)
+
+# Defining the model ------------------------------------------------------
+
+sentence <- layer_input(shape = c(story_maxlen), dtype = "int32")
+encoded_sentence <- sentence %>%
+ layer_embedding(input_dim = vocab_size, output_dim = embed_hidden_size) %>%
+ layer_dropout(rate = 0.3)
+
+question <- layer_input(shape = c(query_maxlen), dtype = "int32")
+encoded_question <- question %>%
+ layer_embedding(input_dim = vocab_size, output_dim = embed_hidden_size) %>%
+ layer_dropout(rate = 0.3) %>%
+ layer_lstm(units = embed_hidden_size) %>%
+ layer_repeat_vector(n = story_maxlen)
+
+merged <- list(encoded_sentence, encoded_question) %>%
+ layer_add() %>%
+ layer_lstm(units = embed_hidden_size) %>%
+ layer_dropout(rate = 0.3)
+
+preds <- merged %>%
+ layer_dense(units = vocab_size, activation = "softmax")
+
+model <- keras_model(inputs = list(sentence, question), outputs = preds)
+model %>% compile(
+ optimizer = "adam",
+ loss = "categorical_crossentropy",
+ metrics = "accuracy"
+)
+
+model
+
+# Training ----------------------------------------------------------------
+
+model %>% fit(
+ x = list(train_vec$stories, train_vec$questions),
+ y = train_vec$answers,
+ batch_size = batch_size,
+ epochs = epochs,
+ validation_split=0.05
+)
+
+evaluation <- model %>% evaluate(
+ x = list(test_vec$stories, test_vec$questions),
+ y = test_vec$answers,
+ batch_size = batch_size
+)
+
+evaluation
Train a simple deep CNN on the CIFAR10 small images dataset.
It gets down to 0.65 test logloss in 25 epochs, and down to 0.55 after 50 epochs (though it is still underfitting at that point).
+library(keras)
+
+# Parameters --------------------------------------------------------------
+
+batch_size <- 32
+epochs <- 200
+data_augmentation <- TRUE
+
+
+# Data Preparation --------------------------------------------------------
+
+# see ?dataset_cifar10 for more info
+cifar10 <- dataset_cifar10()
+
+x_train <- cifar10$train$x/255
+x_test <- cifar10$test$x/255
+y_train <- to_categorical(cifar10$train$y, num_classes = 10)
+y_test <- to_categorical(cifar10$test$y, num_classes = 10)
+
+# Defining the model ------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ layer_conv_2d(
+ filter = 32, kernel_size = c(3,3), padding = "same",
+ input_shape = c(32, 32, 3)
+ ) %>%
+ layer_activation("relu") %>%
+ layer_conv_2d(filter = 32, kernel_size = c(3,3)) %>%
+ layer_activation("relu") %>%
+ layer_max_pooling_2d(pool_size = c(2,2)) %>%
+ layer_dropout(0.25) %>%
+
+ layer_conv_2d(filter = 32, kernel_size = c(3,3), padding = "same") %>%
+ layer_activation("relu") %>%
+ layer_conv_2d(filter = 32, kernel_size = c(3,3)) %>%
+ layer_activation("relu") %>%
+ layer_max_pooling_2d(pool_size = c(2,2)) %>%
+ layer_dropout(0.25) %>%
+
+ layer_flatten() %>%
+ layer_dense(512) %>%
+ layer_activation("relu") %>%
+ layer_dropout(0.5) %>%
+ layer_dense(10) %>%
+ layer_activation("softmax")
+
+opt <- optimizer_rmsprop(lr = 0.0001, decay = 1e-6)
+
+model %>% compile(
+ loss = "categorical_crossentropy",
+ optimizer = opt,
+ metrics = "accuracy"
+)
+
+
+# Training ----------------------------------------------------------------
+
+if(!data_augmentation){
+
+ model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ validation_data = list(x_test, y_test),
+ shuffle = TRUE
+ )
+
+} else {
+
+ datagen <- image_data_generator(
+ featurewise_center = TRUE,
+ featurewise_std_normalization = TRUE,
+ rotation_range = 20,
+ width_shift_range = 0.2,
+ height_shift_range = 0.2,
+ horizontal_flip = TRUE
+ )
+
+ datagen %>% fit_image_data_generator(x_train)
+
+ model %>% fit_generator(
+ flow_images_from_data(x_train, y_train, datagen, batch_size = batch_size),
+ steps_per_epoch = as.integer(50000/batch_size),
+ epochs = epochs,
+ validation_data = list(x_test, y_test)
+ )
+
+}
# This script demonstrates the use of a convolutional LSTM network.
+# This network is used to predict the next frame of an artificially
+# generated movie which contains moving squares.
+library(keras)
+library(abind)
+library(raster)
+
+# Function Definition -----------------------------------------------------
+
+generate_movies <- function(n_samples = 1200, n_frames = 15){
+
+ rows <- 80
+ cols <- 80
+
+ noisy_movies <- array(0, dim = c(n_samples, n_frames, rows, cols))
+ shifted_movies <- array(0, dim = c(n_samples, n_frames, rows, cols))
+
+ n <- sample(3:8, 1)
+
+ for(s in 1:n_samples){
+ for(i in 1:n){
+ # Initial position
+ xstart <- sample(20:60, 1)
+ ystart <- sample(20:60, 1)
+
+ # Direction of motion
+ directionx <- sample(-1:1, 1)
+ directiony <- sample(-1:1, 1)
+
+ # Size of the square
+ w <- sample(2:3, 1)
+
+ x_shift <- xstart + directionx*(0:(n_frames))
+ y_shift <- ystart + directiony*(0:(n_frames))
+
+ for(t in 1:n_frames){
+ square_x <- (x_shift[t] - w):(x_shift[t] + w)
+ square_y <- (y_shift[t] - w):(y_shift[t] + w)
+
+ noisy_movies[s, t, square_x, square_y] <-
+ noisy_movies[s, t, square_x, square_y] + 1
+
+ # Make it more robust by adding noise.
+ # The idea is that if during inference,
+ # the value of the pixel is not exactly one,
+ # we need to train the network to be robust and still
+ # consider it as a pixel belonging to a square.
+ if(runif(1) > 0.5){
+ noise_f <- sample(c(-1, 1), 1)
+
+ square_x_n <- (x_shift[t] - w - 1):(x_shift[t] + w + 1)
+ square_y_n <- (y_shift[t] - w - 1):(y_shift[t] + w + 1)
+
+ noisy_movies[s, t, square_x_n, square_y_n] <-
+ noisy_movies[s, t, square_x_n, square_y_n] + noise_f*0.1
+
+ }
+
+ # Shift the ground truth by 1
+ square_x_s <- (x_shift[t+1] - w):(x_shift[t+1] + w)
+ square_y_s <- (y_shift[t+1] - w):(y_shift[t+1] + w)
+
+ shifted_movies[s, t, square_x_s, square_y_s] <-
+ shifted_movies[s, t, square_x_s, square_y_s] + 1
+ }
+ }
+ }
+
+ # Cut to a 40x40 window
+ noisy_movies <- noisy_movies[,,21:60, 21:60]
+ shifted_movies = shifted_movies[,,21:60, 21:60]
+
+ noisy_movies[noisy_movies > 1] <- 1
+ shifted_movies[shifted_movies > 1] <- 1
+
+ # add channel dimension
+ dim(noisy_movies) <- c(dim(noisy_movies), 1)
+ dim(shifted_movies) <- c(dim(shifted_movies), 1)
+
+ list(
+ noisy_movies = noisy_movies,
+ shifted_movies = shifted_movies
+ )
+}
+
+
+# Data Preparation --------------------------------------------------------
+
+# Artificial data generation:
+# Generate movies with 3 to 7 moving squares inside.
+# The squares are of shape 1x1 or 2x2 pixels,
+# which move linearly over time.
+# For convenience we first create movies with bigger width and height (80x80)
+# and at the end we select a 40x40 window.
+movies <- generate_movies(n_samples = 1000, n_frames = 15)
+more_movies <- generate_movies(n_samples = 200, n_frames = 15)
+
+
+# Model definition --------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ layer_conv_lstm_2d(
+ input_shape = list(NULL,40,40,1),
+ filters = 40, kernel_size = c(3,3),
+ padding = "same",
+ return_sequences = TRUE
+ ) %>%
+ layer_batch_normalization() %>%
+
+ layer_conv_lstm_2d(
+ filters = 40, kernel_size = c(3,3),
+ padding = "same", return_sequences = TRUE
+ ) %>%
+ layer_batch_normalization() %>%
+
+ layer_conv_lstm_2d(
+ filters = 40, kernel_size = c(3,3),
+ padding = "same", return_sequences = TRUE
+ ) %>%
+ layer_batch_normalization() %>%
+
+ layer_conv_lstm_2d(
+ filters = 40, kernel_size = c(3,3),
+ padding = "same", return_sequences = TRUE
+ ) %>%
+ layer_batch_normalization() %>%
+
+ layer_conv_3d(
+ filters = 1, kernel_size = c(3,3,3),
+ activation = "sigmoid",
+ padding = "same", data_format ="channels_last"
+ )
+
+model %>% compile(
+ loss = "binary_crossentropy",
+ optimizer = "adadelta"
+)
+
+model
+
+
+# Training ----------------------------------------------------------------
+
+model %>% fit(
+ movies$noisy_movies,
+ movies$shifted_movies,
+ batch_size = 10,
+ epochs = 30,
+ validation_split = 0.05
+)
+
+# Visualization ----------------------------------------------------------------
+# Testing the network on one movie
+# feed it with the first 7 positions and then
+# predict the new positions
+
+which <- 100 #Example to visualize on
+
+track <- more_movies$noisy_movies[which,1:8,,,1]
+track <- array(track, c(1,8,40,40,1))
+for (k in 1:15){
+if (k<8){
+ png(paste0(k,'_animate.png'))
+ par(mfrow=c(1,2),bg = 'white')
+ (more_movies$noisy_movies[which,k,,,1]) %>% raster() %>% plot() %>% title (main=paste0('Ground_',k))
+ (more_movies$noisy_movies[which,k,,,1]) %>% raster() %>% plot() %>% title (main=paste0('Ground_',k))
+ dev.off()
+} else {
+ # And then compare the predictions
+ # to the ground truth
+ png(paste0(k,'_animate.png'))
+ par(mfrow=c(1,2),bg = 'white')
+ (more_movies$noisy_movies[which,k,,,1]) %>% raster() %>% plot() %>% title (main=paste0('Ground_',k))
+
+ new_pos <- model %>% predict(track) #Make Prediction
+ new_pos_loc <- new_pos[1,k,1:40,1:40,1] #Slice the last row
+ new_pos_loc %>% raster() %>% plot() %>% title (main=paste0('Pred_',k))
+
+ new_pos <- array(new_pos_loc, c(1,1, 40,40,1)) #Reshape it
+ track <- abind(track,new_pos,along = 2) #Bind it to the earlier data
+ dev.off()
+}
+}
+# you can also create a gif by running
+system("convert -delay 40 *.png animation.gif")
Deep Dreaming in Keras.
+It is preferable to run this script on GPU, for speed.
+Example results: http://i.imgur.com/FX6ROg9.jpg
+library(keras)
+library(tensorflow)
+library(purrr)
+library(R6)
+K <- backend()
+
+# Function Definitions ----------------------------------------------------
+
+preprocess_image <- function(image_path, height, width){
+ image_load(image_path, target_size = c(height, width)) %>%
+ image_to_array() %>%
+ array(dim = c(1, dim(.))) %>%
+ imagenet_preprocess_input()
+}
+
+deprocess_image <- function(x){
+ x <- x[1,,,]
+ # Remove zero-center by mean pixel
+ x[,,1] <- x[,,1] + 103.939
+ x[,,2] <- x[,,2] + 116.779
+ x[,,3] <- x[,,3] + 123.68
+ # 'BGR'->'RGB'
+ x <- x[,,c(3,2,1)]
+ # clip to interval 0, 255
+ x[x > 255] <- 255
+ x[x < 0] <- 0
+ x[] <- as.integer(x)/255
+ x
+}
+
+# calculates the total variation loss
+# https://en.wikipedia.org/wiki/Total_variation_denoising
+total_variation_loss <- function(x, h, w){
+
+ y_ij <- x[,0:(h - 2L), 0:(w - 2L),]
+ y_i1j <- x[,1:(h - 1L), 0:(w - 2L),]
+ y_ij1 <- x[,0:(h - 2L), 1:(w - 1L),]
+
+ a <- K$square(y_ij - y_i1j)
+ b <- K$square(y_ij - y_ij1)
+ K$sum(K$pow(a + b, 1.25))
+}
+
+
+# Parameters --------------------------------------------------------
+
+# some settings we found interesting
+saved_settings = list(
+ bad_trip = list(
+ features = list(
+ block4_conv1 = 0.05,
+ block4_conv2 = 0.01,
+ block4_conv3 = 0.01
+ ),
+ continuity = 0.1,
+ dream_l2 = 0.8,
+ jitter = 5
+ ),
+ dreamy = list(
+ features = list(
+ block5_conv1 = 0.05,
+ block5_conv2 = 0.02
+ ),
+ continuity = 0.1,
+ dream_l2 = 0.02,
+ jitter = 0
+ )
+)
+
+# the settings we will use in this experiment
+img_height <- 600L
+img_width <- 600L
+img_size <- c(img_height, img_width, 3)
+settings <- saved_settings$dreamy
+image <- preprocess_image("deep_dream.jpg", img_height, img_width)
+
+# Model definition --------------------------------------------------------
+
+# this will contain our generated image
+dream <- layer_input(batch_shape = c(1, img_size))
+
+# build the VGG16 network with our placeholder
+# the model will be loaded with pre-trained ImageNet weights
+model <- application_vgg16(input_tensor = dream, weights = "imagenet",
+ include_top = FALSE)
+
+
+# get the symbolic outputs of each "key" layer (we gave them unique names).
+layer_dict <- model$layers
+names(layer_dict) <- map_chr(layer_dict, ~.x$name)
+
+# define the loss
+loss <- tf$Variable(0.0)
+for(layer_name in names(settings$features)){
+ # add the L2 norm of the features of a layer to the loss
+ coeff <- settings$features[[layer_name]]
+ x <- layer_dict[[layer_name]]$output
+ out_shape <- layer_dict[[layer_name]]$output_shape %>% unlist()
+ # we avoid border artifacts by only involving non-border pixels in the loss
+ loss <- loss -
+ coeff*K$sum(K$square(x[,3:(out_shape[2] - 2), 3:(out_shape[3] - 2),])) /
+ prod(out_shape[-1])
+}
+
+# add continuity loss (gives image local coherence, can result in an artful blur)
+loss <- loss + settings$continuity*
+ total_variation_loss(x = dream, img_height, img_width)/
+ prod(img_size)
+# add image L2 norm to loss (prevents pixels from taking very high values, makes image darker)
+loss <- loss + settings$dream_l2*K$sum(K$square(dream))/prod(img_size)
+
+# feel free to further modify the loss as you see fit, to achieve new effects...
+
+# compute the gradients of the dream wrt the loss
+grads <- K$gradients(loss, dream)[[1]]
+
+f_outputs <- K$`function`(list(dream), list(loss,grads))
+
+eval_loss_and_grads <- function(image){
+ dim(image) <- c(1, img_size)
+ outs <- f_outputs(list(image))
+ list(
+ loss_value = outs[[1]],
+ grad_values = as.numeric(outs[[2]])
+ )
+}
+
+# Loss and gradients evaluator.
+#
+# This Evaluator class makes it possible
+# to compute loss and gradients in one pass
+# while retrieving them via two separate functions,
+# "loss" and "grads". This is done because scipy.optimize
+# requires separate functions for loss and gradients,
+# but computing them separately would be inefficient.
+Evaluator <- R6Class(
+ "Evaluator",
+ public = list(
+
+ loss_value = NULL,
+ grad_values = NULL,
+
+ initialize = function() {
+ self$loss_value <- NULL
+ self$grad_values <- NULL
+ },
+
+ loss = function(x){
+ loss_and_grad <- eval_loss_and_grads(x)
+ self$loss_value <- loss_and_grad$loss_value
+ self$grad_values <- loss_and_grad$grad_values
+ self$loss_value
+ },
+
+ grads = function(x){
+ grad_values <- self$grad_values
+ self$loss_value <- NULL
+ self$grad_values <- NULL
+ grad_values
+ }
+
+ )
+)
+
+evaluator <- Evaluator$new()
+
+# Run optimization (L-BFGS) over the pixels of the generated image
+# so as to minimize the loss
+for(i in 1:5){
+
+ # add random jitter to initial image
+ random_jitter <- settings$jitter*2*(runif(prod(img_size)) - 0.5) %>%
+ array(dim = c(1, img_size))
+ image <- image + random_jitter
+
+ # Run L-BFGS
+ opt <- optim(
+ as.numeric(image), fn = evaluator$loss, gr = evaluator$grads,
+ method = "L-BFGS-B",
+ control = list(maxit = 2)
+ )
+
+ # Print loss value
+ print(opt$value)
+
+ # decode the image
+ image <- opt$par
+ dim(image) <- c(1, img_size)
+ image <- image - random_jitter
+
+ # plot
+ im <- deprocess_image(image)
+ plot(as.raster(im))
+
+}
Train a Bidirectional LSTM on the IMDB sentiment classification task.
+Output after 4 epochs on CPU: ~0.8146. Time per epoch on CPU (Core i7): ~150s.
+library(keras)
+
+max_features <- 20000
+
+# cut texts after this number of words
+# (among top max_features most common words)
+maxlen <- 100
+
+batch_size <- 32
+
+cat('Loading data...\n')
+imdb <- dataset_imdb(num_words = max_features)
+x_train <- imdb$train$x
+y_train <- imdb$train$y
+x_test <- imdb$test$x
+y_test <- imdb$test$y
+
+cat(length(x_train), 'train sequences\n')
+cat(length(x_test), 'test sequences\n')
+
+cat('Pad sequences (samples x time)\n')
+x_train <- pad_sequences(x_train, maxlen = maxlen)
+x_test <- pad_sequences(x_test, maxlen = maxlen)
+cat('x_train shape:', dim(x_train), '\n')
+cat('x_test shape:', dim(x_test), '\n')
+
+model <- keras_model_sequential()
+model %>%
+ layer_embedding(input_dim = max_features, output_dim = 128, input_length = maxlen) %>%
+ bidirectional(layer_lstm(units = 64)) %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 1, activation = 'sigmoid')
+
+# try using different optimizers and different optimizer configs
+model %>% compile(
+ loss = 'binary_crossentropy',
+ optimizer = 'adam',
+ metrics = c('accuracy')
+)
+
+cat('Train...\n')
+model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = 4,
+ validation_data = list(x_test, y_test)
+)
This example demonstrates the use of Convolution1D for text classification.
+Gets to 0.89 test accuracy after 2 epochs. 90s/epoch on an Intel i5 2.4GHz CPU. 10s/epoch on a Tesla K40 GPU.
+library(keras)
+
+# set parameters:
+max_features <- 5000
+maxlen <- 400
+batch_size <- 32
+embedding_dims <- 50
+filters <- 250
+kernel_size <- 3
+hidden_dims <- 250
+epochs <- 2
+
+
+# Data Preparation --------------------------------------------------------
+
+# Keras loads all data into a list with the following structure:
+# List of 2
+# $ train:List of 2
+# ..$ x:List of 25000
+# .. .. [list output truncated]
+# .. ..- attr(*, "dim")= int 25000
+# ..$ y: num [1:25000(1d)] 1 0 0 1 0 0 1 0 1 0 ...
+# $ test :List of 2
+# ..$ x:List of 25000
+# .. .. [list output truncated]
+# .. ..- attr(*, "dim")= int 25000
+# ..$ y: num [1:25000(1d)] 1 1 1 1 1 0 0 0 1 1 ...
+#
+# The x data includes integer sequences, each integer is a word.
+# The y data includes a set of integer labels (0 or 1).
+# The num_words argument indicates that only the max_features most frequent
+# words will be kept as integer indices. All others will be ignored.
+# See help(dataset_imdb)
+imdb <- dataset_imdb(num_words = max_features)
+
+# pad the sequences, so they all have the same length
+# this will convert our dataset into a matrix: each row is a review
+# and each column a word in the sequence.
+# we pad the sequences with 0s on the left.
+x_train <- imdb$train$x %>%
+ pad_sequences(maxlen = maxlen)
+
+x_test <- imdb$test$x %>%
+ pad_sequences(maxlen = maxlen)
+
+# Defining the model ------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ # we start off with an efficient embedding layer which maps
+ # our vocab indices into embedding_dims dimensions
+ layer_embedding(max_features, embedding_dims, input_length = maxlen) %>%
+ layer_dropout(0.2) %>%
+ # we add a Convolution1D, which will learn filters
+ # word group filters of size filter_length:
+ layer_conv_1d(
+ filters, kernel_size,
+ padding = "valid", activation = "relu", strides = 1
+ ) %>%
+ # we use max pooling:
+ layer_global_max_pooling_1d() %>%
+ # We add a vanilla hidden layer:
+ layer_dense(hidden_dims) %>%
+ layer_dropout(0.2) %>%
+ layer_activation("relu") %>%
+ # We project onto a single unit output layer, and squash it with a sigmoid:
+ layer_dense(1) %>%
+ layer_activation("sigmoid")
+
+
+model %>% compile(
+ loss = "binary_crossentropy",
+ optimizer = "adam",
+ metrics = "accuracy"
+)
+
+# Training ----------------------------------------------------------------
+
+model %>%
+ fit(
+ x_train, imdb$train$y,
+ batch_size = batch_size,
+ epochs = epochs,
+ validation_data = list(x_test, imdb$test$y)
+ )
Train a recurrent convolutional network on the IMDB sentiment classification task.
+Gets to 0.8498 test accuracy after 2 epochs. 41s/epoch on K520 GPU.
+library(keras)
+
+# Parameters --------------------------------------------------------------
+
+# Embedding
+max_features = 20000
+maxlen = 100
+embedding_size = 128
+
+# Convolution
+kernel_size = 5
+filters = 64
+pool_size = 4
+
+# LSTM
+lstm_output_size = 70
+
+# Training
+batch_size = 30
+epochs = 2
+
+# Data Preparation --------------------------------------------------------
+
+# Keras loads all data into a list with the following structure:
+# List of 2
+# $ train:List of 2
+# ..$ x:List of 25000
+# .. .. [list output truncated]
+# .. ..- attr(*, "dim")= int 25000
+# ..$ y: num [1:25000(1d)] 1 0 0 1 0 0 1 0 1 0 ...
+# $ test :List of 2
+# ..$ x:List of 25000
+# .. .. [list output truncated]
+# .. ..- attr(*, "dim")= int 25000
+# ..$ y: num [1:25000(1d)] 1 1 1 1 1 0 0 0 1 1 ...
+#
+# The x data includes integer sequences, each integer is a word.
+# The y data includes a set of integer labels (0 or 1).
+# The num_words argument indicates that only the max_features most frequent
+# words will be kept as integer indices. All others will be ignored.
+# See help(dataset_imdb)
+imdb <- dataset_imdb(num_words = max_features)
+
+# pad the sequences, so they all have the same length
+# this will convert our dataset into a matrix: each row is a review
+# and each column a word in the sequence.
+# we pad the sequences with 0s on the left.
+x_train <- imdb$train$x %>%
+ pad_sequences(maxlen = maxlen)
+
+x_test <- imdb$test$x %>%
+ pad_sequences(maxlen = maxlen)
+
+# Defining the model ------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ layer_embedding(max_features, embedding_size, input_length = maxlen) %>%
+ layer_dropout(0.25) %>%
+ layer_conv_1d(
+ filters,
+ kernel_size,
+ padding = "valid",
+ activation = "relu",
+ strides = 1
+ ) %>%
+ layer_max_pooling_1d(pool_size) %>%
+ layer_lstm(lstm_output_size) %>%
+ layer_dense(1) %>%
+ layer_activation("sigmoid")
+
+model %>% compile(
+ loss = "binary_crossentropy",
+ optimizer = "adam",
+ metrics = "accuracy"
+)
+
+# Training ----------------------------------------------------------------
+
+model %>% fit(
+ x_train, imdb$train$y,
+ batch_size = batch_size,
+ epochs = epochs,
+ validation_data = list(x_test, imdb$test$y)
+)
This example demonstrates the use of fasttext for text classification
+Based on Joulin et al’s paper:
+Bags of Tricks for Efficient Text Classification https://arxiv.org/abs/1607.01759
+Results on the IMDB dataset with uni- and bi-gram embeddings: Uni-gram: 0.8813 test accuracy after 5 epochs, 8s/epoch on an i7 CPU. Bi-gram: 0.9056 test accuracy after 5 epochs, 2s/epoch on a GTX 980M GPU.
+library(keras)
+library(purrr)
+
+# Function definition -----------------------------------------------------
+
+create_ngram_set <- function(input_list, ngram_value = 2){
+ indices <- map(0:(length(input_list) - ngram_value), ~1:ngram_value + .x)
+ indices %>%
+ map_chr(~input_list[.x] %>% paste(collapse = "|")) %>%
+ unique()
+}
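+# A quick illustration of the helper above (toy values, not IMDB data):
+#   create_ngram_set(c(1, 4, 9, 4), ngram_value = 2)
+#   #> [1] "1|4" "4|9" "9|4"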
+
+add_ngram <- function(sequences, token_indice, ngram_range = 2){
+ ngrams <- map(
+ sequences,
+ create_ngram_set, ngram_value = ngram_range
+ )
+
+ seqs <- map2(sequences, ngrams, function(x, y){
+ tokens <- token_indice$token[token_indice$ngrams %in% y]
+ c(x, tokens)
+ })
+
+ seqs
+}
+
+
+# Parameters --------------------------------------------------------------
+
+# ngram_range = 2 will add bi-grams features
+ngram_range <- 2
+max_features <- 20000
+maxlen <- 400
+batch_size <- 32
+embedding_dims <- 50
+epochs <- 5
+
+
+# Data preparation --------------------------------------------------------
+
+imdb_data <- dataset_imdb(num_words = max_features)
+
+print(length(imdb_data$train$x)) # train sequences
+print(length(imdb_data$test$x)) # test sequences
+print(sprintf("Average train sequence length: %f", mean(map_int(imdb_data$train$x, length))))
+print(sprintf("Average test sequence length: %f", mean(map_int(imdb_data$test$x, length))))
+
+if(ngram_range > 1) {
+
+ # Create set of unique n-gram from the training set.
+ ngrams <- imdb_data$train$x %>%
+ map(create_ngram_set) %>%
+ unlist() %>%
+ unique()
+
+ # Dictionary mapping n-gram token to a unique integer.
+ # Integer values are greater than max_features in order
+ # to avoid collision with existing features.
+ token_indice <- data.frame(
+ ngrams = ngrams,
+ token = 1:length(ngrams) + (max_features),
+ stringsAsFactors = FALSE
+ )
+
+ # max_features is the highest integer that could be found in the dataset.
+ max_features <- max(token_indice$token) + 1
+
+ # Augmenting x_train and x_test with n-grams features
+ imdb_data$train$x <- add_ngram(imdb_data$train$x, token_indice, ngram_range)
+ imdb_data$test$x <- add_ngram(imdb_data$test$x, token_indice, ngram_range)
+}
+
+# pad sequences
+imdb_data$train$x <- pad_sequences(imdb_data$train$x, maxlen = maxlen)
+imdb_data$test$x <- pad_sequences(imdb_data$test$x, maxlen = maxlen)
+
+
+# Model definition --------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ layer_embedding(
+ input_dim = max_features, output_dim = embedding_dims,
+ input_length = maxlen
+ ) %>%
+ layer_global_average_pooling_1d() %>%
+ layer_dense(1, activation = "sigmoid")
+
+model %>% compile(
+ loss = "binary_crossentropy",
+ optimizer = "adam",
+ metrics = "accuracy"
+)
+
+
+# Fitting -----------------------------------------------------------------
+
+model %>% fit(
+ imdb_data$train$x, imdb_data$train$y,
+ batch_size = batch_size,
+ epochs = epochs,
+ validation_data = list(imdb_data$test$x, imdb_data$test$y)
+)
Trains an LSTM on the IMDB sentiment classification task.
+The dataset is actually too small for LSTM to be of any advantage compared to simpler, much faster methods such as TF-IDF + LogReg.
+Notes:
+RNNs are tricky. Choice of batch size is important, choice of loss and optimizer is critical, etc. Some configurations won’t converge.
LSTM loss decrease patterns during training can be quite different from what you see with CNNs/MLPs/etc.
library(keras)
+
+max_features <- 20000
+maxlen <- 80 # cut texts after this number of words (among top max_features most common words)
+batch_size <- 32
+
+cat('Loading data...\n')
+imdb <- dataset_imdb(num_words = max_features)
+x_train <- imdb$train$x
+y_train <- imdb$train$y
+x_test <- imdb$test$x
+y_test <- imdb$test$y
+
+cat(length(x_train), 'train sequences\n')
+cat(length(x_test), 'test sequences\n')
+
+cat('Pad sequences (samples x time)\n')
+x_train <- pad_sequences(x_train, maxlen = maxlen)
+x_test <- pad_sequences(x_test, maxlen = maxlen)
+cat('x_train shape:', dim(x_train), '\n')
+cat('x_test shape:', dim(x_test), '\n')
+
+cat('Build model...\n')
+model <- keras_model_sequential()
+model %>%
+ layer_embedding(input_dim = max_features, output_dim = 128) %>%
+ layer_lstm(units = 64, dropout = 0.2, recurrent_dropout = 0.2) %>%
+ layer_dense(units = 1, activation = 'sigmoid')
+
+# try using different optimizers and different optimizer configs
+model %>% compile(
+ loss = 'binary_crossentropy',
+ optimizer = 'adam',
+ metrics = c('accuracy')
+)
+
+cat('Train...\n')
+model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = 15,
+ validation_data = list(x_test, y_test)
+)
+scores <- model %>% evaluate(
+ x_test, y_test,
+ batch_size = batch_size
+)
+cat('Test score:', scores[[1]])
+cat('Test accuracy', scores[[2]])
Example script to generate text from Nietzsche’s writings.
+At least 20 epochs are required before the generated text starts sounding coherent.
+It is recommended to run this script on GPU, as recurrent networks are quite computationally intensive.
+If you try this script on new data, make sure your corpus has at least ~100k characters. ~1M is better.
+library(keras)
+library(readr)
+library(stringr)
+library(purrr)
+library(tokenizers)
+
+
+# Parameters --------------------------------------------------------------
+
+maxlen <- 40
+
+# Data preparation --------------------------------------------------------
+
+path <- get_file(
+ 'nietzsche.txt',
+ origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt'
+ )
+
+text <- read_lines(path) %>%
+ str_to_lower() %>%
+ str_c(collapse = "\n") %>%
+ tokenize_characters(strip_non_alphanum = FALSE, simplify = TRUE)
+
+print(sprintf("corpus length: %d", length(text)))
+
+chars <- text %>%
+ unique() %>%
+ sort()
+
+print(sprintf("total chars: %d", length(chars)))
+
+# cut the text in semi-redundant sequences of maxlen characters
+dataset <- map(
+ seq(1, length(text) - maxlen - 1, by = 3),
+  ~list(sentence = text[.x:(.x + maxlen - 1)], next_char = text[.x + maxlen])
+ )
+
+dataset <- transpose(dataset)
+
+# vectorization
+X <- array(0, dim = c(length(dataset$sentence), maxlen, length(chars)))
+y <- array(0, dim = c(length(dataset$sentence), length(chars)))
+
+for(i in 1:length(dataset$sentence)){
+
+  X[i,,] <- sapply(chars, function(x){
+    as.integer(x == dataset$sentence[[i]])
+ })
+
+ y[i,] <- as.integer(chars == dataset$next_char[[i]])
+
+}
+
+# Model definition --------------------------------------------------------
+
+model <- keras_model_sequential()
+
+model %>%
+ layer_lstm(128, input_shape = c(maxlen, length(chars))) %>%
+ layer_dense(length(chars)) %>%
+ layer_activation("softmax")
+
+optimizer <- optimizer_rmsprop(lr = 0.01)
+
+model %>% compile(
+ loss = "categorical_crossentropy",
+ optimizer = optimizer
+)
+
+
+# Training and results ----------------------------------------------------
+
+sample_mod <- function(preds, temperature = 1){
+ preds <- log(preds)/temperature
+ exp_preds <- exp(preds)
+  preds <- exp_preds/sum(exp_preds)
+
+ rmultinom(1, 1, preds) %>%
+ as.integer() %>%
+ which.max()
+}
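+# sample_mod() draws the next character from the predicted distribution after
+# re-scaling it with a "temperature":
+#   p_i <- exp(log(p_i) / temperature) / sum_j exp(log(p_j) / temperature)
+# temperatures below 1 sharpen the distribution (more conservative text),
+# temperatures above 1 flatten it (more surprising, error-prone text).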
+
+for(iteration in 1:60){
+
+ cat(sprintf("iteration: %02d ---------------\n\n", iteration))
+
+ model %>% fit(
+ X, y,
+ batch_size = 128,
+ epochs = 1
+ )
+
+ for(diversity in c(0.2, 0.5, 1, 1.2)){
+
+ cat(sprintf("diversity: %f ---------------\n\n", diversity))
+
+ start_index <- sample(1:(length(text) - maxlen), size = 1)
+ sentence <- text[start_index:(start_index + maxlen - 1)]
+ generated <- ""
+
+ for(i in 1:400){
+
+ x <- sapply(chars, function(x){
+ as.integer(x == sentence)
+ })
+ dim(x) <- c(1, dim(x))
+
+ preds <- predict(model, x)
+ next_index <- sample_mod(preds, diversity)
+ next_char <- chars[next_index]
+
+ generated <- str_c(generated, next_char, collapse = "")
+ sentence <- c(sentence[-1], next_char)
+
+ }
+
+ cat(generated)
+ cat("\n\n")
+
+ }
+}
Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details.
+You should start to see reasonable images after ~5 epochs, and good images by ~15 epochs. You should use a GPU, as the convolution-heavy operations are very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating, as the compilation time can be a blocker using Theano.
+Hardware          | Backend | Time / Epoch
+------------------|---------|-------------
+CPU               | TF      | 3 hrs
+Titan X (maxwell) | TF      | 4 min
+Titan X (maxwell) | TH      | 7 min
library(keras)
+library(progress)
+library(abind)
+K <- keras::backend()
+K$set_image_data_format('channels_first')
+
+# Functions ---------------------------------------------------------------
+
+build_generator <- function(latent_size){
+
+ # we will map a pair of (z, L), where z is a latent vector and L is a
+ # label drawn from P_c, to image space (..., 1, 28, 28)
+ cnn <- keras_model_sequential()
+
+ cnn %>%
+ layer_dense(1024, input_shape = latent_size, activation = "relu") %>%
+ layer_dense(128*7*7, activation = "relu") %>%
+ layer_reshape(c(128, 7, 7)) %>%
+ # upsample to (..., 14, 14)
+ layer_upsampling_2d(size = c(2, 2)) %>%
+ layer_conv_2d(
+ 256, c(5,5), padding = "same", activation = "relu",
+ kernel_initializer = "glorot_normal"
+ ) %>%
+ # upsample to (..., 28, 28)
+ layer_upsampling_2d(size = c(2, 2)) %>%
+ layer_conv_2d(
+ 128, c(5,5), padding = "same", activation = "tanh",
+ kernel_initializer = "glorot_normal"
+ ) %>%
+ # take a channel axis reduction
+ layer_conv_2d(
+ 1, c(2,2), padding = "same", activation = "tanh",
+ kernel_initializer = "glorot_normal"
+ )
+
+
+  # this is the z space commonly referred to in GAN papers
+ latent <- layer_input(shape = list(latent_size))
+
+ # this will be our label
+ image_class <- layer_input(shape = list(1))
+
+ # 10 classes in MNIST
+ cls <- image_class %>%
+ layer_embedding(
+ input_dim = 10, output_dim = latent_size,
+ embeddings_initializer='glorot_normal'
+ ) %>%
+ layer_flatten()
+
+
+ # hadamard product between z-space and a class conditional embedding
+ h <- layer_multiply(list(latent, cls))
+
+ fake_image <- cnn(h)
+
+ keras_model(list(latent, image_class), fake_image)
+}
+
+build_discriminator <- function(){
+
+ # build a relatively standard conv net, with LeakyReLUs as suggested in
+ # the reference paper
+ cnn <- keras_model_sequential()
+
+ cnn %>%
+ layer_conv_2d(
+ 32, c(3,3), padding = "same", strides = c(2,2),
+ input_shape = c(1, 28, 28)
+ ) %>%
+ layer_activation_leaky_relu() %>%
+ layer_dropout(0.3) %>%
+
+ layer_conv_2d(64, c(3, 3), padding = "same", strides = c(1,1)) %>%
+ layer_activation_leaky_relu() %>%
+ layer_dropout(0.3) %>%
+
+ layer_conv_2d(128, c(3, 3), padding = "same", strides = c(2,2)) %>%
+ layer_activation_leaky_relu() %>%
+ layer_dropout(0.3) %>%
+
+ layer_conv_2d(256, c(3, 3), padding = "same", strides = c(1,1)) %>%
+ layer_activation_leaky_relu() %>%
+ layer_dropout(0.3) %>%
+
+ layer_flatten()
+
+
+
+ image <- layer_input(shape = c(1, 28, 28))
+ features <- cnn(image)
+
+ # first output (name=generation) is whether or not the discriminator
+ # thinks the image that is being shown is fake, and the second output
+ # (name=auxiliary) is the class that the discriminator thinks the image
+ # belongs to.
+ fake <- features %>%
+ layer_dense(1, activation = "sigmoid", name = "generation")
+
+ aux <- features %>%
+ layer_dense(10, activation = "softmax", name = "auxiliary")
+
+ keras_model(image, list(fake, aux))
+}
+
+# Parameters --------------------------------------------------------------
+
+# batch and latent size taken from the paper
+epochs <- 50
+batch_size <- 100
+latent_size <- 100
+
+# Adam parameters suggested in https://arxiv.org/abs/1511.06434
+adam_lr <- 0.00005
+adam_beta_1 <- 0.5
+
+# Model definition --------------------------------------------------------
+
+# build the discriminator
+discriminator <- build_discriminator()
+discriminator %>% compile(
+ optimizer = optimizer_adam(lr = adam_lr, beta_1 = adam_beta_1),
+ loss = list("binary_crossentropy", "sparse_categorical_crossentropy")
+)
+
+# build the generator
+generator <- build_generator(latent_size)
+generator %>% compile(
+ optimizer = optimizer_adam(lr = adam_lr, beta_1 = adam_beta_1),
+ loss = "binary_crossentropy"
+)
+
+latent <- layer_input(shape = list(latent_size))
+image_class <- layer_input(shape = list(1), dtype = "int32")
+
+fake <- generator(list(latent, image_class))
+
+# we only want to be able to train generation for the combined model
+
+discriminator$trainable <- FALSE
+results <- discriminator(fake)
+
+combined <- keras_model(list(latent, image_class), results)
+combined %>% compile(
+ optimizer = optimizer_adam(lr = adam_lr, beta_1 = adam_beta_1),
+ loss = list("binary_crossentropy", "sparse_categorical_crossentropy")
+)
+
+
+# Data preparation --------------------------------------------------------
+
+# get our mnist data, and force it to be of shape (..., 1, 28, 28) with
+# range [-1, 1]
+mnist <- dataset_mnist()
+mnist$train$x <- (mnist$train$x - 127.5)/127.5
+mnist$test$x <- (mnist$test$x - 127.5)/127.5
+dim(mnist$train$x) <- c(60000, 1, 28, 28)
+dim(mnist$test$x) <- c(10000, 1, 28, 28)
+
+num_train <- dim(mnist$train$x)[1]
+num_test <- dim(mnist$test$x)[1]
+
+# Training ----------------------------------------------------------------
+
+for(epoch in 1:epochs){
+
+ num_batches <- trunc(num_train/batch_size)
+ pb <- progress_bar$new(
+ total = num_batches,
+ format = sprintf("epoch %s/%s :elapsed [:bar] :percent :eta", epoch, epochs),
+ clear = FALSE
+ )
+
+ epoch_gen_loss <- NULL
+ epoch_disc_loss <- NULL
+
+ possible_indexes <- 1:num_train
+
+ for(index in 1:num_batches){
+
+ pb$tick()
+
+ # generate a new batch of noise
+ noise <- runif(n = batch_size*latent_size, min = -1, max = 1) %>%
+ matrix(nrow = batch_size, ncol = latent_size)
+
+ # get a batch of real images
+ batch <- sample(possible_indexes, size = batch_size)
+ possible_indexes <- possible_indexes[!possible_indexes %in% batch]
+ image_batch <- mnist$train$x[batch,,,,drop = FALSE]
+ label_batch <- mnist$train$y[batch]
+
+ # sample some labels from p_c
+ sampled_labels <- sample(0:9, batch_size, replace = TRUE) %>%
+ matrix(ncol = 1)
+
+ # generate a batch of fake images, using the generated labels as a
+ # conditioner. We reshape the sampled labels to be
+ # (batch_size, 1) so that we can feed them into the embedding
+ # layer as a length one sequence
+ generated_images <- predict(generator, list(noise, sampled_labels))
+
+ X <- abind(image_batch, generated_images, along = 1)
+ y <- c(rep(1L, batch_size), rep(0L, batch_size)) %>% matrix(ncol = 1)
+ aux_y <- c(label_batch, sampled_labels) %>% matrix(ncol = 1)
+
+ # see if the discriminator can figure itself out...
+ disc_loss <- train_on_batch(
+ discriminator, x = X,
+ y = list(y, aux_y)
+ )
+
+ epoch_disc_loss <- rbind(epoch_disc_loss, unlist(disc_loss))
+
+ # make new noise. we generate 2 * batch size here such that we have
+ # the generator optimize over an identical number of images as the
+ # discriminator
+ noise <- runif(2*batch_size*latent_size, min = -1, max = 1) %>%
+ matrix(nrow = 2*batch_size, ncol = latent_size)
+ sampled_labels <- sample(0:9, size = 2*batch_size, replace = TRUE) %>%
+ matrix(ncol = 1)
+
+ # we want to train the generator to trick the discriminator
+ # For the generator, we want all the {fake, not-fake} labels to say
+ # not-fake
+ trick <- rep(1, 2*batch_size) %>% matrix(ncol = 1)
+
+ combined_loss <- train_on_batch(
+ combined,
+ list(noise, sampled_labels),
+ list(trick, sampled_labels)
+ )
+
+ epoch_gen_loss <- rbind(epoch_gen_loss, unlist(combined_loss))
+
+ }
+
+ cat(sprintf("\nTesting for epoch %02d:", epoch))
+
+ # evaluate the testing loss here
+
+ # generate a new batch of noise
+ noise <- runif(num_test*latent_size, min = -1, max = 1) %>%
+ matrix(nrow = num_test, ncol = latent_size)
+
+ # sample some labels from p_c and generate images from them
+ sampled_labels <- sample(0:9, size = num_test, replace = TRUE) %>%
+ matrix(ncol = 1)
+ generated_images <- predict(generator, list(noise, sampled_labels))
+
+ X <- abind(mnist$test$x, generated_images, along = 1)
+ y <- c(rep(1, num_test), rep(0, num_test)) %>% matrix(ncol = 1)
+ aux_y <- c(mnist$test$y, sampled_labels) %>% matrix(ncol = 1)
+
+ # see if the discriminator can figure itself out...
+ discriminator_test_loss <- evaluate(
+ discriminator, X, list(y, aux_y),
+ verbose = FALSE
+ ) %>% unlist()
+
+ discriminator_train_loss <- apply(epoch_disc_loss, 2, mean)
+
+ # make new noise
+ noise <- runif(2*num_test*latent_size, min = -1, max = 1) %>%
+ matrix(nrow = 2*num_test, ncol = latent_size)
+ sampled_labels <- sample(0:9, size = 2*num_test, replace = TRUE) %>%
+ matrix(ncol = 1)
+
+ trick <- rep(1, 2*num_test) %>% matrix(ncol = 1)
+
+ generator_test_loss = combined %>% evaluate(
+ list(noise, sampled_labels),
+ list(trick, sampled_labels),
+ verbose = FALSE
+ )
+
+ generator_train_loss <- apply(epoch_gen_loss, 2, mean)
+
+
+ # generate an epoch report on performance
+ row_fmt <- "\n%22s : loss %4.2f | %5.2f | %5.2f"
+ cat(sprintf(
+ row_fmt,
+ "generator (train)",
+ generator_train_loss[1],
+ generator_train_loss[2],
+ generator_train_loss[3]
+ ))
+ cat(sprintf(
+ row_fmt,
+ "generator (test)",
+ generator_test_loss[1],
+ generator_test_loss[2],
+ generator_test_loss[3]
+ ))
+
+ cat(sprintf(
+ row_fmt,
+ "discriminator (train)",
+ discriminator_train_loss[1],
+ discriminator_train_loss[2],
+ discriminator_train_loss[3]
+ ))
+ cat(sprintf(
+ row_fmt,
+ "discriminator (test)",
+ discriminator_test_loss[1],
+ discriminator_test_loss[2],
+ discriminator_test_loss[3]
+ ))
+
+ cat("\n")
+
+ # generate some digits to display
+ noise <- runif(10*latent_size, min = -1, max = 1) %>%
+ matrix(nrow = 10, ncol = latent_size)
+
+ sampled_labels <- 0:9 %>%
+ matrix(ncol = 1)
+
+ # get a batch to display
+ generated_images <- predict(
+ generator,
+ list(noise, sampled_labels)
+ )
+
+ img <- NULL
+ for(i in 1:10){
+ img <- cbind(img, generated_images[i,,,])
+ }
+
+ ((img + 1)/2) %>% as.raster() %>%
+ plot()
+
+}
Demonstrates how to write custom layers for Keras.
+We build a custom activation layer called ‘Antirectifier’, which modifies the shape of the tensor that passes through it. We need to specify two methods: compute_output_shape, which maps the input shape to the output shape, and call, which defines the forward computation.
Note that the same result can also be achieved via a Lambda layer.
+library(keras)
+
+batch_size <- 128
+num_classes <- 10
+epochs <- 40
+
+# the data, shuffled and split between train and test sets
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
+
+dim(x_train) <- c(nrow(x_train), 784)
+dim(x_test) <- c(nrow(x_test), 784)
+
+x_train <- x_train / 255
+x_test <- x_test / 255
+
+cat(nrow(x_train), 'train samples\n')
+cat(nrow(x_test), 'test samples\n')
+
+# convert class vectors to binary class matrices
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
This is the combination of a sample-wise L2 normalization with the concatenation of the positive part of the input with the negative part of the input. The result is a tensor of samples that are twice as large as the input samples.
+It can be used in place of a ReLU.
+Input shape: 2D tensor of shape (samples, n)
+Output shape: 2D tensor of shape (samples, 2*n)
+When applying ReLU, assuming that the distribution of the previous output is approximately centered around 0, you are discarding half of your input. This is inefficient.
+Antirectifier returns all-positive outputs like ReLU, without discarding any data.
+Tests on MNIST show that Antirectifier makes it possible to train networks with half as many parameters, yet with classification accuracy comparable to an equivalent ReLU-based network.
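+As a minimal plain-R sketch (not the Keras layer itself, just the same arithmetic on an ordinary matrix), the transform looks like this:
+antirectify <- function(x) {
+  x <- sweep(x, 1, rowMeans(x))               # sample-wise centering
+  x <- sweep(x, 1, sqrt(rowSums(x^2)), "/")   # sample-wise L2 normalization
+  cbind(pmax(x, 0), pmax(-x, 0))              # concatenate positive and negative parts
+}
+m <- matrix(rnorm(6), nrow = 2)               # shape (samples = 2, n = 3)
+dim(antirectify(m))                           # shape (2, 2 * n) = (2, 6)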
+# Because our custom layer is written with primitives from the Keras backend
+# (`K`), our code can run both on TensorFlow and Theano.
+K <- backend()
+
+# Custom layer class
+AntirectifierLayer <- R6::R6Class("KerasLayer",
+
+ inherit = KerasLayer,
+
+ public = list(
+
+ call = function(x, mask = NULL) {
+ x <- x - K$mean(x, axis = 1L, keepdims = TRUE)
+ x <- K$l2_normalize(x, axis = 1L)
+ pos <- K$relu(x)
+ neg <- K$relu(-x)
+ K$concatenate(c(pos, neg), axis = 1L)
+
+ },
+
+ compute_output_shape = function(input_shape) {
+ input_shape[[2]] <- input_shape[[2]] * 2
+ tuple(input_shape)
+ }
+ )
+)
+
+# create layer wrapper function
+layer_antirectifier <- function(object) {
+ create_layer(AntirectifierLayer, object)
+}
model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 256, input_shape = c(784)) %>%
+ layer_antirectifier() %>%
+ layer_dropout(rate = 0.1) %>%
+ layer_dense(units = 256) %>%
+ layer_antirectifier() %>%
+ layer_dropout(rate = 0.1) %>%
+ layer_dense(units = 10, activation = 'softmax')
+
+# compile the model
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+)
+
+# train the model
+model %>% fit(x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_data= list(x_test, y_test)
+)
Trains a simple convnet on the MNIST dataset.
+Gets to 99.25% test accuracy after 12 epochs (there is still a lot of margin for parameter tuning). 16 seconds per epoch on a GRID K520 GPU.
+library(keras)
+
+batch_size <- 128
+num_classes <- 10
+epochs <- 12
+
+# input image dimensions
+img_rows <- 28
+img_cols <- 28
+
+# the data, shuffled and split between train and test sets
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
+
+dim(x_train) <- c(nrow(x_train), img_rows, img_cols, 1)
+dim(x_test) <- c(nrow(x_test), img_rows, img_cols, 1)
+input_shape <- c(img_rows, img_cols, 1)
+
+x_train <- x_train / 255
+x_test <- x_test / 255
+
+cat('x_train_shape:', dim(x_train), '\n')
+cat(nrow(x_train), 'train samples\n')
+cat(nrow(x_test), 'test samples\n')
+
+# convert class vectors to binary class matrices
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
+
+# define model
+model <- keras_model_sequential()
+model %>%
+ layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
+ input_shape = input_shape) %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2, 2)) %>%
+ layer_dropout(rate = 0.25) %>%
+ layer_flatten() %>%
+ layer_dense(units = 128, activation = 'relu') %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = num_classes, activation = 'softmax')
+
+# compile model
+model %>% compile(
+ loss = loss_categorical_crossentropy,
+ optimizer = optimizer_adadelta(),
+ metrics = c('accuracy')
+)
+
+# train and evaluate
+model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_data = list(x_test, y_test)
+)
+scores <- model %>% evaluate(
+ x_test, y_test, verbose = 0
+)
+
+cat('Test loss:', scores[[1]], '\n')
+cat('Test accuracy:', scores[[2]], '\n')
This is an example of using Hierarchical RNN (HRNN) to classify MNIST digits.
+HRNNs can learn across multiple levels of temporal hierarchy over a complex sequence. Usually, the first recurrent layer of an HRNN encodes a sentence (e.g. of word vectors) into a sentence vector. The second recurrent layer then encodes a sequence of such vectors (encoded by the first layer) into a document vector. This document vector is considered to preserve both the word-level and sentence-level structure of the context.
+References:
+In the MNIST example below, the first LSTM layer encodes every column of pixels of shape (28, 1) into a column vector of shape (128,). The second LSTM layer then encodes these 28 column vectors of shape (28, 128) into an image vector representing the whole image. A final dense layer is added for prediction.
+After 5 epochs: train acc: 0.9858, val acc: 0.9864
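+# A rough shape sketch of the model built below (batch dimension omitted):
+#   (28, 28, 1) --time_distributed(lstm 128)--> (28, 128) --lstm 128--> (128)
+#   --dense(10, softmax)--> (10)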
+library(keras)
+
+# Training parameters.
+batch_size <- 32
+num_classes <- 10
+epochs <- 5
+
+# Embedding dimensions.
+row_hidden <- 128
+col_hidden <- 128
+
+# the data, shuffled and split between train and test sets
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
+
+# Reshapes data to 4D for Hierarchical RNN.
+dim(x_train) <- c(nrow(x_train), 28, 28, 1)
+dim(x_test) <- c(nrow(x_test), 28, 28, 1)
+x_train <- x_train / 255
+x_test <- x_test / 255
+
+dim_x_train <- dim(x_train)
+cat('x_train_shape:', dim_x_train)
+cat(nrow(x_train), 'train samples')
+cat(nrow(x_test), 'test samples')
+
+# Converts class vectors to binary class matrices
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
+
+row <- dim_x_train[[2]]
+col <- dim_x_train[[3]]
+pixel <- dim_x_train[[4]]
+
+# Model input (4D)
+input <- layer_input(shape = c(row, col, pixel))
+
+# Encodes a row of pixels using TimeDistributed Wrapper
+encoded_rows <- input %>% time_distributed(layer_lstm(units = row_hidden))
+
+# Encodes columns of encoded rows.
+encoded_columns <- encoded_rows %>% layer_lstm(units = col_hidden)
+
+# Model output
+prediction <- encoded_columns %>%
+ layer_dense(units = num_classes, activation = 'softmax')
+
+# Define and compile model
+model <- keras_model(input, prediction)
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+)
+
+# Training
+model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_data = list(x_test, y_test)
+)
+
+# Evaluation
+scores <- model %>% evaluate(x_test, y_test, verbose = 0)
+cat('Test loss:', scores[[1]], '\n')
+cat('Test accuracy:', scores[[2]], '\n')
This is a reproduction of the IRNN experiment with pixel-by-pixel sequential MNIST in “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units” by Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton
+arxiv:1504.00941v2 [cs.NE] 7 Apr 2015 http://arxiv.org/pdf/1504.00941v2.pdf
+Optimizer is replaced with RMSprop which yields more stable and steady improvement.
+Reaches 0.93 train/test accuracy after 900 epochs (which roughly corresponds to 1687500 steps in the original paper); the script below trains for 200 epochs.
+library(keras)
+
+batch_size <- 32
+num_classes <- 10
+epochs <- 200
+hidden_units <- 100
+
+img_rows <- 28
+img_cols <- 28
+
+learning_rate <- 1e-6
+clip_norm <- 1.0
+
+# the data, shuffled and split between train and test sets
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
+
+dim(x_train) <- c(nrow(x_train), img_rows * img_cols, 1)
+dim(x_test) <- c(nrow(x_test), img_rows * img_cols, 1)
+input_shape <- c(img_rows, img_cols, 1)
+
+x_train <- x_train / 255
+x_test <- x_test / 255
+
+cat('x_train_shape:', dim(x_train), '\n')
+cat(nrow(x_train), 'train samples\n')
+cat(nrow(x_test), 'test samples\n')
+
+# convert class vectors to binary class matrices
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
+
+cat("Evaliate IRNN...\n")
+model <- keras_model_sequential()
+model %>%
+ layer_simple_rnn(units = hidden_units,
+ kernel_initializer = initializer_random_normal(stddev = 0.01),
+ recurrent_initializer = initializer_identity(gain = 1.0),
+ activation = 'relu',
+ input_shape = dim(x_train)[-1]) %>%
+ layer_dense(units = num_classes) %>%
+ layer_activation(activation = 'softmax')
+
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_rmsprop(lr = learning_rate),
+ metrics = c('accuracy')
+)
+
+model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_data = list(x_test, y_test)
+)
+
+scores <- model %>% evaluate(x_test, y_test, verbose = 0)
+cat('IRNN test score:', scores[[1]], '\n')
+cat('IRNN test accuracy:', scores[[2]], '\n')
Trains a simple deep NN on the MNIST dataset.
+Gets to 98.40% test accuracy after 20 epochs (there is a lot of margin for parameter tuning). 2 seconds per epoch on a K520 GPU.
+library(keras)
+
+batch_size <- 128
+num_classes <- 10
+epochs <- 30
+
+# the data, shuffled and split between train and test sets
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
+
+dim(x_train) <- c(nrow(x_train), 784)
+dim(x_test) <- c(nrow(x_test), 784)
+
+x_train <- x_train / 255
+x_test <- x_test / 255
+
+cat(nrow(x_train), 'train samples\n')
+cat(nrow(x_test), 'test samples\n')
+
+# convert class vectors to binary class matrices
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
+
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
+ layer_dropout(rate = 0.4) %>%
+ layer_dense(units = 128, activation = 'relu') %>%
+ layer_dropout(rate = 0.3) %>%
+ layer_dense(units = 10, activation = 'softmax')
+
+summary(model)
+
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_rmsprop(),
+ metrics = c('accuracy')
+)
+
+history <- model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_split = 0.2
+)
+
+plot(history)
+
+score <- model %>% evaluate(
+ x_test, y_test,
+ verbose = 0
+)
+
+cat('Test loss:', score[[1]], '\n')
+cat('Test accuracy:', score[[2]], '\n')
Transfer learning toy example: train a simple convnet on the MNIST digits 0-4, then freeze its convolutional feature layers and fine-tune dense classification layers on the digits 5-9.
+library(keras)
+
+now <- Sys.time()
+
+batch_size <- 128
+num_classes <- 5
+epochs <- 5
+
+# input image dimensions
+img_rows <- 28
+img_cols <- 28
+
+# number of convolutional filters to use
+filters <- 32
+
+# size of pooling area for max pooling
+pool_size <- 2
+
+# convolution kernel size
+kernel_size <- c(3, 3)
+
+# input shape
+input_shape <- c(img_rows, img_cols, 1)
+
+# the data, shuffled and split between train and test sets
+data <- dataset_mnist()
+x_train <- data$train$x
+y_train <- data$train$y
+x_test <- data$test$x
+y_test <- data$test$y
+
+# create two datasets one with digits below 5 and one with 5 and above
+x_train_lt5 <- x_train[y_train < 5,,]
+y_train_lt5 <- y_train[y_train < 5]
+x_test_lt5 <- x_test[y_test < 5,,]
+y_test_lt5 <- y_test[y_test < 5]
+
+x_train_gte5 <- x_train[y_train >= 5,,]
+y_train_gte5 <- y_train[y_train >= 5] - 5
+x_test_gte5 <- x_test[y_test >= 5,,]
+y_test_gte5 <- y_test[y_test >= 5] - 5
+
+# define two groups of layers: feature (convolutions) and classification (dense)
+feature_layers <-
+ layer_conv_2d(filters = filters, kernel_size = kernel_size,
+ input_shape = input_shape) %>%
+ layer_activation(activation = 'relu') %>%
+ layer_conv_2d(filters = filters, kernel_size = kernel_size) %>%
+ layer_activation(activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = pool_size) %>%
+ layer_dropout(rate = 0.25) %>%
+ layer_flatten()
+
+
+
+# feature_layers = [
+# Conv2D(filters, kernel_size,
+# padding='valid',
+# input_shape=input_shape),
+# Activation('relu'),
+# Conv2D(filters, kernel_size),
+# Activation('relu'),
+# MaxPooling2D(pool_size=pool_size),
+# Dropout(0.25),
+# Flatten(),
+# ]
+#
+# classification_layers = [
+# Dense(128),
+# Activation('relu'),
+# Dropout(0.5),
+# Dense(num_classes),
+# Activation('softmax')
+# ]
Neural style transfer with Keras.
+It is preferable to run this script on a GPU, for speed.
+Example result: https://twitter.com/fchollet/status/686631033085677568
+Style transfer consists in generating an image with the same “content” as a base image, but with the “style” of a different picture (typically artistic).
+This is achieved through the optimization of a loss function that has 3 components: “style loss”, “content loss”, and “total variation loss”:
+The total variation loss imposes local spatial continuity between the pixels of the combination image, giving it visual coherence.
The style loss is where deep learning kicks in: it is defined using a deep convolutional neural network. Precisely, it consists of a sum of L2 distances between the Gram matrices of the representations of the combination image and of the style reference image, extracted from different layers of a convnet (trained on ImageNet). The general idea is to capture color/texture information at different spatial scales (fairly large scales, determined by the depth of the layer considered).
The content loss is an L2 distance between the features of the base image (extracted from a deep layer) and the features of the combination image, keeping the generated image close enough to the original one.
library(keras)
+library(purrr)
+library(R6)
+K <- backend()
+
+# Parameters --------------------------------------------------------------
+
+base_image_path <- "neural-style-base-img.png"
+style_reference_image_path <- "neural-style-style.jpg"
+iterations <- 10
+
+# these are the weights of the different loss components
+total_variation_weight <- 1
+style_weight <- 1
+content_weight <- 0.025
+
+# dimensions of the generated picture.
+img <- image_load(base_image_path)
+width <- img$size[[1]]
+height <- img$size[[2]]
+img_nrows <- 400
+img_ncols <- as.integer(width * img_nrows / height)
+
+
+# Functions ---------------------------------------------------------------
+
+# util function to open, resize and format pictures into appropriate tensors
+preprocess_image <- function(path){
+ img <- image_load(path, target_size = c(img_nrows, img_ncols)) %>%
+ image_to_array()
+ dim(img) <- c(1, dim(img))
+ imagenet_preprocess_input(img)
+}
+
+# util function to convert a tensor into a valid image
+deprocess_image <- function(x){
+ x <- x[1,,,]
+ # Remove zero-center by mean pixel
+ x[,,1] <- x[,,1] + 103.939
+ x[,,2] <- x[,,2] + 116.779
+ x[,,3] <- x[,,3] + 123.68
+ # clip to interval 0, 255
+ x[x > 255] <- 255
+ x[x < 0] <- 0
+ x[] <- as.integer(x)/255
+ x
+}
+
+# Defining the model ------------------------------------------------------
+
+# get tensor representations of our images
+base_image <- K$variable(preprocess_image(base_image_path))
+style_reference_image <- K$variable(preprocess_image(style_reference_image_path))
+
+# this will contain our generated image
+combination_image <- K$placeholder(c(1L, img_nrows, img_ncols, 3L))
+
+# combine the 3 images into a single Keras tensor
+input_tensor <- K$concatenate(list(base_image, style_reference_image,
+ combination_image), axis = 0L)
+
+# build the VGG16 network with our 3 images as input
+# the model will be loaded with pre-trained ImageNet weights
+model <- application_vgg16(input_tensor = input_tensor, weights = "imagenet",
+ include_top = FALSE)
+
+print("Model loaded.")
+
+nms <- map_chr(model$layers, ~.x$name)
+output_dict <- map(model$layers, ~.x$output) %>% set_names(nms)
+
+# compute the neural style loss
+# first we need to define 4 util functions
+
+# the gram matrix of an image tensor (feature-wise outer product)
+
+gram_matrix <- function(x){
+
+ features <- x %>%
+ K$permute_dimensions(pattern = c(2L, 0L, 1L)) %>%
+ K$batch_flatten()
+
+ K$dot(features, K$transpose(features))
+}
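+# A tiny plain-R analogue of the Gram matrix (the function above does the same
+# thing on backend tensors): with features as a channels x locations matrix,
+#   feats <- matrix(rnorm(3 * 4), nrow = 3)   # 3 channels, 4 spatial locations
+#   gram  <- feats %*% t(feats)               # 3 x 3 channel co-occurrence matrix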
+
+# the "style loss" is designed to maintain
+# the style of the reference image in the generated image.
+# It is based on the gram matrices (which capture style) of
+# feature maps from the style reference image
+# and from the generated image
+
+style_loss <- function(style, combination){
+ S <- gram_matrix(style)
+ C <- gram_matrix(combination)
+
+ channels <- 3
+ size <- img_nrows*img_ncols
+
+ K$sum(K$square(S - C)) / (4 * channels^2 * size^2)
+}
+
+# an auxiliary loss function
+# designed to maintain the "content" of the
+# base image in the generated image
+
+content_loss <- function(base, combination){
+ K$sum(K$square(combination - base))
+}
+
+# the 3rd loss function, total variation loss,
+# designed to keep the generated image locally coherent
+
+total_variation_loss <- function(x){
+ y_ij <- x[,0:(img_nrows - 2L), 0:(img_ncols - 2L),]
+ y_i1j <- x[,1:(img_nrows - 1L), 0:(img_ncols - 2L),]
+ y_ij1 <- x[,0:(img_nrows - 2L), 1:(img_ncols - 1L),]
+
+ a <- K$square(y_ij - y_i1j)
+ b <- K$square(y_ij - y_ij1)
+ K$sum(K$pow(a + b, 1.25))
+}
+
+# combine these loss functions into a single scalar
+loss <- K$variable(0.0)
+layer_features <- output_dict$block4_conv2
+base_image_features <- layer_features[0,,,]
+combination_features <- layer_features[2,,,]
+
+loss <- loss + content_weight*content_loss(base_image_features,
+ combination_features)
+
+feature_layers = c('block1_conv1', 'block2_conv1',
+ 'block3_conv1', 'block4_conv1',
+ 'block5_conv1')
+
+for(layer_name in feature_layers){
+ layer_features <- output_dict[[layer_name]]
+ style_reference_features <- layer_features[1,,,]
+ combination_features <- layer_features[2,,,]
+ sl <- style_loss(style_reference_features, combination_features)
+ loss <- loss + ((style_weight / length(feature_layers)) * sl)
+}
+
+loss <- loss + (total_variation_weight * total_variation_loss(combination_image))
+
+# get the gradients of the generated image wrt the loss
+grads <- K$gradients(loss, combination_image)[[1]]
+
+f_outputs <- K$`function`(list(combination_image), list(loss, grads))
+
+eval_loss_and_grads <- function(image){
+ dim(image) <- c(1, img_nrows, img_ncols, 3)
+ outs <- f_outputs(list(image))
+ list(
+ loss_value = outs[[1]],
+ grad_values = as.numeric(outs[[2]])
+ )
+}
+
+# Loss and gradients evaluator.
+#
+# This Evaluator class makes it possible
+# to compute loss and gradients in one pass
+# while retrieving them via two separate functions,
+# "loss" and "grads". This is done because scipy.optimize
+# requires separate functions for loss and gradients,
+# but computing them separately would be inefficient.
+Evaluator <- R6Class(
+ "Evaluator",
+ public = list(
+
+ loss_value = NULL,
+ grad_values = NULL,
+
+ initialize = function() {
+ self$loss_value <- NULL
+ self$grad_values <- NULL
+ },
+
+ loss = function(x){
+ loss_and_grad <- eval_loss_and_grads(x)
+ self$loss_value <- loss_and_grad$loss_value
+ self$grad_values <- loss_and_grad$grad_values
+ self$loss_value
+ },
+
+ grads = function(x){
+ grad_values <- self$grad_values
+ self$loss_value <- NULL
+ self$grad_values <- NULL
+ grad_values
+ }
+
+ )
+)
+
+evaluator <- Evaluator$new()
+
+# run optimization (L-BFGS, via optim) over the pixels of the generated image
+# so as to minimize the neural style loss
+dms <- c(1, img_nrows, img_ncols, 3)
+x <- array(data = runif(prod(dms), min = 0, max = 255) - 128, dim = dms)
+
+# Run optimization (L-BFGS) over the pixels of the generated image
+# so as to minimize the loss
+for(i in 1:iterations){
+
+ # Run L-BFGS
+ opt <- optim(
+ as.numeric(x), fn = evaluator$loss, gr = evaluator$grads,
+ method = "L-BFGS-B",
+ control = list(maxit = 15)
+ )
+
+ # Print loss value
+ print(opt$value)
+
+ # decode the image
+ image <- x <- opt$par
+ dim(image) <- dms
+
+ # plot
+ im <- deprocess_image(image)
+ plot(as.raster(im))
+
+}
This script loads pre-trained word embeddings (GloVe embeddings) into a frozen Keras Embedding layer, and uses it to train a text classification model on the 20 Newsgroup dataset (classification of newsgroup messages into 20 different categories).
+GloVe embedding data can be found at: http://nlp.stanford.edu/data/glove.6B.zip (source page: http://nlp.stanford.edu/projects/glove/)
+20 Newsgroup data can be found at: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html
+IMPORTANT NOTE: This example does not yet work correctly. The code executes fine and appears to mimic the Python code upon which it is based; however, it achieves only half the training accuracy of the Python code, so there is clearly a subtle difference.
+We need to investigate this further before formally adding it to the list of examples.
+library(keras)
+
+GLOVE_DIR <- 'glove.6B'
+TEXT_DATA_DIR <- '20_newsgroup'
+MAX_SEQUENCE_LENGTH <- 1000
+MAX_NB_WORDS <- 20000
+EMBEDDING_DIM <- 100
+VALIDATION_SPLIT <- 0.2
+
+# download data if necessary
+download_data <- function(data_dir, url_path, data_file) {
+ if (!dir.exists(data_dir)) {
+ download.file(paste0(url_path, data_file), data_file, mode = "wb")
+ if (tools::file_ext(data_file) == "zip")
+ unzip(data_file, exdir = tools::file_path_sans_ext(data_file))
+ else
+ untar(data_file)
+ unlink(data_file)
+ }
+}
+download_data(GLOVE_DIR, 'http://nlp.stanford.edu/data/', 'glove.6B.zip')
+download_data(TEXT_DATA_DIR, "http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/", "news20.tar.gz")
+
+# first, build index mapping words in the embeddings set
+# to their embedding vector
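+# (each line of the GloVe file is a token followed by EMBEDDING_DIM numbers,
+#  e.g. "word 0.123 -0.045 ..."; the values shown here are illustrative)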
+
+cat('Indexing word vectors.\n')
+
+embeddings_index <- new.env(parent = emptyenv())
+lines <- readLines(file.path(GLOVE_DIR, 'glove.6B.100d.txt'))
+for (line in lines) {
+ values <- strsplit(line, ' ', fixed = TRUE)[[1]]
+ word <- values[[1]]
+ coefs <- as.numeric(values[-1])
+ embeddings_index[[word]] <- coefs
+}
+
+cat(sprintf('Found %s word vectors.\n', length(embeddings_index)))
+
+# second, prepare text samples and their labels
+cat('Processing text dataset\n')
+
+texts <- character() # text samples
+labels <- integer() # label ids
+labels_index <- list() # dictionary: label name to numeric id
+
+for (name in list.files(TEXT_DATA_DIR)) {
+ path <- file.path(TEXT_DATA_DIR, name)
+ if (file_test("-d", path)) {
+ label_id <- length(labels_index)
+ labels_index[[name]] <- label_id
+ for (fname in list.files(path)) {
+ if (grepl("^[0-9]+$", fname)) {
+ fpath <- file.path(path, fname)
+ t <- readLines(fpath, encoding = "latin1")
+ t <- paste(t, collapse = "\n")
+ i <- regexpr(pattern = '\n\n', t, fixed = TRUE)[[1]]
+ if (i != -1L)
+ t <- substring(t, i)
+ texts <- c(texts, t)
+ labels <- c(labels, label_id)
+ }
+ }
+ }
+}
+
+cat(sprintf('Found %s texts.\n', length(texts)))
+
+# finally, vectorize the text samples into a 2D integer tensor
+tokenizer <- text_tokenizer(num_words=MAX_NB_WORDS)
+tokenizer %>% fit_text_tokenizer(texts)
+
+sequences <- texts_to_sequences(tokenizer, texts)
+
+word_index <- tokenizer$word_index
+cat(sprintf('Found %s unique tokens.\n', length(word_index)))
+
+data <- pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
+labels <- to_categorical(labels)
+
+cat('Shape of data tensor: ', dim(data), '\n')
+cat('Shape of label tensor: ', dim(labels), '\n')
+
+# split the data into a training set and a validation set
+indices <- 1:nrow(data)
+indices <- sample(indices)
+data <- data[indices,]
+labels <- labels[indices,]
+num_validation_samples <- as.integer(VALIDATION_SPLIT * nrow(data))
+
+x_train <- data[-(1:num_validation_samples),]
+y_train <- labels[-(1:num_validation_samples),]
+x_val <- data[1:num_validation_samples,]
+y_val <- labels[1:num_validation_samples,]
+
+cat('Preparing embedding matrix.\n')
+
+# prepare embedding matrix
+num_words <- min(MAX_NB_WORDS, length(word_index))
+prepare_embedding_matrix <- function() {
+ embedding_matrix <- matrix(0L, nrow = num_words, ncol = EMBEDDING_DIM)
+ for (word in names(word_index)) {
+ index <- word_index[[word]]
+ if (index >= MAX_NB_WORDS)
+ next
+ embedding_vector <- embeddings_index[[word]]
+ if (!is.null(embedding_vector)) {
+ # words not found in embedding index will be all-zeros.
+ embedding_matrix[index,] <- embedding_vector
+ }
+ }
+ embedding_matrix
+}
+
+embedding_matrix <- prepare_embedding_matrix()
+
+# load pre-trained word embeddings into an Embedding layer
+# note that we set trainable = False so as to keep the embeddings fixed
+embedding_layer <- layer_embedding(
+ input_dim = num_words,
+ output_dim = EMBEDDING_DIM,
+ weights = list(embedding_matrix),
+ input_length = MAX_SEQUENCE_LENGTH,
+ trainable = FALSE
+)
+
+cat('Training model\n')
+
+# train a 1D convnet with global maxpooling
+sequence_input <- layer_input(shape = list(MAX_SEQUENCE_LENGTH), dtype='int32')
+
+preds <- sequence_input %>%
+ embedding_layer %>%
+ layer_conv_1d(filters = 128, kernel_size = 5, activation = 'relu') %>%
+ layer_max_pooling_1d(pool_size = 5) %>%
+ layer_conv_1d(filters = 128, kernel_size = 5, activation = 'relu') %>%
+ layer_max_pooling_1d(pool_size = 5) %>%
+ layer_conv_1d(filters = 128, kernel_size = 5, activation = 'relu') %>%
+ layer_max_pooling_1d(pool_size = 35) %>%
+ layer_flatten() %>%
+ layer_dense(units = 128, activation = 'relu') %>%
+ layer_dense(units = length(labels_index), activation = 'softmax')
+
+
+model <- keras_model(sequence_input, preds)
+
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('acc')
+)
+
+model %>% fit(
+ x_train, y_train,
+ batch_size = 128,
+ epochs = 10,
+ validation_data = list(x_val, y_val)
+)
Train and evaluate a simple MLP on the Reuters newswire topic classification task.
+library(keras)
+
+max_words <- 1000
+batch_size <- 32
+epochs <- 5
+
+cat('Loading data...\n')
+reuters <- dataset_reuters(num_words = max_words, test_split = 0.2)
+x_train <- reuters$train$x
+y_train <- reuters$train$y
+x_test <- reuters$test$x
+y_test <- reuters$test$y
+
+cat(length(x_train), 'train sequences\n')
+cat(length(x_test), 'test sequences\n')
+
+num_classes <- max(y_train) + 1
+cat(num_classes, '\n')
+
+cat('Vectorizing sequence data...\n')
+
+tokenizer <- text_tokenizer(num_words = max_words)
+x_train <- sequences_to_matrix(tokenizer, x_train, mode = 'binary')
+x_test <- sequences_to_matrix(tokenizer, x_test, mode = 'binary')
+
+cat('x_train shape:', dim(x_train), '\n')
+cat('x_test shape:', dim(x_test), '\n')
+
+cat('Convert class vector to binary class matrix',
+ '(for use with categorical_crossentropy)\n')
+y_train <- to_categorical(y_train, num_classes)
+y_test <- to_categorical(y_test, num_classes)
+cat('y_train shape:', dim(y_train), '\n')
+cat('y_test shape:', dim(y_test), '\n')
+
+cat('Building model...\n')
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 512, input_shape = c(max_words)) %>%
+ layer_activation(activation = 'relu') %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = num_classes) %>%
+ layer_activation(activation = 'softmax')
+
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'adam',
+ metrics = c('accuracy')
+)
+
+history <- model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ validation_split = 0.1
+)
+
+score <- model %>% evaluate(
+ x_test, y_test,
+ batch_size = batch_size,
+ verbose = 1
+)
+
+cat('Test score:', score[[1]], '\n')
+cat('Test accuracy', score[[2]], '\n')
Example script showing how to use stateful RNNs to model long sequences efficiently.
+library(keras)
+
+# since we are using stateful rnn tsteps can be set to 1
+tsteps <- 1
+batch_size <- 25
+epochs <- 25
+# number of elements ahead that are used to make the prediction
+lahead <- 1
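+
+# with stateful = TRUE the LSTM keeps its hidden state across batches: the
+# state after sample i of one batch becomes the initial state for sample i of
+# the next batch. That is why the series can be fed one timestep (tsteps = 1)
+# at a time, fitting must use shuffle = FALSE, and states are reset manually
+# with reset_states() after each epoch (see the training loop below).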
+
+# Generates an absolute cosine time series with the amplitude exponentially decreasing
+# Arguments:
+# amp: amplitude of the cosine function
+# period: period of the cosine function
+# x0: initial x of the time series
+# xn: final x of the time series
+# step: step of the time series discretization
+# k: exponential rate
+gen_cosine_amp <- function(amp = 100, period = 1000, x0 = 0, xn = 50000, step = 1, k = 0.0001) {
+ n <- (xn-x0) * step
+ cos <- array(data = numeric(n), dim = c(n, 1, 1))
+ for (i in 1:length(cos)) {
+ idx <- x0 + i * step
+ cos[[i, 1, 1]] <- amp * cos(2 * pi * idx / period)
+ cos[[i, 1, 1]] <- cos[[i, 1, 1]] * exp(-k * idx)
+ }
+ cos
+}
+
+cat('Generating Data...\n')
+cos <- gen_cosine_amp()
+cat('Input shape:', dim(cos), '\n')
+
+expected_output <- array(data = numeric(length(cos)), dim = c(length(cos), 1))
+for (i in 1:(length(cos) - lahead)) {
+ expected_output[[i, 1]] <- mean(cos[(i + 1):(i + lahead)])
+}
+
+cat('Output shape:', dim(expected_output), '\n')
+
+cat('Creating model:\n')
+model <- keras_model_sequential()
+model %>%
+ layer_lstm(units = 50, input_shape = c(tsteps, 1), batch_size = batch_size,
+ return_sequences = TRUE, stateful = TRUE) %>%
+ layer_lstm(units = 50, return_sequences = FALSE, stateful = TRUE) %>%
+ layer_dense(units = 1)
+model %>% compile(loss = 'mse', optimizer = 'rmsprop')
+
+cat('Training\n')
+for (i in 1:epochs) {
+ model %>% fit(cos, expected_output, batch_size = batch_size,
+ epochs = 1, verbose = 1, shuffle = FALSE)
+
+ model %>% reset_states()
+}
+
+cat('Predicting\n')
+predicted_output <- model %>% predict(cos, batch_size = batch_size)
+
+cat('Plotting Results\n')
+op <- par(mfrow=c(2,1))
+plot(expected_output, xlab = '')
+title("Expected")
+plot(predicted_output, xlab = '')
+title("Predicted")
+par(op)
This script demonstrates how to build a variational autoencoder with Keras. Reference: “Auto-Encoding Variational Bayes” https://arxiv.org/abs/1312.6114
+library(keras)
+K <- keras::backend()
+
+# Parameters --------------------------------------------------------------
+
+batch_size <- 100L
+original_dim <- 784L
+latent_dim <- 2L
+intermediate_dim <- 256L
+epochs <- 50L
+epsilon_std <- 1.0
+
+# Model definition --------------------------------------------------------
+
+x <- layer_input(batch_shape = c(batch_size, original_dim))
+h <- layer_dense(x, intermediate_dim, activation = "relu")
+z_mean <- layer_dense(h, latent_dim)
+z_log_var <- layer_dense(h, latent_dim)
+
+sampling <- function(arg){
+ z_mean <- arg[,0:1]
+ z_log_var <- arg[,2:3]
+
+ epsilon <- K$random_normal(
+ shape = c(batch_size, latent_dim),
+ mean=0.,
+ stddev=epsilon_std
+ )
+
+ z_mean + K$exp(z_log_var/2)*epsilon
+}
+
+# note that "output_shape" isn't necessary with the TensorFlow backend
+z <- layer_concatenate(list(z_mean, z_log_var)) %>%
+ layer_lambda(sampling)
+
+# we instantiate these layers separately so as to reuse them later
+decoder_h <- layer_dense(units = intermediate_dim, activation = "relu")
+decoder_mean <- layer_dense(units = original_dim, activation = "sigmoid")
+h_decoded <- decoder_h(z)
+x_decoded_mean <- decoder_mean(h_decoded)
+
+# end-to-end autoencoder
+vae <- keras_model(x, x_decoded_mean)
+
+# encoder, from inputs to latent space
+encoder <- keras_model(x, z_mean)
+
+# generator, from latent space to reconstructed inputs
+decoder_input <- layer_input(shape = latent_dim)
+h_decoded_2 <- decoder_h(decoder_input)
+x_decoded_mean_2 <- decoder_mean(h_decoded_2)
+generator <- keras_model(decoder_input, x_decoded_mean_2)
+
+
+vae_loss <- function(x, x_decoded_mean){
+ xent_loss <- (original_dim/1.0)*loss_binary_crossentropy(x, x_decoded_mean)
+ kl_loss <- -0.5*K$mean(1 + z_log_var - K$square(z_mean) - K$exp(z_log_var), axis = -1L)
+ xent_loss + kl_loss
+}
+
+vae %>% compile(optimizer = "rmsprop", loss = vae_loss)
+
+
+# Data preparation --------------------------------------------------------
+
+mnist <- dataset_mnist()
+x_train <- mnist$train$x/255
+x_test <- mnist$test$x/255
+x_train <- x_train %>% apply(1, as.numeric) %>% t()
+x_test <- x_test %>% apply(1, as.numeric) %>% t()
+
+
+# Model training ----------------------------------------------------------
+
+vae %>% fit(
+ x_train, x_train,
+ shuffle = TRUE,
+ epochs = epochs,
+ batch_size = batch_size,
+ validation_data = list(x_test, x_test)
+)
+
+
+# Visualizations ----------------------------------------------------------
+
+library(ggplot2)
+library(dplyr)
+x_test_encoded <- predict(encoder, x_test, batch_size = batch_size)
+
+x_test_encoded %>%
+ as_data_frame() %>%
+ mutate(class = as.factor(mnist$test$y)) %>%
+ ggplot(aes(x = V1, y = V2, colour = class)) + geom_point()
+
+# display a 2D manifold of the digits
+n <- 15 # figure with 15x15 digits
+digit_size <- 28
+
+# we will sample n points within [-4, 4] standard deviations
+grid_x <- seq(-4, 4, length.out = n)
+grid_y <- seq(-4, 4, length.out = n)
+
+rows <- NULL
+for(i in 1:length(grid_x)){
+ column <- NULL
+ for(j in 1:length(grid_y)){
+ z_sample <- matrix(c(grid_x[i], grid_y[j]), ncol = 2)
+ column <- rbind(column, predict(generator, z_sample) %>% matrix(ncol = 28) )
+ }
+ rows <- cbind(rows, column)
+}
+rows %>% as.raster() %>% plot()
This script demonstrates how to build a variational autoencoder with Keras and deconvolution layers. Reference: “Auto-Encoding Variational Bayes” https://arxiv.org/abs/1312.6114
+library(keras)
+K <- keras::backend()
+
+#### Parameterization ####
+
+# input image dimensions
+img_rows <- 28L
+img_cols <- 28L
+# color channels (1 = grayscale, 3 = RGB)
+img_chns <- 1L
+
+# number of convolutional filters to use
+filters <- 64L
+
+# convolution kernel size
+num_conv <- 3L
+
+latent_dim <- 2L
+intermediate_dim <- 128L
+epsilon_std <- 1.0
+
+# training parameters
+batch_size <- 100L
+epochs <- 5L
+
+
+#### Model Construction ####
+
+original_img_size <- c(img_rows, img_cols, img_chns)
+
+x <- layer_input(batch_shape = c(batch_size, original_img_size))
+
+conv_1 <- layer_conv_2d(
+ x,
+ filters = img_chns,
+ kernel_size = c(2L, 2L),
+ strides = c(1L, 1L),
+ padding = "same",
+ activation = "relu"
+)
+
+conv_2 <- layer_conv_2d(
+ conv_1,
+ filters = filters,
+ kernel_size = c(2L, 2L),
+ strides = c(2L, 2L),
+ padding = "same",
+ activation = "relu"
+)
+
+conv_3 <- layer_conv_2d(
+ conv_2,
+ filters = filters,
+ kernel_size = c(num_conv, num_conv),
+ strides = c(1L, 1L),
+ padding = "same",
+ activation = "relu"
+)
+
+conv_4 <- layer_conv_2d(
+ conv_3,
+ filters = filters,
+ kernel_size = c(num_conv, num_conv),
+ strides = c(1L, 1L),
+ padding = "same",
+ activation = "relu"
+)
+
+flat <- layer_flatten(conv_4)
+hidden <- layer_dense(flat, units = intermediate_dim, activation = "relu")
+
+z_mean <- layer_dense(hidden, units = latent_dim)
+z_log_var <- layer_dense(hidden, units = latent_dim)
+
+sampling <- function(args) {
+ z_mean <- args[, 0:(latent_dim - 1)]
+ z_log_var <- args[, latent_dim:(2 * latent_dim - 1)]
+
+ epsilon <- K$random_normal(
+ shape = c(batch_size, latent_dim),
+ mean = 0.,
+ stddev = epsilon_std
+ )
+ z_mean + K$exp(z_log_var) * epsilon
+}
+
+z <- layer_concatenate(list(z_mean, z_log_var)) %>% layer_lambda(sampling)
+
+output_shape <- c(batch_size, 14L, 14L, filters)
+
+decoder_hidden <- layer_dense(units = intermediate_dim, activation = "relu")
+decoder_upsample <- layer_dense(units = prod(output_shape[-1]), activation = "relu")
+
+decoder_reshape <- layer_reshape(target_shape = output_shape[-1])
+decoder_deconv_1 <- layer_conv_2d_transpose(
+ filters = filters,
+ kernel_size = c(num_conv, num_conv),
+ strides = c(1L, 1L),
+ padding = "same",
+ activation = "relu"
+)
+
+decoder_deconv_2 <- layer_conv_2d_transpose(
+ filters = filters,
+ kernel_size = c(num_conv, num_conv),
+ strides = c(1L, 1L),
+ padding = "same",
+ activation = "relu"
+)
+
+decoder_deconv_3_upsample <- layer_conv_2d_transpose(
+ filters = filters,
+ kernel_size = c(3L, 3L),
+ strides = c(2L, 2L),
+ padding = "valid",
+ activation = "relu"
+)
+
+decoder_mean_squash <- layer_conv_2d(
+ filters = img_chns,
+ kernel_size = c(2L, 2L),
+ strides = c(1L, 1L),
+ padding = "valid",
+ activation = "sigmoid"
+)
+
+hidden_decoded <- decoder_hidden(z)
+up_decoded <- decoder_upsample(hidden_decoded)
+reshape_decoded <- decoder_reshape(up_decoded)
+deconv_1_decoded <- decoder_deconv_1(reshape_decoded)
+deconv_2_decoded <- decoder_deconv_2(deconv_1_decoded)
+x_decoded_relu <- decoder_deconv_3_upsample(deconv_2_decoded)
+x_decoded_mean_squash <- decoder_mean_squash(x_decoded_relu)
+
+# custom loss function
+vae_loss <- function(x, x_decoded_mean_squash) {
+ x <- K$flatten(x)
+ x_decoded_mean_squash <- K$flatten(x_decoded_mean_squash)
+ xent_loss <- 1.0 * img_rows * img_cols *
+ loss_binary_crossentropy(x, x_decoded_mean_squash)
+ kl_loss <- -0.5 * K$mean(1 + z_log_var - K$square(z_mean) -
+ K$exp(z_log_var), axis = -1L)
+ K$mean(xent_loss + kl_loss)
+}
+
+## variational autoencoder
+vae <- keras_model(x, x_decoded_mean_squash)
+vae %>% compile(optimizer = "rmsprop", loss = vae_loss)
+summary(vae)
+
+## encoder: model to project inputs on the latent space
+encoder <- keras_model(x, z_mean)
+
+## build a digit generator that can sample from the learned distribution
+gen_decoder_input <- layer_input(shape = latent_dim)
+gen_hidden_decoded <- decoder_hidden(gen_decoder_input)
+gen_up_decoded <- decoder_upsample(gen_hidden_decoded)
+gen_reshape_decoded <- decoder_reshape(gen_up_decoded)
+gen_deconv_1_decoded <- decoder_deconv_1(gen_reshape_decoded)
+gen_deconv_2_decoded <- decoder_deconv_2(gen_deconv_1_decoded)
+gen_x_decoded_relu <- decoder_deconv_3_upsample(gen_deconv_2_decoded)
+gen_x_decoded_mean_squash <- decoder_mean_squash(gen_x_decoded_relu)
+generator <- keras_model(gen_decoder_input, gen_x_decoded_mean_squash)
+
+
+#### Data Preparation ####
+
+mnist <- dataset_mnist()
+data <- lapply(mnist, function(m) {
+ array(m$x / 255, dim = c(dim(m$x)[1], original_img_size))
+})
+x_train <- data$train
+x_test <- data$test
+
+
+#### Model Fitting ####
+
+vae %>% fit(
+ x_train, x_train,
+ shuffle = TRUE,
+ epochs = epochs,
+ batch_size = batch_size,
+ validation_data = list(x_test, x_test)
+)
+
+
+#### Visualizations ####
+
+library(ggplot2)
+library(dplyr)
+
+## display a 2D plot of the digit classes in the latent space
+x_test_encoded <- predict(encoder, x_test, batch_size = batch_size)
+x_test_encoded %>%
+ as_data_frame() %>%
+ mutate(class = as.factor(mnist$test$y)) %>%
+ ggplot(aes(x = V1, y = V2, colour = class)) + geom_point()
+
+## display a 2D manifold of the digits
+n <- 15 # figure with 15x15 digits
+digit_size <- 28
+
+# we will sample n points within [-4, 4] standard deviations
+grid_x <- seq(-4, 4, length.out = n)
+grid_y <- seq(-4, 4, length.out = n)
+
+rows <- NULL
+for(i in 1:length(grid_x)){
+ column <- NULL
+ for(j in 1:length(grid_y)){
+ z_sample <- matrix(c(grid_x[i], grid_y[j]), ncol = 2)
+ column <- rbind(column, predict(generator, z_sample) %>% matrix(ncol = digit_size))
+ }
+ rows <- cbind(rows, column)
+}
+rows %>% as.raster() %>% plot()
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
+@misc{chollet2015keras,
+ title={Keras},
+ author={Chollet, Fran\c{c}ois and others},
+ year={2015},
+ publisher={GitHub},
+ howpublished={\url{https://github.com/fchollet/keras}},
+}
+Below are some common definitions that are necessary to know and understand to correctly utilize Keras:
+Sample: one element of a dataset. For example, one image is a sample for a convolutional network.
+Batch: a set of N samples. The samples in a batch are processed independently, in parallel; during training, a batch results in only one update to the model.
+Epoch: an arbitrary cutoff, generally defined as “one pass over the entire dataset”, used to separate training into distinct phases, which is useful for logging and periodic evaluation. When using validation_data or validation_split with the fit method of Keras models, evaluation will be run at the end of every epoch.
Unlike most R objects, Keras objects are “mutable”. That means that when you modify an object you’re modifying it “in place”, and you don’t need to assign the updated object back to the original name. For example, to add layers to a Keras model you might use this code:
+model %>%
+ layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
+ layer_dense(units = 10, activation = 'softmax')
Rather than this code:
+model <- model %>%
+ layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
+ layer_dense(units = 10, activation = 'softmax')
You need to be aware of this because it makes the Keras API a little different than most other pipelines you may have used, but it’s necessary to match the data structures and behavior of the underlying Keras library.
+You can use save_model_hdf5()
to save a Keras model into a single HDF5 file which will contain:
the architecture of the model (allowing the model to be re-created)
the weights of the model
the training configuration (loss, optimizer)
the state of the optimizer (allowing you to resume training exactly where you left off)
You can then use load_model_hdf5()
to reinstantiate your model. load_model_hdf5()
will also take care of compiling the model using the saved training configuration (unless the model was never compiled in the first place).
Example:
+save_model_hdf5(model, 'my_model.h5')
+model <- load_model_hdf5('my_model.h5')
If you only need to save the architecture of a model, and not its weights or its training configuration, you can do:
+json_string <- model_to_json(model)
+yaml_string <- model_to_yaml(model)
The generated JSON / YAML files are human-readable and can be manually edited if needed.
+You can then build a fresh model from this data:
+model <- model_from_json(json_string)
+model <- model_from_yaml(yaml_string)
If you need to save the weights of a model, you can do so in HDF5 with the code below.
+model %>% save_model_weights_hdf5('my_model_weights.h5')
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:
+model %>% load_model_weights_hdf5('my_model_weights.h5')
If you need to load weights into a different architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by layer name:
+model %>% load_model_weights_hdf5('my_model_weights.h5', by_name = TRUE)
For example:
+# assuming the original model looks like this:
+# model <- keras_model_sequential()
+# model %>%
+#   layer_dense(units = 2, input_dim = 3, name = "dense_1") %>%
+# layer_dense(units = 3, name = "dense_3") %>%
+# ...
+#   save_model_weights_hdf5(model, fname)
+
+# new model
+model <- keras_model_sequential()
+model %>%
+  layer_dense(units = 2, input_dim = 3, name = "dense_1") %>%   # will be loaded
+  layer_dense(units = 3, name = "dense_3")                      # will not be loaded
+
+# load weights from first model; will only affect the first layer, dense_1.
+model %>% load_model_weights_hdf5(fname, by_name = TRUE)
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
+Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
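+You can see this in practice by evaluating the trained model on the training data itself; evaluate() uses the final weights with regularization mechanisms turned off, so its loss is typically lower than the running training loss reported by fit(). A minimal sketch, reusing x_train and y_train from your own session:
+history <- model %>% fit(x_train, y_train, epochs = 5, batch_size = 32)
+train_score <- model %>% evaluate(x_train, y_train, verbose = 0)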
+One simple way is to create a new Model
that will output the layers that you are interested in:
model <- ... # create the original model
+
+layer_name <- 'my_layer'
+intermediate_layer_model <- keras_model(inputs = model$input,
+                                        outputs = get_layer(model, layer_name)$output)
+intermediate_output <- predict(intermediate_layer_model, data)
To provide training or evaluation data incrementally you can write an R generator function that yields batches of training data then pass the function to the fit_generator()
function (or related functions evaluate_generator()
and predict_generator()
).
The output of generator functions must be a list of one of these forms:
list(inputs, targets)
list(inputs, targets, sample_weights)
+All arrays should contain the same number of samples. The generator is expected to loop over its data indefinitely. For example, here’s a simple generator function that yields randomly sampled batches of data:
+sampling_generator <- function(X_data, Y_data, batch_size) {
+ function() {
+ rows <- sample(1:nrow(X_data), batch_size, replace = TRUE)
+ list(X_data[rows,], Y_data[rows,])
+ }
+}
+
+model %>%
+ fit_generator(sampling_generator(X_train, Y_train, batch_size = 128),
+ steps_per_epoch = nrow(X_train) / 128, epochs = 10)
The steps_per_epoch
parameter indicates the number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples in your dataset divided by the batch size.
The above example doesn’t however address the use case of datasets that don’t fit in memory. Typically to do that you’ll write a generator that reads from another source (e.g. a sparse matrix or file(s) on disk) and maintains an offset into that data as it’s called repeatedly. For example, imagine you have a set of text files in a directory you want to read from:
+data_files_generator <- function(dir) {
+
+  files <- list.files(dir)
+ next_file <- 0
+
+ function() {
+
+ # move to the next file (note the <<- assignment operator)
+    next_file <<- next_file + 1
+
+    # wrap around once all files have been read, since generators are
+    # expected to yield data indefinitely
+    if (next_file > length(files))
+      next_file <<- 1
+
+ # determine the file name
+ file <- files[[next_file]]
+
+ # process and return the data in the file
+ file_to_training_data(file)
+ }
+}
The above function is an example of a stateful generator—the function maintains information across calls to keep track of which data to provide next. This is accomplished by defining shared state outside the generator function body and using the <<-
operator to assign to it from within the generator.
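+Such a generator can then be passed to fit_generator() in the usual way. A sketch, assuming the text files live in a local “data” directory and that file_to_training_data() is your own helper from the example above:
+model %>% fit_generator(
+  data_files_generator("data"),
+  steps_per_epoch = length(list.files("data")),
+  epochs = 10
+)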
You can also use the flow_images_from_directory()
and flow_images_from_data()
functions along with fit_generator()
for training on sets of images stored on disk (with optional image augmentation/normalization via image_data_generator()
).
You can see batch image training in action in our CIFAR10 example.
+You can also do batch training using the train_on_batch()
and test_on_batch()
functions. These functions enable you to write a training loop that reads into memory only the data required for each batch.
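+A minimal sketch of such a loop, where load_batch() and n_batches are hypothetical placeholders for your own batch-loading logic:
+for (epoch in 1:10) {
+  for (batch in 1:n_batches) {
+    batch_data <- load_batch(batch)  # returns list(x = ..., y = ...)
+    model %>% train_on_batch(batch_data$x, batch_data$y)
+  }
+}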
You can use an early stopping callback:
+early_stopping <- callback_early_stopping(monitor = 'val_loss', patience = 2)
+model %>% fit(X, y, validation_split = 0.2, callbacks = c(early_stopping))
Find out more in the callbacks documentation.
+If you set the validation_split
argument in fit
to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn’t shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed.
+The same validation set is used for all epochs (within the same call to fit
).
Yes, if the shuffle
argument in fit
is set to TRUE
(which is the default), the training data will be randomly shuffled at each epoch.
Validation data is never shuffled.
+The fit() method returns a history object which has a history attribute containing the lists of successive losses and other recorded metrics.
hist <- model %>% fit(X, y, validation_split=0.2)
+hist$history
To “freeze” a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
+You can pass a trainable
argument (boolean) to a layer constructor to set a layer to be non-trainable:
frozen_layer <- layer_dense(units = 32, trainable = FALSE)
Additionally, you can set the trainable
property of a layer to TRUE
or FALSE
after instantiation. For this to take effect, you will need to call compile()
on your model after modifying the trainable
property. Here’s an example:
x <- layer_input(shape = c(32))
+layer <- layer_dense(units = 32)
+layer$trainable <- FALSE
+y <- x %>% layer
+
+frozen_model <- keras_model(x, y)
+# in the model below, the weights of `layer` will not be updated during training
+frozen_model %>% compile(optimizer = 'rmsprop', loss = 'mse')
+
+layer$trainable <- TRUE
+trainable_model <- keras_model(x, y)
+# with this model the weights of the layer will be updated during training
+# (which will also affect the above model since it uses the same layer instance)
+trainable_model %>% compile(optimizer = 'rmsprop', loss = 'mse')
+
+frozen_model %>% fit(data, labels) # this does NOT update the weights of `layer`
+trainable_model %>% fit(data, labels) # this updates the weights of `layer`
Making an RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
+When using stateful RNNs, it is therefore assumed that:
+all batches have the same number of samples
+if X1 and X2 are successive batches of samples, then X2[[i]] is the follow-up sequence to X1[[i]], for every i.
To use statefulness in RNNs, you need to:
+Specify the batch size explicitly, by passing a batch_size argument to the first layer in your model (e.g. batch_size = 32 for a 32-sample batch of sequences of 10 timesteps with 16 features per timestep).
+Set stateful = TRUE in your RNN layer(s).
+Specify shuffle = FALSE when calling fit().
To reset the states accumulated in either a single layer or an entire model use the reset_states() function.
Note that the methods predict(), fit(), train_on_batch(), predict_classes(), etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
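+A minimal sketch that follows the three rules above (the shapes are illustrative only):
+model <- keras_model_sequential()
+model %>%
+  layer_lstm(units = 32, stateful = TRUE,
+             batch_input_shape = c(32, 10, 16)) %>%
+  layer_dense(units = 1)
+model %>% compile(optimizer = 'rmsprop', loss = 'mse')
+
+# x has dim c(n, 10, 16) where n is a multiple of 32
+# model %>% fit(x, y, batch_size = 32, shuffle = FALSE)
+# model %>% reset_states()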
You can remove the last added layer in a Sequential model by calling pop_layer()
:
model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
+ layer_dense(units = 32, activation = 'relu') %>%
+ layer_dense(units = 32, activation = 'relu')
+
+length(model$layers) # "3"
+model %>% pop_layer()
+length(model$layers) # "2"
Code and pre-trained weights are available for the following image classification models:
+ +For example:
+model <- application_vgg16(weights = 'imagenet', include_top = TRUE)
For a few simple usage examples, see the documentation for the Applications module.
+The VGG16 model is also the basis for the Deep dream Keras example script.
+By default the Keras Python and R packages use the TensorFlow backend. Other available backends include Theano and CNTK. To learn more about using alternate backends (e.g. Theano or CNTK) see the article on Keras backends.
+Note that installation and configuration of the GPU-based backends can take considerably more time and effort. So if you are just getting started with Keras you may want to stick with the CPU version initially, then install the appropriate GPU version once your training becomes more computationally demanding.
+Below are instructions for installing and enabling GPU support for the various supported backends.
+If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed then your Keras code will automatically run on the GPU.
+Additional details on GPU installation can be found here: https://tensorflow.rstudio.com/installation_gpu.html.
+If you are running on the Theano backend, you can set the THEANO_FLAGS
environment variable to indicate you’d like to execute tensor operations on the GPU. For example:
+Sys.setenv(KERAS_BACKEND = "theano")
+Sys.setenv(THEANO_FLAGS = "device=gpu,floatX=float32")
+library(keras)
The name ‘gpu’ might have to be changed depending on your device’s identifier (e.g. gpu0
, gpu1
, etc).
If you have the GPU version of CNTK installed then your Keras code will automatically run on the GPU.
+Additional information on installing the GPU version of CNTK can be found here: https://docs.microsoft.com/en-us/cognitive-toolkit/setup-linux-python
+The main consideration in using Keras within another R package is to ensure that your package can be tested in an environment where Keras is not available (e.g. the CRAN test servers). To do this, arrange for your tests to be skipped when Keras isn’t available using the is_keras_available()
function.
For example, here’s a testthat utility function that can be used to skip a test when Keras isn’t available:
+# testthat utility for skipping tests when Keras isn't available
+skip_if_no_keras <- function(version = NULL) {
+ if (!is_keras_available(version))
+ skip("Required keras version not available for testing")
+}
+
+# use the function within a test
+test_that("keras function works correctly", {
+ skip_if_no_keras()
+ # test code here
+})
You can pass the version
argument to check for a specific version of Keras.
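+For example, a test that exercises features from a newer Keras release can be skipped on older installations (the version shown is illustrative):
+test_that("uses a newer Keras feature", {
+  skip_if_no_keras(version = "2.0.5")
+  # test code here
+})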
Another consideration is gaining access to the underlying Keras python module. You might need to do this if you require lower level access to Keras than is provided for by the Keras R package.
+Since the Keras R package can bind to multiple different implementations of Keras (either the original Keras or the TensorFlow implementation of Keras), you should use the keras::implementation()
function to obtain access to the correct python module. You can use this function within the .onLoad
function of a package to provide global access to the module within your package. For example:
# Keras python module
+keras <- NULL
+
+# Obtain a reference to the module from the keras R package
+.onLoad <- function(libname, pkgname) {
+ keras <<- keras::implementation()
+}
If you create custom layers in R or import other Python packages which include custom Keras layers, be sure to wrap them using the create_layer()
function so that they are composable using the magrittr pipe operator. See the documentation on layer wrapper functions for additional details.
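+For example, a thin wrapper around a custom layer exported by a hypothetical Python package (imported here via reticulate) might look like this sketch:
+mypkg <- reticulate::import("mypkg")
+
+layer_my_attention <- function(object, units, name = NULL) {
+  create_layer(mypkg$MyAttention, object, list(
+    units = as.integer(units),
+    name = name
+  ))
+}
+
+# the wrapper now composes with %>% like any built-in layer:
+# model %>% layer_dense(units = 32) %>% layer_my_attention(units = 8)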
During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random sample.
+The snippet of code below provides an example of how to obtain reproducible results when using the TensorFlow backend. To do this we set the R session’s random seed, then manually construct a TensorFlow session (via the tensorflow package) and set its random seed, and then finally arrange for Keras to use this session within its backend.
+library(keras)
+library(tensorflow)
+
+# Set R random seed
+set.seed(42L)
+
+# TensorFlow session configuration that uses only a single thread. Multiple threads are a
+# potential source of non-reproducible results, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
+session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
+ inter_op_parallelism_threads = 1L)
+
+# Set TF random seed (see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed)
+tf$set_random_seed(1042L)
+
+# Create the session using the custom configuration
+sess <- tf$Session(graph = tf$get_default_graph(), config = session_conf)
+
+# Instruct Keras to use this session
+K <- backend()
+K$set_session(sess)
+
+# Rest of code follows ...
The default directory where all Keras data is stored is:
+$HOME/.keras/
Note that Windows users should replace $HOME
with %USERPROFILE%
. In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/
is used as a backup.
The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json
. The default configuration file looks like this:
{
+ "image_data_format": "channels_last",
+ "epsilon": 1e-07,
+ "floatx": "float32",
+ "backend": "tensorflow"
+}
+It contains the following fields:
+The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first).
+The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
+The default float data type (floatx).
+The default backend.
Likewise, cached dataset files, such as those downloaded with get_file(), are stored by default in $HOME/.keras/datasets/.
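+For example, get_file() downloads a file into that cache directory on first use and reuses the local copy afterwards (the URL is shown purely for illustration):
+path <- get_file(
+  "mnist.npz",
+  origin = "https://s3.amazonaws.com/img-datasets/mnist.npz"
+)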
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
+This guide assumes that you are already familiar with the Sequential model.
+Let’s start with something simple.
+The Sequential model is probably a better choice to implement such a network, but it helps to start with something really simple.
+To use the functional API, build your input and output layers and then pass them to the keras_model()
function. This model can be trained just like Keras sequential models.
library(keras)
+
+# input layer
+inputs <- layer_input(shape = c(784))
+
+# outputs compose input + dense layers
+predictions <- inputs %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dense(units = 10, activation = 'softmax')
+
+# create and compile model
+model <- keras_model(inputs = inputs, outputs = predictions)
+model %>% compile(
+ optimizer = 'rmsprop',
+ loss = 'categorical_crossentropy',
+ metrics = c('accuracy')
+)
Note that Keras objects are modified in place which is why it’s not necessary for model
to be assigned back to after it is compiled.
With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer. Note that you aren’t just re-using the architecture of the model, you are also re-using its weights.
+x <- layer_input(shape = c(784))
+# This works, and returns the 10-way softmax we defined above.
+y <- x %>% model
This allows you, for instance, to quickly create models that can process sequences of inputs. You could turn an image classification model into a video classification model, in just one line:
+# Input tensor for sequences of 20 timesteps,
+# each containing a 784-dimensional vector
+input_sequences <- layer_input(shape = c(20, 784))
+
+# This applies our previous model to the input sequence
+processed_sequences <- input_sequences %>%
+ time_distributed(model)
Here’s a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.
+Let’s consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
+The model will also be supervised via two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.
+Here’s what our model looks like:
+ +Let’s implement it with the functional API.
+The main input will receive the headline, as a sequence of integers (each integer encodes a word). The integers will be between 1 and 10,000 (a vocabulary of 10,000 words) and the sequences will be 100 words long.
+We’ll include an embedding layer that maps each of the 10,000 word indices to a 512-dimensional vector, followed by an LSTM layer:
+library(keras)
+
+main_input <- layer_input(shape = c(100), dtype = 'int32', name = 'main_input')
+
+lstm_out <- main_input %>%
+ layer_embedding(input_dim = 10000, output_dim = 512, input_length = 100) %>%
+ layer_lstm(units = 32)
Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model:
+auxiliary_output <- lstm_out %>%
+ layer_dense(units = 1, activation = 'sigmoid', name = 'aux_output')
At this point, we feed into the model our auxiliary input data by concatenating it with the LSTM output, stacking a deep densely-connected network on top and adding the main logistic regression layer
+auxiliary_input <- layer_input(shape = c(5), name = 'aux_input')
+
+main_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dense(units = 1, activation = 'sigmoid', name = 'main_output')
This defines a model with two inputs and two outputs:
+model <- keras_model(
+ inputs = c(main_input, auxiliary_input),
+ outputs = c(main_output, auxiliary_output)
+)
summary(model)
Model
+__________________________________________________________________________________________
+Layer (type) Output Shape Param # Connected to
+==========================================================================================
+main_input (InputLayer) (None, 100) 0
+__________________________________________________________________________________________
+embedding_1 (Embedding) (None, 100, 512) 5120000
+__________________________________________________________________________________________
+lstm_1 (LSTM) (None, 32) 69760
+__________________________________________________________________________________________
+aux_input (InputLayer) (None, 5) 0
+__________________________________________________________________________________________
+concatenate_1 (Concatenate) (None, 37) 0
+__________________________________________________________________________________________
+dense_1 (Dense) (None, 64) 2432
+__________________________________________________________________________________________
+dense_2 (Dense) (None, 64) 4160
+__________________________________________________________________________________________
+dense_3 (Dense) (None, 64) 4160
+__________________________________________________________________________________________
+main_output (Dense) (None, 1) 65
+__________________________________________________________________________________________
+aux_output (Dense) (None, 1) 33
+==========================================================================================
+Total params: 5,200,610
+Trainable params: 5,200,610
+Non-trainable params: 0
+__________________________________________________________________________________________
+We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights
or loss
for each different output, you can use a list or a named list. Here we pass a single loss as the loss
argument, so the same loss will be used on all outputs.
model %>% compile(
+ optimizer = 'rmsprop',
+ loss = 'binary_crossentropy',
+ loss_weights = c(1.0, 0.2)
+)
We can train the model by passing it lists of input arrays and target arrays:
+model %>% fit(
+ x = list(headline_data, additional_data),
+ y = list(labels, labels),
+ epochs = 50,
+ batch_size = 32
+)
Since our inputs and outputs are named (we passed them a “name” argument), we could also have compiled the model via:
+model %>% compile(
+ optimizer = 'rmsprop',
+ loss = list(main_output = 'binary_crossentropy', aux_output = 'binary_crossentropy'),
+ loss_weights = list(main_output = 1.0, aux_output = 0.2)
+)
+
+# And trained it via:
+model %>% fit(
+ x = list(main_input = headline_data, aux_input = additional_data),
+ y = list(main_output = labels, aux_output = labels),
+ epochs = 50,
+ batch_size = 32
+)
+Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a “node” to the layer, linking the input tensor to the output tensor. When you are calling the same layer multiple times, that layer owns multiple nodes indexed as 1, 2, 3…
+You can obtain the output tensor of a layer via layer$output
, or its output shape via layer$output_shape
. But what if a layer is connected to multiple inputs?
As long as a layer is only connected to one input, there is no confusion, and $output
will return the one output of the layer:
a <- layer_input(shape = c(140, 256))
+
+lstm <- layer_lstm(units = 32)
+
+encoded_a <- a %>% lstm
+
+lstm$output
Not so if the layer has multiple inputs:
+a <- layer_input(shape = c(140, 256))
+b <- layer_input(shape = c(140, 256))
+
+lstm <- layer_lstm(units = 32)
+
+encoded_a <- a %>% lstm
+encoded_b <- b %>% lstm
+
+lstm$output
AttributeError: Layer lstm_4 has multiple inbound nodes, hence the notion of "layer output" is ill-defined. Use `get_output_at(node_index)` instead.
+Okay then. The following works:
+get_output_at(lstm, 1)
+get_output_at(lstm, 2)
Simple enough, right?
+The same is true for the properties input_shape
and output_shape
: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of “layer output/input shape” is well defined, and that one shape will be returned by layer$output_shape
/layer$input_shape
. But if, for instance, you apply the same layer_conv_2d()
layer to an input of shape (32, 32, 3)
, and then to an input of shape (64, 64, 3)
, the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to:
a <- layer_input(shape = c(32, 32, 3))
+b <- layer_input(shape = c(64, 64, 3))
+
+conv <- layer_conv_2d(filters = 16, kernel_size = c(3,3), padding = 'same')
+
+conved_a <- a %>% conv
+
+# only one input so far, the following will work
+conv$input_shape
+
+conved_b <- b %>% conv
+# now the `$input_shape` property wouldn't work, but this does:
+get_input_shape_at(conv, 1)
+get_input_shape_at(conv, 2)
Code examples are still the best way to get started, so here are a few more.
+For more information about the Inception architecture, see Going Deeper with Convolutions.
+library(keras)
+
+input_img <- layer_input(shape = c(256, 256, 3))
+
+tower_1 <- input_img %>%
+ layer_conv_2d(filters = 64, kernel_size = c(1, 1), padding='same', activation='relu') %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding='same', activation='relu')
+
+tower_2 <- input_img %>%
+ layer_conv_2d(filters = 64, kernel_size = c(1, 1), padding='same', activation='relu') %>%
+ layer_conv_2d(filters = 64, kernel_size = c(5, 5), padding='same', activation='relu')
+
+tower_3 <- input_img %>%
+ layer_max_pooling_2d(pool_size = c(3, 3), strides = c(1, 1), padding = 'same') %>%
+ layer_conv_2d(filters = 64, kernel_size = c(1, 1), padding='same', activation='relu')
+
+output <- layer_concatenate(c(tower_1, tower_2, tower_3), axis = 1)
For more information about residual networks, see Deep Residual Learning for Image Recognition.
+# input tensor for a 3-channel 256x256 image
+x <- layer_input(shape = c(256, 256, 3))
+# 3x3 conv with 3 output channels (same as input channels)
+y <- x %>% layer_conv_2d(filters = 3, kernel_size =c(3, 3), padding = 'same')
+# this returns x + y.
+z <- layer_add(c(x, y))
This model can select the correct one-word answer when asked a natural-language question about a picture.
+It works by encoding the question into a vector, encoding the image into a vector, concatenating the two, and training on top a logistic regression over some vocabulary of potential answers.
+# First, let's define a vision model using a Sequential model.
+# This model will encode an image into a vector.
+vision_model <- keras_model_sequential()
+vision_model %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu', padding = 'same',
+ input_shape = c(224, 224, 3)) %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2, 2)) %>%
+ layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = 'relu', padding = 'same') %>%
+ layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2, 2)) %>%
+ layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = 'relu', padding = 'same') %>%
+ layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = 'relu') %>%
+ layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2, 2)) %>%
+ layer_flatten()
+
+# Now let's get a tensor with the output of our vision model:
+image_input <- layer_input(shape = c(224, 224, 3))
+encoded_image <- image_input %>% vision_model
+
+# Next, let's define a language model to encode the question into a vector.
+# Each question will be at most 100 words long,
+# and we will index words as integers from 1 to 9999.
+question_input <- layer_input(shape = c(100), dtype = 'int32')
+encoded_question <- question_input %>%
+ layer_embedding(input_dim = 10000, output_dim = 256, input_length = 100) %>%
+ layer_lstm(units = 256)
+
+# Let's concatenate the question vector and the image vector then
+# train a logistic regression over 1000 words on top
+output <- layer_concatenate(c(encoded_question, encoded_image)) %>%
+ layer_dense(units = 1000, activation='softmax')
+
+# This is our final model:
+vqa_model <- keras_model(inputs = c(image_input, question_input), outputs = output)
Now that we have trained our image QA model, we can quickly turn it into a video QA model. With appropriate training, you will be able to show it a short video (e.g. 100-frame human action) and ask a natural language question about the video (e.g. “what sport is the boy playing?” -> “football”).
+video_input <- layer_input(shape = c(100, 224, 224, 3))
+
+# This is our video encoded via the previously trained vision_model (weights are reused)
+encoded_video <- video_input %>%
+ time_distributed(vision_model) %>%
+ layer_lstm(units = 256)
+
+# This is a model-level representation of the question encoder, reusing the same weights as before:
+question_encoder <- keras_model(inputs = question_input, outputs = encoded_question)
+
+# Let's use it to encode the question:
+video_question_input <- layer_input(shape = c(100), dtype = 'int32')
+encoded_video_question <- video_question_input %>% question_encoder
+
+# And this is our video question answering model:
+output <- layer_concatenate(c(encoded_video, encoded_video_question)) %>%
+ layer_dense(units = 1000, activation = 'softmax')
+
+video_qa_model <- keras_model(inputs= c(video_input, video_question_input), outputs = output)
Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. Keras has the following key features:
+Allows the same code to run on CPU or on GPU, seamlessly.
User-friendly API which makes it easy to quickly prototype deep learning models.
Built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
Supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, etc. This means that Keras is appropriate for building essentially any deep learning model, from a memory network to a neural Turing machine.
Is capable of running on top of multiple back-ends including TensorFlow, CNTK, or Theano.
This website provides documentation for the R interface to Keras. See the main Keras website at https://keras.io for additional information on the project.
+First, install the keras R package from CRAN as follows:
+install.packages("keras")
The Keras R interface uses the TensorFlow backend engine by default. To install both the core Keras library as well as the TensorFlow backend use the install_keras()
function:
library(keras)
+install_keras()
This will provide you with default installations of Keras and TensorFlow. If you want to do a more customized installation of TensorFlow (including installing a version that takes advantage of Nvidia GPUs if you have the correct CUDA libraries installed) see the documentation for install_keras()
.
+We can learn the basics of Keras by walking through a simple example: recognizing handwritten digits from the MNIST dataset. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:
+ +The dataset also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.
+The MNIST dataset is included with Keras and can be accessed using the dataset_mnist()
function. Here we load the dataset then create variables for our test and training data:
library(keras)
+mnist <- dataset_mnist()
+x_train <- mnist$train$x
+y_train <- mnist$train$y
+x_test <- mnist$test$x
+y_test <- mnist$test$y
The x
data is a 3-d array (images, width, height)
of grayscale values. To prepare the data for training we convert the 3-d arrays into matrices by reshaping width and height into a single dimension (28x28 images are flattened into length 784 vectors). Then, we convert the grayscale values from integers ranging between 0 and 255 into floating point values ranging between 0 and 1:
# reshape
+dim(x_train) <- c(nrow(x_train), 784)
+dim(x_test) <- c(nrow(x_test), 784)
+# rescale
+x_train <- x_train / 255
+x_test <- x_test / 255
The y
data is an integer vector with values ranging from 0 to 9. To prepare this data for training we one-hot encode the vectors into binary class matrices using the Keras to_categorical()
function:
y_train <- to_categorical(y_train, 10)
+y_test <- to_categorical(y_test, 10)
The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the Sequential model, a linear stack of layers.
+We begin by creating a sequential model and then adding layers using the pipe (%>%
) operator:
library(keras)
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
+ layer_dropout(rate = 0.4) %>%
+ layer_dense(units = 128, activation = 'relu') %>%
+ layer_dropout(rate = 0.3) %>%
+ layer_dense(units = 10, activation = 'softmax')
The input_shape
argument to the first layer specifies the shape of the input data (a length 784 numeric vector representing a grayscale image). The final layer outputs a length 10 numeric vector (probabilities for each digit) using a softmax activation function.
Use the summary()
function to print the details of the model:
summary(model)
Model
+________________________________________________________________________________
+Layer (type) Output Shape Param #
+================================================================================
+dense_1 (Dense) (None, 256) 200960
+________________________________________________________________________________
+dropout_1 (Dropout) (None, 256) 0
+________________________________________________________________________________
+dense_2 (Dense) (None, 128) 32896
+________________________________________________________________________________
+dropout_2 (Dropout) (None, 128) 0
+________________________________________________________________________________
+dense_3 (Dense) (None, 10) 1290
+================================================================================
+Total params: 235,146
+Trainable params: 235,146
+Non-trainable params: 0
+________________________________________________________________________________
+Next, compile the model with appropriate loss function, optimizer, and metrics:
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_rmsprop(),
+ metrics = c('accuracy')
+)
Use the fit()
function to train the model for 30 epochs using batches of 128 images:
history <- model %>% fit(
+ x_train, y_train,
+ epochs = 30, batch_size = 128,
+ validation_split = 0.2
+)
The history
object returned by fit()
includes loss and accuracy metrics which we can plot:
plot(history)
Evaluate the model’s performance on the test data:
+loss_and_metrics <- model %>% evaluate(x_test, y_test)
Generate predictions on new data:
+classes <- model %>% predict_classes(x_test)
Keras provides a vocabulary for building deep learning models that is simple, elegant, and intuitive. Building a question answering system, an image classification model, a neural Turing machine, or any other model is just as straightforward.
+To learn more about Keras, see these other package vignettes:
+The examples demonstrate more advanced models including transfer learning, variational auto-encoding, question-answering with memory networks, text generation with stacked LSTMs, etc.
+The function reference includes detailed information on all of the functions available in the package.
+The sequential model is a linear stack of layers.
+You create a sequential model by calling the keras_model_sequential()
function then a series of layer
functions:
library(keras)
+
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, input_shape = c(784)) %>%
+ layer_activation('relu') %>%
+ layer_dense(units = 10) %>%
+ layer_activation('softmax')
Note that Keras objects are modified in place which is why it’s not necessary for model
to be assigned back to after the layers are added.
Print a summary of the model’s structure using the summary()
function:
summary(model)
Model
+________________________________________________________________________________
+Layer (type) Output Shape Param #
+================================================================================
+dense_1 (Dense) (None, 256) 200960
+________________________________________________________________________________
+dropout_1 (Dropout) (None, 256) 0
+________________________________________________________________________________
+dense_2 (Dense) (None, 128) 32896
+________________________________________________________________________________
+dropout_2 (Dropout) (None, 128) 0
+________________________________________________________________________________
+dense_3 (Dense) (None, 10) 1290
+================================================================================
+Total params: 235,146
+Trainable params: 235,146
+Non-trainable params: 0
+________________________________________________________________________________
+The model needs to know what input shape it should expect. For this reason, the first layer in a sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape.
+As illustrated in the example above, this is done by passing an input_shape
argument to the first layer. This is a list of integers or NULL
entries, where NULL
indicates that any positive integer may be expected. In input_shape
, the batch dimension is not included.
If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size
argument to a layer. If you pass both batch_size=32
and input_shape=c(6, 8)
to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8)
.
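+For example, a minimal sketch of a first layer that fixes both the batch size and the input shape:
+model <- keras_model_sequential()
+model %>%
+  layer_dense(units = 32, batch_size = 32, input_shape = c(6, 8))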
Before training a model, you need to configure the learning process, which is done via the compile()
function. It receives three arguments:
An optimizer. This could be the string identifier of an existing optimizer (e.g. as “rmsprop” or “adagrad”) or a call to an optimizer function (e.g. optimizer_sgd()
).
A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (e.g. “categorical_crossentropy” or “mse”) or a call to a loss function (e.g. loss_mean_squared_error()
).
A list of metrics. For any classification problem you will want to set this to metrics = c('accuracy')
. A metric could be the string identifier of an existing metric or a call to a metric function (e.g. metric_binary_crossentropy()
).
Here’s the definition of a model along with the compilation step (the compile()
function has arguments appropriate for a multi-class classification problem):
# For a multi-class classification problem
+model <- keras_model_sequential()
+model %>%
+ layer_dense(units = 32, input_shape = c(784)) %>%
+ layer_activation('relu') %>%
+ layer_dense(units = 10) %>%
+ layer_activation('softmax')
+
+model %>% compile(
+ optimizer = 'rmsprop',
+ loss = 'categorical_crossentropy',
+ metrics = c('accuracy')
+)
Here’s what compilation might look like for a mean squared error regression problem:
+model %>% compile(
+ optimizer = optimizer_rmsprop(lr = 0.002),
+ loss = 'mse'
+)
Here’s compilation for a binary classification problem:
+model %>% compile(
+ optimizer = optimizer_rmsprop(),
+ loss = loss_binary_crossentropy,
+ metrics = metric_binary_accuracy
+)
Here’s compilation with a custom metric:
+# create metric using backend tensor functions
+K <- backend()
+metric_mean_pred <- function(y_true, y_pred) {
+ K$mean(y_pred)
+}
+
+model %>% compile(
+ optimizer = optimizer_rmsprop(),
+ loss = loss_binary_crossentropy,
+ metrics = c('accuracy',
+ 'mean_pred' = metric_mean_pred)
+)
Keras models are trained on R matrices or higher dimensional arrays of input data and labels. For training a model, you will typically use the fit()
function.
Here’s a single-input model with 2 classes (binary classification):
+# create model
+model <- keras_model_sequential()
+
+# add layers and compile the model
+model %>%
+ layer_dense(units = 32, activation = 'relu', input_shape = c(100)) %>%
+ layer_dense(units = 1, activation = 'sigmoid') %>%
+ compile(
+ optimizer = 'rmsprop',
+ loss = 'binary_crossentropy',
+ metrics = c('accuracy')
+ )
+
+# Generate dummy data
+data <- matrix(runif(1000*100), nrow = 1000, ncol = 100)
+labels <- matrix(round(runif(1000, min = 0, max = 1)), nrow = 1000, ncol = 1)
+
+# Train the model, iterating on the data in batches of 32 samples
+model %>% fit(data, labels, epochs=10, batch_size=32)
Here’s a single-input model with 10 classes (categorical classification):
+# create model
+model <- keras_model_sequential()
+
+# define and compile the model
+model %>%
+ layer_dense(units = 32, activation = 'relu', input_shape = c(100)) %>%
+ layer_dense(units = 10, activation = 'softmax') %>%
+ compile(
+ optimizer = 'rmsprop',
+ loss = 'categorical_crossentropy',
+ metrics = c('accuracy')
+ )
+
+# Generate dummy data
+data <- matrix(runif(1000*100), nrow = 1000, ncol = 100)
+labels <- matrix(round(runif(1000, min = 0, max = 9)), nrow = 1000, ncol = 1)
+
+# Convert labels to categorical one-hot encoding
+one_hot_labels <- to_categorical(labels, num_classes = 10)
+
+# Train the model, iterating on the data in batches of 32 samples
+model %>% fit(data, one_hot_labels, epochs=10, batch_size=32)
Here are a few examples to get you started!
+On the examples page you will also find example models for real datasets:
+CIFAR10 small images classification
+IMDB movie review sentiment classification
+Reuters newswires topic classification
+MNIST handwritten digits classification
+Some additional examples are provided below.
+library(keras)
+
+# generate dummy data
+x_train <- matrix(runif(1000*20), nrow = 1000, ncol = 20)
+
+y_train <- runif(1000, min = 0, max = 9) %>%
+ round() %>%
+ matrix(nrow = 1000, ncol = 1) %>%
+ to_categorical(num_classes = 10)
+
+x_test <- matrix(runif(100*20), nrow = 100, ncol = 20)
+
+y_test <- runif(100, min = 0, max = 9) %>%
+ round() %>%
+ matrix(nrow = 100, ncol = 1) %>%
+ to_categorical(num_classes = 10)
+
+# create model
+model <- keras_model_sequential()
+
+# define and compile the model
+model %>%
+ layer_dense(units = 64, activation = 'relu', input_shape = c(20)) %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 10, activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_sgd(lr = 0.01, decay = 1e-6, momentum = 0.9, nesterov = TRUE),
+ metrics = c('accuracy')
+ )
+
+# train
+model %>% fit(x_train, y_train, epochs = 20, batch_size = 128)
+
+# evaluate
+score <- model %>% evaluate(x_test, y_test, batch_size = 128)
library(keras)
+
+# generate dummy data
+x_train <- matrix(runif(1000*20), nrow = 1000, ncol = 20)
+y_train <- matrix(round(runif(1000, min = 0, max = 1)), nrow = 1000, ncol = 1)
+x_test <- matrix(runif(100*20), nrow = 100, ncol = 20)
+y_test <- matrix(round(runif(100, min = 0, max = 1)), nrow = 100, ncol = 1)
+
+# create model
+model <- keras_model_sequential()
+
+# define and compile the model
+model %>%
+ layer_dense(units = 64, activation = 'relu', input_shape = c(20)) %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 64, activation = 'relu') %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 1, activation = 'sigmoid') %>%
+ compile(
+ loss = 'binary_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+ )
+
+# train
+model %>% fit(x_train, y_train, epochs = 20, batch_size = 128)
+
+# evaluate
+score <- model %>% evaluate(x_test, y_test, batch_size = 128)
library(keras)
+
+# generate dummy data
+x_train <- array(runif(100 * 100 * 100 * 3), dim = c(100, 100, 100, 3))
+
+y_train <- runif(100, min = 0, max = 9) %>%
+ round() %>%
+ matrix(nrow = 100, ncol = 1) %>%
+ to_categorical(num_classes = 10)
+
+x_test <- array(runif(20 * 100 * 100 * 3), dim = c(20, 100, 100, 3))
+
+y_test <- runif(20, min = 0, max = 9) %>%
+ round() %>%
+ matrix(nrow = 20, ncol = 1) %>%
+ to_categorical(num_classes = 10)
+
+# create model
+model <- keras_model_sequential()
+
+# define and compile model
+# input: 100x100 images with 3 channels -> (100, 100, 3) tensors.
+# this applies 32 convolution filters of size 3x3 each.
+model %>%
+ layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
+ input_shape = c(100,100,3)) %>%
+ layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2,2)) %>%
+ layer_dropout(rate = 0.25) %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
+ layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
+ layer_max_pooling_2d(pool_size = c(2,2)) %>%
+ layer_dropout(rate = 0.25) %>%
+ layer_flatten() %>%
+ layer_dense(units = 256, activation = 'relu') %>%
+ layer_dropout(rate = 0.25) %>%
+ layer_dense(units = 10, activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_sgd(lr = 0.01, decay = 1e-6, momentum = 0.9, nesterov = TRUE)
+ )
+
+# train
+model %>% fit(x_train, y_train, batch_size = 32, epochs = 10)
+
+# evaluate
+score <- model %>% evaluate(x_test, y_test, batch_size = 32)
model <- keras_model_sequential()
+model %>%
+  layer_embedding(input_dim = max_features, output_dim = 256) %>%
+ layer_lstm(units = 128) %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 1, activation = 'sigmoid') %>%
+ compile(
+ loss = 'binary_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+ )
+
+model %>% fit(x_train, y_train, batch_size = 16, epochs = 10)
+score <- model %>% evaluate(x_test, y_test, batch_size = 16)
model <- keras_model_sequential()
+model %>%
+ layer_conv_1d(filters = 64, kernel_size = 3, activation = 'relu',
+ input_shape = c(seq_length, 100)) %>%
+ layer_conv_1d(filters = 64, kernel_size = 3, activation = 'relu') %>%
+ layer_max_pooling_1d(pool_size = 3) %>%
+ layer_conv_1d(filters = 128, kernel_size = 3, activation = 'relu') %>%
+ layer_conv_1d(filters = 128, kernel_size = 3, activation = 'relu') %>%
+ layer_global_average_pooling_1d() %>%
+ layer_dropout(rate = 0.5) %>%
+ layer_dense(units = 1, activation = 'sigmoid') %>%
+ compile(
+ loss = 'binary_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+ )
+
+model %>% fit(x_train, y_train, batch_size = 16, epochs = 10)
+score <- model %>% evaluate(x_test, y_test, batch_size = 16)
In this model, we stack 3 LSTM layers on top of each other, making the model capable of learning higher-level temporal representations.
+The first two LSTMs return their full output sequences, but the last one only returns the last step in its output sequence, thus dropping the temporal dimension (i.e. converting the input sequence into a single vector).
+ +library(keras)
+
+# constants
+data_dim <- 16
+timesteps <- 8
+num_classes <- 10
+
+# define and compile model
+# expected input data shape: (batch_size, timesteps, data_dim)
+model <- keras_model_sequential()
+model %>%
+ layer_lstm(units = 32, return_sequences = TRUE, input_shape = c(timesteps, data_dim)) %>%
+ layer_lstm(units = 32, return_sequences = TRUE) %>%
+ layer_lstm(units = 32) %>% # return a single vector dimension 32
+ layer_dense(units = 10, activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+ )
+
+# generate dummy training data
+x_train <- array(runif(1000 * timesteps * data_dim), dim = c(1000, timesteps, data_dim))
+y_train <- matrix(runif(1000 * num_classes), nrow = 1000, ncol = num_classes)
+
+# generate dummy validation data
+x_val <- array(runif(100 * timesteps * data_dim), dim = c(100, timesteps, data_dim))
+y_val <- matrix(runif(100 * num_classes), nrow = 100, ncol = num_classes)
+
+# train
+model %>% fit(
+ x_train, y_train, batch_size = 64, epochs = 5, validation_data = list(x_val, y_val)
+)
A stateful recurrent model is one for which the internal states (memories) obtained after processing a batch of samples are reused as initial states for the samples of the next batch. This allows the model to process longer sequences while keeping computational complexity manageable.
+You can read more about stateful RNNs in the FAQ.
+library(keras)
+
+# constants
+data_dim <- 16
+timesteps <- 8
+num_classes <- 10
+batch_size <- 32
+
+# define and compile model
+# Expected input batch shape: (batch_size, timesteps, data_dim)
+# Note that we have to provide the full batch_input_shape since the network is stateful.
+# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
+model <- keras_model_sequential()
+model %>%
+ layer_lstm(units = 32, return_sequences = TRUE, stateful = TRUE,
+ batch_input_shape = c(batch_size, timesteps, data_dim)) %>%
+ layer_lstm(units = 32, return_sequences = TRUE, stateful = TRUE) %>%
+ layer_lstm(units = 32, stateful = TRUE) %>%
+ layer_dense(units = 10, activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop',
+ metrics = c('accuracy')
+ )
+
+# generate dummy training data
+x_train <- array(runif( (batch_size * 10) * timesteps * data_dim),
+ dim = c(batch_size * 10, timesteps, data_dim))
+y_train <- matrix(runif( (batch_size * 10) * num_classes),
+ nrow = batch_size * 10, ncol = num_classes)
+
+# generate dummy validation data
+x_val <- array(runif( (batch_size * 3) * timesteps * data_dim),
+ dim = c(batch_size * 3, timesteps, data_dim))
+y_val <- matrix(runif( (batch_size * 3) * num_classes),
+ nrow = batch_size * 3, ncol = num_classes)
+
+# train
+model %>% fit(
+ x_train,
+ y_train,
+ batch_size = batch_size,
+ epochs = 5,
+ shuffle = FALSE,
+ validation_data = list(x_val, y_val)
+)
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument callbacks
) to the fit()
function. The relevant methods of the callbacks will then be called at each stage of the training.
For example:
+library(keras)
+
+# generate dummy training data
+data <- matrix(rexp(1000*784), nrow = 1000, ncol = 784)
+labels <- matrix(round(runif(1000*10, min = 0, max = 1)), nrow = 1000, ncol = 10)
+
+# create model
+model <- keras_model_sequential()
+
+# add layers and compile
+model %>%
+ layer_dense(32, input_shape = c(784)) %>%
+ layer_activation('relu') %>%
+ layer_dense(10) %>%
+ layer_activation('softmax') %>%
+ compile(
+ loss='binary_crossentropy',
+ optimizer = optimizer_sgd(),
+ metrics='accuracy'
+ )
+
+# fit with callbacks
+model %>% fit(data, labels, callbacks = list(
+ callback_model_checkpoint("checkpoints.h5"),
+ callback_reduce_lr_on_plateau(monitor = "val_loss", factor = 0.1)
+))
The following built-in callbacks are available as part of Keras:
+
+callback_progbar_logger()
+ |
+
+ +Callback that prints metrics to stdout. + + |
+
+callback_model_checkpoint()
+ |
+
+ +Save the model after every epoch. + + |
+
+callback_early_stopping()
+ |
+
+ +Stop training when a monitored quantity has stopped improving. + + |
+
+callback_remote_monitor()
+ |
+
+ +Callback used to stream events to a server. + + |
+
+callback_learning_rate_scheduler()
+ |
+
+ +Learning rate scheduler. + + |
+
+callback_tensorboard()
+ |
+
+ +TensorBoard basic visualizations + + |
+
+callback_reduce_lr_on_plateau()
+ |
+
+ +Reduce learning rate when a metric has stopped improving. + + |
+
+callback_csv_logger()
+ |
+
+ +Callback that streams epoch results to a csv file + + |
+
+callback_lambda()
+ |
+
+ +Create a custom callback + + |
+
You can create a custom callback by creating a new R6 class that inherits from the KerasCallback
class.
Here’s a simple example saving a list of losses over each batch during training:
+library(keras)
+
+# define custom callback class
+LossHistory <- R6::R6Class("LossHistory",
+ inherit = KerasCallback,
+
+ public = list(
+
+ losses = NULL,
+
+ on_batch_end = function(batch, logs = list()) {
+ self$losses <- c(self$losses, logs[["loss"]])
+ }
+))
+
+# define model
+model <- keras_model_sequential()
+
+# add layers and compile
+model %>%
+ layer_dense(units = 10, input_shape = c(784)) %>%
+ layer_activation(activation = 'softmax') %>%
+ compile(
+ loss = 'categorical_crossentropy',
+ optimizer = 'rmsprop'
+ )
+
+# create history callback object and use it during training
+history <- LossHistory$new()
+model %>% fit(
+ X_train, Y_train,
+ batch_size=128, epochs=20, verbose=0,
+ callbacks= list(history)
+)
+
+# print the accumulated losses
+history$losses
[1] 0.6604760 0.3547246 0.2595316 0.2590170 ...
+Custom callback objects have access to the current model and its training parameters via the following fields:
+self$params
Named list with training parameters (e.g. verbosity, batch size, number of epochs…).
+self$model
Reference to the Keras model being trained.
+Custom callback objects can implement one or more of the following methods:
+on_epoch_begin(epoch, logs)
Called at the beginning of each epoch.
+on_epoch_end(epoch, logs)
Called at the end of each epoch.
+on_batch_begin(batch, logs)
Called at the beginning of each batch.
+on_batch_end(batch, logs)
Called at the end of each batch.
+on_train_begin(logs)
Called at the beginning of training.
+on_train_end(logs)
Called at the end of training.
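As a hedged sketch (not taken from the package documentation) that combines several of the fields and methods listed above, the callback below records the validation loss after every epoch and stops training early once it drops below an arbitrary threshold:

library(keras)

# illustrative only: the class name and threshold are made up for this sketch
ValLossHistory <- R6::R6Class("ValLossHistory",
  inherit = KerasCallback,

  public = list(

    val_losses = NULL,

    on_train_begin = function(logs = list()) {
      # self$params holds the training parameters passed to fit()
      cat("training for", self$params$epochs, "epochs\n")
    },

    on_epoch_end = function(epoch, logs = list()) {
      self$val_losses <- c(self$val_losses, logs[["val_loss"]])
      # self$model is a reference to the model being trained
      if (!is.null(logs[["val_loss"]]) && logs[["val_loss"]] < 0.05)
        self$model$stop_training <- TRUE
    }
  )
)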
+There are a number of tools available for visualizing the training of Keras models, including:
+A plot method for the Keras training history returned from fit()
.
Real time visualization of training metrics within the RStudio IDE.
Integration with the TensorBoard visualization tool included with TensorFlow. Beyond just training metrics, TensorBoard has a wide variety of other visualizations available including the underlying TensorFlow graph, gradient histograms, model weights, and more. TensorBoard also enables you to compare metrics across multiple training runs.
Each of these tools is described in more detail below.
+The Keras fit()
method returns an R object containing the training history, including the value of metrics at the end of each epoch. You can plot the training metrics by epoch using the plot()
method.
For example, here we compile and fit a model with the “accuracy” metric:
+model %>% compile(
+ loss = 'categorical_crossentropy',
+ optimizer = optimizer_rmsprop(),
+ metrics = c('accuracy')
+)
+
+history <- model %>% fit(
+ x_train, y_train,
+ epochs = 30, batch_size = 128,
+ validation_split = 0.2
+)
We can then plot the training history as follows:
+plot(history)
+The history will be plotted using ggplot2 if it is available (otherwise base graphics will be used). The plot includes all specified metrics as well as the loss, and draws a smoothing line if there are 10 or more epochs. You can customize all of this behavior via various options of the plot method.
+If you want to create a custom visualization you can call the as.data.frame()
method on the history to obtain a data frame with factors for each metric as well as training vs. validation:
history_df <- as.data.frame(history)
+str(history_df)
'data.frame': 120 obs. of 4 variables:
+ $ epoch : int 1 2 3 4 5 6 7 8 9 10 ...
+ $ value : num 0.87 0.941 0.954 0.962 0.965 ...
+ $ metric: Factor w/ 2 levels "acc","loss": 1 1 1 1 1 1 1 1 1 1 ...
+ $ data : Factor w/ 2 levels "training","validation": 1 1 1 1 1 1 1 1 1 1 ...
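Because the data frame has one row per epoch, metric, and data split, a custom plot is straightforward. For example, a hedged ggplot2 sketch (assuming the ggplot2 package is installed; the aesthetic choices are arbitrary):

library(ggplot2)

history_df <- as.data.frame(history)

# one panel per metric, separate lines for training and validation
ggplot(history_df, aes(x = epoch, y = value, color = data)) +
  geom_line() +
  facet_wrap(~ metric, scales = "free_y")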
+If you are training your model within the RStudio IDE then real time metrics are available within the Viewer pane:
+ +The view_metrics
argument of the fit()
method controls whether real time metrics are displayed. By default metrics are automatically displayed if one or more metrics are specified in the call to compile()
and there is more than one training epoch.
You can explicitly control whether metrics are displayed by specifying the view_metrics
argument. You can also set a global session default using the keras.view_metrics
option:
# don't show metrics during this run
+history <- model %>% fit(
+ x_train, y_train,
+ epochs = 30, batch_size = 128,
+ view_metrics = FALSE,
+ validation_split = 0.2
+)
+
+# set global default to never show metrics
+options(keras.view_metrics = FALSE)
Note that when view_metrics
is TRUE
metrics will be displayed even when not running within RStudio (in that case metrics will be displayed in an external web browser).
TensorBoard is a visualization tool included with TensorFlow that enables you to visualize dynamic graphs of your Keras training and test metrics, as well as activation histograms for the different layers in your model.
+For example, here’s a TensorBoard display for Keras accuracy and loss metrics:
+To record data that can be visualized with TensorBoard, you add a TensorBoard callback to the fit()
function. For example:
history <- model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ callbacks = callback_tensorboard("logs/run_a"),
+ validation_split = 0.2
+)
See the documentation on the callback_tensorboard()
function for the various available options. The most important option is the log_dir
, which determines which directory logs are written to for a given training run.
You should either use a distinct log directory for each training run or remove the log directory between runs.
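One simple way to arrange this (a sketch rather than a requirement) is to build a unique log directory name for each run, for example from a timestamp:

# a distinct log directory per training run (the naming scheme is arbitrary)
run_dir <- file.path("logs", format(Sys.time(), "run_%Y_%m_%d_%H_%M_%S"))

history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  callbacks = callback_tensorboard(run_dir),
  validation_split = 0.2
)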
+To view TensorBoard data for a given set of runs you use the tensorboard()
function, pointing it to the previously specified log_dir
:
tensorboard("logs/run_a")
It’s often useful to run TensorBoard while you are training a model. To do this, simply launch tensorboard within the training directory right before you begin training:
+# launch TensorBoard (data won't show up until after the first epoch)
+tensorboard("logs/run_a")
+
+# fit the model with the TensorBoard callback
+history <- model %>% fit(
+ x_train, y_train,
+ batch_size = batch_size,
+ epochs = epochs,
+ verbose = 1,
+ callbacks = callback_tensorboard("logs/run_a"),
+ validation_split = 0.2
+)
+Keras writes TensorBoard data at the end of each epoch so you won't see any data in TensorBoard until 10-20 seconds after the end of the first epoch (TensorBoard automatically refreshes its display every 30 seconds during training).
+TensorBoard will automatically include all runs logged within the sub-directories of the specified log_dir
, for example, if you logged another run using:
callback_tensorboard(log_dir = "logs/run_b")
Then called tensorboard as follows:
+tensorboard("logs")
The TensorBoard visualization would look like this:
+You can also pass multiple log directories. For example:
+tensorboard(c("logs/run_a", "logs/run_b"))
In the above examples TensorBoard metrics are logged for loss and accuracy. The TensorBoard callback will log data for any metrics which are specified in the metrics
parameter of the compile()
function. For example, in the following code:
model %>% compile(
+ loss = 'mean_squared_error',
+ optimizer = 'sgd',
+ metrics= c('mae', 'acc')
+)
TensorBoard data series will be created for the loss (mean squared error) as well as for the mean absolute error and accuracy metrics.
+The callback_tensorboard()
function includes a number of other options that control logging during training:
callback_tensorboard(log_dir = "logs", histogram_freq = 0,
+ write_graph = TRUE, write_images = FALSE, embeddings_freq = 0,
+ embeddings_layer_names = NULL, embeddings_metadata = NULL)
Name | +Description | +
---|---|
log_dir |
+Path of the directory to save the log files to be parsed by Tensorboard. | +
histogram_freq |
+Frequency (in epochs) at which to compute activation histograms for the layers of the model. If set to 0 (the default), histograms won’t be computed. | +
write_graph |
+Whether to visualize the graph in Tensorboard. The log file can become quite large when write_graph is set to TRUE
+ |
+
write_images |
+Whether to write model weights to visualize as image in Tensorboard. | +
embeddings_freq |
+Frequency (in epochs) at which selected embedding layers will be saved. | +
embeddings_layer_names |
+A list of names of layers to keep an eye on. If NULL or an empty list, all the embedding layers will be watched. |
+
embeddings_metadata |
+A named list which maps layer name to a file name in which metadata for this embedding layer is saved. See the details about the metadata file format. If the same metadata file is used for all embedding layers, a single string can be passed. |
Base R6 class for Keras callbacks
+ + +KerasCallback
+
+ An R6Class generator object
+ +KerasCallback.
+ +The logs
named list that callback methods take as argument will
+contain keys for quantities relevant to the current batch or epoch.
Currently, the fit()
method for sequential models will include the following quantities in the logs
that
+it passes to its callbacks:
on_epoch_end
: logs include acc
and loss
, and optionally include val_loss
(if validation is enabled in fit
), and val_acc
(if validation and accuracy monitoring are enabled).
on_batch_begin
: logs include size
, the number of samples in the current batch.
on_batch_end
: logs include loss
, and optionally acc
(if accuracy monitoring is enabled).
params
Named list with training parameters (e.g. verbosity, batch size, number of epochs...).
model
Reference to the Keras model being trained.
on_epoch_begin(epoch, logs)
Called at the beginning of each epoch.
on_epoch_end(epoch, logs)
Called at the end of each epoch.
on_batch_begin(batch, logs)
Called at the beginning of each batch.
on_batch_end(batch, logs)
Called at the end of each batch.
on_train_begin(logs)
Called at the beginning of training.
on_train_end(logs)
Called at the end of training.
# NOT RUN { +library(keras) + +LossHistory <- R6::R6Class("LossHistory", + inherit = KerasCallback, + + public = list( + + losses = NULL, + + on_batch_end = function(batch, logs = list()) { + self$losses <- c(self$losses, logs[["loss"]]) + } + ) +) +# }+
Base R6 class for Keras layers
+ + +KerasLayer
+
An R6Class generator object
+ +KerasLayer.
+ +build(input_shape)
Creates the +layer weights (must be implemented by all layers that have weights)
call(inputs,mask)
Call the layer on an input tensor.
compute_output_shape(input_shape)
Compute the output shape +for the layer.
add_weight(name,shape,dtype,initializer,regularizer,trainable,constraint)
Adds +a weight variable to the layer.
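As a hedged illustration of how these methods fit together (the layer below is invented for this sketch and is not part of the package), a minimal custom layer that applies a learned kernel can be written as an R6 class and exposed through create_layer():

library(keras)

# illustrative custom layer: a dense transform with no bias and no activation
SimpleDense <- R6::R6Class("SimpleDense",
  inherit = KerasLayer,

  public = list(

    output_dim = NULL,
    kernel = NULL,

    initialize = function(output_dim) {
      self$output_dim <- output_dim
    },

    build = function(input_shape) {
      self$kernel <- self$add_weight(
        name = "kernel",
        shape = list(input_shape[[2]], self$output_dim),
        initializer = initializer_random_normal(),
        trainable = TRUE
      )
    },

    call = function(x, mask = NULL) {
      backend()$dot(x, self$kernel)
    },

    compute_output_shape = function(input_shape) {
      list(input_shape[[1]], self$output_dim)
    }
  )
)

# wrapper so the layer composes with %>% like the built-in layer_ functions
layer_simple_dense <- function(object, output_dim, name = NULL, trainable = TRUE) {
  create_layer(SimpleDense, object, list(
    output_dim = as.integer(output_dim),
    name = name,
    trainable = trainable
  ))
}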
Activation functions can either be used through layer_activation()
, or
+through the activation argument supported by all forward layers.
activation_relu(x, alpha = 0, max_value = NULL) + +activation_elu(x, alpha = 1) + +activation_selu(x) + +activation_hard_sigmoid(x) + +activation_linear(x) + +activation_sigmoid(x) + +activation_softmax(x, axis = -1) + +activation_softplus(x) + +activation_softsign(x) + +activation_tanh(x)+ +
x | +Tensor |
+
---|---|
alpha | +Alpha value |
+
max_value | +Max value |
+
axis | +Integer, axis along which the softmax normalization is applied |
+
activation_selu()
: Self-Normalizing Neural Networks
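For example, the following two usages are equivalent (a minimal sketch):

# activation supplied as an argument to a forward layer
model %>% layer_dense(units = 64, activation = 'relu')

# ... or added as a separate activation layer
model %>%
  layer_dense(units = 64) %>%
  layer_activation('relu')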
Inception V3 model, with weights pre-trained on ImageNet.
+ + +application_inception_v3(include_top = TRUE, weights = "imagenet", + input_tensor = NULL, input_shape = NULL, pooling = NULL, + classes = 1000) + +inception_v3_preprocess_input(x)+ +
include_top | +whether to include the fully-connected layer at the top of +the network. |
+
---|---|
weights | +one of |
+
input_tensor | +optional Keras tensor to use as image input for the +model. |
+
input_shape | +optional shape list, only to be specified if |
+
pooling | +Optional pooling mode for feature extraction when
+
|
+
classes | +optional number of classes to classify images into, only to be
+specified if |
+
x | +Input tensor for preprocessing |
+
A Keras model instance.
+ +Do note that the input image format for this model is different than for +the VGG16 and ResNet models (299x299 instead of 224x224).
+The inception_v3_preprocess_input()
function should be used for image
+preprocessing.
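A hedged usage sketch, mirroring the ResNet50 example elsewhere in this reference ("elephant.jpg" is a placeholder image path; note the 299x299 input size):

library(keras)

# instantiate the model with ImageNet weights
model <- application_inception_v3(weights = 'imagenet')

# load and preprocess an image
img <- image_load("elephant.jpg", target_size = c(299, 299))
x <- image_to_array(img)
dim(x) <- c(1, dim(x))
x <- inception_v3_preprocess_input(x)

# classify the image, then decode and print the top predictions
preds <- model %>% predict(x)
imagenet_decode_predictions(preds, top = 3)[[1]]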
MobileNet model architecture.
+ + +application_mobilenet(input_shape = NULL, alpha = 1, depth_multiplier = 1, + dropout = 0.001, include_top = TRUE, weights = "imagenet", + input_tensor = NULL, pooling = NULL, classes = 1000) + +mobilenet_preprocess_input(x) + +mobilenet_decode_predictions(preds, top = 5) + +mobilenet_load_model_hdf5(filepath)+ +
input_shape | +optional shape list, only to be specified if |
+
---|---|
alpha | +controls the width of the network.
|
+
depth_multiplier | +depth multiplier for depthwise convolution (also +called the resolution multiplier) |
+
dropout | +dropout rate |
+
include_top | +whether to include the fully-connected layer at the top of +the network. |
+
weights | +
|
+
input_tensor | +optional Keras tensor (i.e. output of |
+
pooling | +Optional pooling mode for feature extraction when
+ |
+
classes | +optional number of classes to classify images into, only to be
+specified if |
+
x | +input tensor, 4D |
+
preds | +Tensor encoding a batch of predictions. |
+
top | +integer, how many top-guesses to return. |
+
filepath | +File path |
+
application_mobilenet()
and mobilenet_load_model_hdf5()
return a
+Keras model instance. mobilenet_preprocess_input()
returns image input
+suitable for feeding into a mobilenet model. mobilenet_decode_predictions()
+returns a list of data frames with variables class_name
, class_description
,
+and score
(one data frame per sample in batch input).
The mobilenet_preprocess_input()
function should be used for image
+preprocessing. To load a saved instance of a MobileNet model use
+the mobilenet_load_model_hdf5()
function. To prepare image input
+for MobileNet use mobilenet_preprocess_input()
. To decode
+predictions use mobilenet_decode_predictions()
.
MobileNet is currently only supported with the TensorFlow backend.
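A hedged sketch of the workflow described above (the image path is a placeholder):

library(keras)

# instantiate MobileNet with ImageNet weights
model <- application_mobilenet(weights = 'imagenet')

# load and preprocess an image (MobileNet defaults to 224x224 inputs)
img <- image_load("elephant.jpg", target_size = c(224, 224))
x <- image_to_array(img)
dim(x) <- c(1, dim(x))
x <- mobilenet_preprocess_input(x)

# predict, then decode with the MobileNet-specific helper
preds <- model %>% predict(x)
mobilenet_decode_predictions(preds, top = 5)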
+ +ResNet50 model for Keras.
+ + +application_resnet50(include_top = TRUE, weights = "imagenet", + input_tensor = NULL, input_shape = NULL, pooling = NULL, + classes = 1000)+ +
include_top | +whether to include the fully-connected layer at the top of +the network. |
+
---|---|
weights | +one of |
+
input_tensor | +optional Keras tensor to use as image input for the +model. |
+
input_shape | +optional shape list, only to be specified if |
+
pooling | +Optional pooling mode for feature extraction when
+
|
+
classes | +optional number of classes to classify images into, only to be
+specified if |
+
A Keras model instance.
+ +Optionally loads weights pre-trained on ImageNet.
+The imagenet_preprocess_input()
function should be used for image
+preprocessing.
- Deep Residual Learning for ImageRecognition
+ + +# NOT RUN { +library(keras) + +# instantiate the model +model <- application_resnet50(weights = 'imagenet') + +# load the image +img_path <- "elephant.jpg" +img <- image_load(img_path, target_size = c(224,224)) +x <- image_to_array(img) + +# ensure we have a 4d tensor with single element in the batch dimension, +# the preprocess the input for prediction using resnet50 +dim(x) <- c(1, dim(x)) +x <- imagenet_preprocess_input(x) + +# make predictions then decode and print them +preds <- model %>% predict(x) +imagenet_decode_predictions(preds, top = 3)[[1]] +# }+
VGG16 and VGG19 models for Keras.
+ + +application_vgg16(include_top = TRUE, weights = "imagenet", + input_tensor = NULL, input_shape = NULL, pooling = NULL, + classes = 1000) + +application_vgg19(include_top = TRUE, weights = "imagenet", + input_tensor = NULL, input_shape = NULL, pooling = NULL, + classes = 1000)+ +
include_top | +whether to include the 3 fully-connected layers at the top +of the network. |
+
---|---|
weights | +one of |
+
input_tensor | +optional Keras tensor to use as image input for the +model. |
+
input_shape | +optional shape list, only to be specified if |
+
pooling | +Optional pooling mode for feature extraction when
+
|
+
classes | +optional number of classes to classify images into, only to be
+specified if |
+
Keras model instance.
+ +Optionally loads weights pre-trained on ImageNet.
+The imagenet_preprocess_input()
function should be used for image preprocessing.
- Very Deep Convolutional Networks for Large-Scale ImageRecognition
+ + +# NOT RUN { +library(keras) + +model <- application_vgg16(weights = 'imagenet', include_top = FALSE) + +img_path <- "elephant.jpg" +img <- image_load(img_path, target_size = c(224,224)) +x <- image_to_array(img) +dim(x) <- c(1, dim(x)) +x <- imagenet_preprocess_input(x) + +features <- model %>% predict(x) +# }+
Xception V1 model for Keras.
+ + +application_xception(include_top = TRUE, weights = "imagenet", + input_tensor = NULL, input_shape = NULL, pooling = NULL, + classes = 1000) + +xception_preprocess_input(x)+ +
include_top | +whether to include the fully-connected layer at the top of +the network. |
+
---|---|
weights | +one of |
+
input_tensor | +optional Keras tensor to use as image input for the +model. |
+
input_shape | +optional shape list, only to be specified if |
+
pooling | +Optional pooling mode for feature extraction when
+
|
+
classes | +optional number of classes to classify images into, only to be
+specified if |
+
x | +Input tensor for preprocessing |
+
A Keras model instance.
+ +On ImageNet, this model gets to a top-1 validation accuracy of 0.790 +and a top-5 validation accuracy of 0.945.
+Do note that the input image format for this model is different than for +the VGG16 and ResNet models (299x299 instead of 224x224).
+The xception_preprocess_input()
function should be used for image
+preprocessing.
This application is only available when using the TensorFlow back-end.
+ +Obtain a reference to the keras.backend
Python module used to implement
+tensor operations.
backend(convert = TRUE)+ +
convert | +
|
+
---|
Reference to Keras backend python module.
+ +See the documentation here https://keras.io/backend/ for +additional details on the available functions.
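For example (a small sketch), backend functions can be called directly once you have a reference to the module:

library(keras)

K <- backend()

K$epsilon()                          # fuzz factor used by the backend
x <- K$variable(matrix(runif(4), 2)) # create a backend variable from an R matrix
K$eval(K$sum(x))                     # evaluate a tensor back into an R value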
+ + +Bidirectional wrapper for RNNs.
+ + +bidirectional(object, layer, merge_mode = "concat", input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
layer | +Recurrent instance. |
+
merge_mode | +Mode by which outputs of the forward and backward RNNs will +be combined. One of 'sum', 'mul', 'concat', 'ave', NULL. If NULL, the +outputs will not be combined, they will be returned as a list. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Other layer wrappers: time_distributed
Supports all values that can be represented as a string
+ + +callback_csv_logger(filename, separator = ",", append = FALSE)+ +
filename | +filename of the csv file, e.g. 'run/log.csv'. |
+
---|---|
separator | +string used to separate elements in the csv file. |
+
append | +
|
+
Other callbacks: callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
Stop training when a monitored quantity has stopped improving.
+ + +callback_early_stopping(monitor = "val_loss", min_delta = 0, patience = 0, + verbose = 0, mode = c("auto", "min", "max"))+ +
monitor | +quantity to be monitored. |
+
---|---|
min_delta | +minimum change in the monitored quantity to qualify as an +improvement, i.e. an absolute change of less than min_delta, will count as +no improvement. |
+
patience | +number of epochs with no improvement after which training +will be stopped. |
+
verbose | +verbosity mode, 0 or 1. |
+
mode | +one of "auto", "min", "max". In |
+
Other callbacks: callback_csv_logger
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
This callback is constructed with anonymous functions that will be called at the appropriate time. Note that the callbacks expect positional arguments, as follows:
on_epoch_begin
and on_epoch_end
expect two positional arguments: epoch
, logs
on_batch_begin
and on_batch_end
expect two positional arguments: batch
, logs
on_train_begin
and on_train_end
expect one positional argument: logs
callback_lambda(on_epoch_begin = NULL, on_epoch_end = NULL, + on_batch_begin = NULL, on_batch_end = NULL, on_train_begin = NULL, + on_train_end = NULL)+ +
on_epoch_begin | +called at the beginning of every epoch. |
+
---|---|
on_epoch_end | +called at the end of every epoch. |
+
on_batch_begin | +called at the beginning of every batch. |
+
on_batch_end | +called at the end of every batch. |
+
on_train_begin | +called at the beginning of model training. |
+
on_train_end | +called at the end of model training. |
+
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
Learning rate scheduler.
+ + +callback_learning_rate_scheduler(schedule)+ +
schedule | +a function that takes an epoch index as input (integer, +indexed from 0) and returns a new learning rate as output (float). |
+
---|
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
filepath
can contain named formatting options, which will be filled with the
+value of epoch
and keys in logs
(passed in on_epoch_end
). For example:
+if filepath
is weights.{epoch:02d}-{val_loss:.2f}.hdf5
, then the model
+checkpoints will be saved with the epoch number and the validation loss in
+the filename.
callback_model_checkpoint(filepath, monitor = "val_loss", verbose = 0, + save_best_only = FALSE, save_weights_only = FALSE, mode = c("auto", + "min", "max"), period = 1)+ +
filepath | +string, path to save the model file. |
+
---|---|
monitor | +quantity to monitor. |
+
verbose | +verbosity mode, 0 or 1. |
+
save_best_only | +if |
+
save_weights_only | +if |
+
mode | +one of "auto", "min", "max". If |
+
period | +Interval (number of epochs) between checkpoints. |
+
if filepath
is
+weights.{epoch:02d}-{val_loss:.2f}.hdf5
, then the model checkpoints will
+be saved with the epoch number and the validation loss in the filename.
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
Callback that prints metrics to stdout.
+ + +callback_progbar_logger(count_mode = "samples")+ +
count_mode | One of "steps" or "samples". Whether the progress bar should count samples seen or steps (batches) seen. |
+
---|
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
Models often benefit from reducing the learning rate by a factor of 2-10 once +learning stagnates. This callback monitors a quantity and if no improvement +is seen for a 'patience' number of epochs, the learning rate is reduced.
+ + +callback_reduce_lr_on_plateau(monitor = "val_loss", factor = 0.1, + patience = 10, verbose = 0, mode = c("auto", "min", "max"), + epsilon = 1e-04, cooldown = 0, min_lr = 0)+ +
monitor | +quantity to be monitored. |
+
---|---|
factor | +factor by which the learning rate will be reduced. new_lr = lr
|
+
patience | +number of epochs with no improvement after which learning +rate will be reduced. |
+
verbose | +int. 0: quiet, 1: update messages. |
+
mode | +one of "auto", "min", "max". In min mode, lr will be reduced when +the quantity monitored has stopped decreasing; in max mode it will be +reduced when the quantity monitored has stopped increasing; in auto mode, +the direction is automatically inferred from the name of the monitored +quantity. |
+
epsilon | +threshold for measuring the new optimum, to only focus on +significant changes. |
+
cooldown | +number of epochs to wait before resuming normal operation +after lr has been reduced. |
+
min_lr | +lower bound on the learning rate. |
+
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_remote_monitor
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
Callback used to stream events to a server.
+ + +callback_remote_monitor(root = "http://localhost:9000", + path = "/publish/epoch/end/", field = "data", headers = NULL)+ +
root | +root url of the target server. |
+
---|---|
path | +path relative to root to which the events will be sent. |
+
field | +JSON field under which the data will be stored. |
+
headers | +Optional named list of custom HTTP headers. Defaults to:
+ |
+
Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_tensorboard
,
+ callback_terminate_on_naan
This callback writes a log for TensorBoard, which allows you to visualize +dynamic graphs of your training and test metrics, as well as activation +histograms for the different layers in your model.
+ + +callback_tensorboard(log_dir = NULL, histogram_freq = 0, batch_size = 32, + write_graph = TRUE, write_grads = FALSE, write_images = FALSE, + embeddings_freq = 0, embeddings_layer_names = NULL, + embeddings_metadata = NULL)+ +
log_dir | +The path of the directory where to save the log files to be
+parsed by Tensorboard. The default is |
+
---|---|
histogram_freq | +frequency (in epochs) at which to compute activation +histograms for the layers of the model. If set to 0, histograms won't be +computed. |
+
batch_size | +size of batch of inputs to feed to the network +for histograms computation. |
+
write_graph | +whether to visualize the graph in Tensorboard. The log
+file can become quite large when write_graph is set to |
+
write_grads | +whether to visualize gradient histograms in TensorBoard.
+ |
+
write_images | +whether to write model weights to visualize as image in +Tensorboard. |
+
embeddings_freq | +frequency (in epochs) at which selected embedding +layers will be saved. |
+
embeddings_layer_names | a list of names of layers to keep an eye on. If
+ |
+
embeddings_metadata | a named list which maps layer name to a file name in which metadata for this embedding layer is saved. See the details about the metadata file format. If the same metadata file is used for all embedding layers, a single string can be passed. |
+
TensorBoard is a visualization tool provided with TensorFlow.
+You can find more information about TensorBoard +here.
+ +Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_terminate_on_naan
Callback that terminates training when a NaN loss is encountered.
+ + +callback_terminate_on_naan()
+
+ Other callbacks: callback_csv_logger
,
+ callback_early_stopping
,
+ callback_lambda
,
+ callback_learning_rate_scheduler
,
+ callback_model_checkpoint
,
+ callback_progbar_logger
,
+ callback_reduce_lr_on_plateau
,
+ callback_remote_monitor
,
+ callback_tensorboard
Configure a Keras model for training
+ + +compile(object, optimizer, loss, metrics = NULL, loss_weights = NULL, + sample_weight_mode = NULL, ...)+ +
object | +Model object to compile. |
+
---|---|
optimizer | +Name of optimizer or optimizer object. |
+
loss | +Name of objective function or objective function. If the model +has multiple outputs, you can use a different loss on each output by +passing a dictionary or a list of objectives. The loss value that will be +minimized by the model will then be the sum of all individual losses. |
+
metrics | +List of metrics to be evaluated by the model during training
+and testing. Typically you will use |
+
loss_weights | +Optional list specifying scalar coefficients to weight
+the loss contributions of different model outputs. The loss value that will
+be minimized by the model will then be the weighted sum of all indvidual
+losses, weighted by the |
+
sample_weight_mode | +If you need to do timestep-wise sample weighting
+(2D weights), set this to "temporal". |
+
... | +Additional named arguments passed to |
+
Other model functions: evaluate_generator
,
+ evaluate
, fit_generator
,
+ fit
, get_config
,
+ get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Constrains the weights incident to each hidden unit to have a norm less than +or equal to a desired value.
+ + +constraint_maxnorm(max_value = 2, axis = 0)+ +
max_value | +The maximum norm for the incoming weights. |
+
---|---|
axis | +The axis along which to calculate weight norms. For instance, in
+a dense layer the weight matrix has shape |
+
Dropout: A Simple Way to Prevent Neural Networks from Overfitting Srivastava, Hinton, et al. 2014
+Other constraints: constraint_minmaxnorm
,
+ constraint_nonneg
,
+ constraint_unitnorm
Constrains the weights incident to each hidden unit to have the norm between +a lower bound and an upper bound.
+ + +constraint_minmaxnorm(min_value = 0, max_value = 1, rate = 1, axis = 0)+ +
min_value | +The minimum norm for the incoming weights. |
+
---|---|
max_value | +The maximum norm for the incoming weights. |
+
rate | +The rate for enforcing the constraint: weights will be rescaled to +yield (1 - rate) * norm + rate * norm.clip(low, high). Effectively, this +means that rate=1.0 stands for strict enforcement of the constraint, while +rate<1.0 means that weights will be rescaled at each step to slowly move +towards a value inside the desired interval. |
+
axis | +The axis along which to calculate weight norms. For instance, in
+a dense layer the weight matrix has shape |
+
Other constraints: constraint_maxnorm
,
+ constraint_nonneg
,
+ constraint_unitnorm
Constrains the weights to be non-negative.
+ + +constraint_nonneg()
+
+ Other constraints: constraint_maxnorm
,
+ constraint_minmaxnorm
,
+ constraint_unitnorm
Constrains the weights incident to each hidden unit to have unit norm.
+ + +constraint_unitnorm(axis = 0)+ +
axis | +The axis along which to calculate weight norms. For instance, in
+a dense layer the weight matrix has shape |
+
---|
Other constraints: constraint_maxnorm
,
+ constraint_minmaxnorm
,
+ constraint_nonneg
Count the total number of scalars composing the weights.
+ + +count_params(object)+ +
object | +Layer or model object |
+
---|
An integer count
+ +Other layer methods: get_config
,
+ get_input_at
, get_weights
,
+ reset_states
Create a Keras Layer
+ + +create_layer(layer_class, object, args = list())+ +
layer_class | +Python layer class or R6 class of type KerasLayer |
+
---|---|
object | +Object to compose layer with. This is either a
+ |
+
args | +List of arguments to layer constructor function |
+
A Keras layer
+ +The object
parameter can be missing, in which case the
+layer is created without a connection to an existing graph.
Dataset taken from the StatLib library which is maintained at Carnegie Mellon +University.
+ + +dataset_boston_housing(path = "boston_housing.npz", seed = 113L, + test_split = 0.2)+ +
path | +Path where to cache the dataset locally (relative to +~/.keras/datasets). |
+
---|---|
seed | +Random seed for shuffling the data before computing the test +split. |
+
test_split | +fraction of the data to reserve as test set. |
+
Lists of training and test data: train$x, train$y, test$x, test$y
.
Samples contain 13 attributes of houses at different locations around +the Boston suburbs in the late 1970s. Targets are the median values of the +houses at a location (in k$).
+ +Other datasets: dataset_cifar100
,
+ dataset_cifar10
,
+ dataset_imdb
, dataset_mnist
,
+ dataset_reuters
Dataset of 50,000 32x32 color training images, labeled over 10 categories, +and 10,000 test images.
+ + +dataset_cifar10()
+
+ Lists of training and test data: train$x, train$y, test$x, test$y
.
The x
data is an array of RGB image data with shape (num_samples, 3, 32,
+32).
The y
data is an array of category labels (integers in range 0-9) with
+shape (num_samples).
Other datasets: dataset_boston_housing
,
+ dataset_cifar100
,
+ dataset_imdb
, dataset_mnist
,
+ dataset_reuters
Dataset of 50,000 32x32 color training images, labeled over 100 categories, +and 10,000 test images.
+ + +dataset_cifar100(label_mode = c("fine", "coarse"))+ +
label_mode | +one of "fine", "coarse". |
+
---|
Lists of training and test data: train$x, train$y, test$x, test$y
.
The x
data is an array of RGB image data with shape (num_samples, 3, 32, 32).
The y
data is an array of category labels with shape (num_samples).
Other datasets: dataset_boston_housing
,
+ dataset_cifar10
,
+ dataset_imdb
, dataset_mnist
,
+ dataset_reuters
Dataset of 25,000 movies reviews from IMDB, labeled by sentiment +(positive/negative). Reviews have been preprocessed, and each review is +encoded as a sequence of word indexes (integers). For convenience, words are +indexed by overall frequency in the dataset, so that for instance the integer +"3" encodes the 3rd most frequent word in the data. This allows for quick +filtering operations such as: "only consider the top 10,000 most common +words, but eliminate the top 20 most common words".
+ + +dataset_imdb(path = "imdb.npz", num_words = NULL, skip_top = 0L, + maxlen = NULL, seed = 113L, start_char = 1L, oov_char = 2L, + index_from = 3L)+ +
path | +Where to cache the data (relative to |
+
---|---|
num_words | +Max number of words to include. Words are ranked by how +often they occur (in the training set) and only the most frequent words are +kept |
+
skip_top | Skip the top N most frequently occurring words (which may not be informative). |
+
maxlen | +Truncate sequences after this length. |
+
seed | +random seed for sample shuffling. |
+
start_char | +The start of a sequence will be marked with this character. +Set to 1 because 0 is usually the padding character. |
+
oov_char | +Words that were cut out because of the |
+
index_from | +Index actual words with this index and higher. |
+
Lists of training and test data: train$x, train$y, test$x, test$y
.
The x
data includes integer sequences. If the num_words argument was specified, the maximum possible index value is
num_words-1. If the
maxlen
+argument was specified, the largest possible sequence length is maxlen
.
The y
data includes a set of integer labels (0 or 1).
As a convention, "0" does not stand for a specific word, but instead is used +to encode any unknown word.
+ +Other datasets: dataset_boston_housing
,
+ dataset_cifar100
,
+ dataset_cifar10
,
+ dataset_mnist
,
+ dataset_reuters
Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
+ + +dataset_mnist(path = "mnist.npz")+ +
path | +Path where to cache the dataset locally (relative to ~/.keras/datasets). |
+
---|
Lists of training and test data: train$x, train$y, test$x, test$y
, where
+x
is an array of grayscale image data with shape (num_samples, 28, 28) and y
+is an array of digit labels (integers in range 0-9) with shape (num_samples).
Other datasets: dataset_boston_housing
,
+ dataset_cifar100
,
+ dataset_cifar10
,
+ dataset_imdb
, dataset_reuters
Dataset of 11,228 newswires from Reuters, labeled over 46 topics. As with
+dataset_imdb()
, each wire is encoded as a sequence of word indexes (same
+conventions).
dataset_reuters(path = "reuters.npz", num_words = NULL, skip_top = 0L, + maxlen = NULL, test_split = 0.2, seed = 113L, start_char = 1L, + oov_char = 2L, index_from = 3L) + +dataset_reuters_word_index(path = "reuters_word_index.pkl")+ +
path | +Where to cache the data (relative to |
+
---|---|
num_words | +Max number of words to include. Words are ranked by how +often they occur (in the training set) and only the most frequent words are +kept |
+
skip_top | Skip the top N most frequently occurring words (which may not be informative). |
+
maxlen | +Truncate sequences after this length. |
+
test_split | +Fraction of the dataset to be used as test data. |
+
seed | +Random seed for sample shuffling. |
+
start_char | +The start of a sequence will be marked with this character. +Set to 1 because 0 is usually the padding character. |
+
oov_char | +words that were cut out because of the |
+
index_from | +index actual words with this index and higher. |
+
Lists of training and test data: train$x, train$y, test$x, test$y
+with same format as dataset_imdb()
. The dataset_reuters_word_index()
+function returns a list where the names are words and the values are
+integer. e.g. word_index[["giraffe"]]
might return 1234
.
[["giraffe"]: R:[
+ +Other datasets: dataset_boston_housing
,
+ dataset_cifar100
,
+ dataset_cifar10
,
+ dataset_imdb
, dataset_mnist
Evaluate a Keras model
+ + +evaluate(object, x, y, batch_size = 32, verbose = 1, sample_weight = NULL)+ +
object | +Model object to evaluate |
+
---|---|
x | +Vector, matrix, or array of training data (or list if the model has +multiple inputs). If all inputs in the model are named, you can also pass a +list mapping input names to data. |
+
y | +Vector, matrix, or array of target data (or list if the model has +multiple outputs). If all outputs in the model are named, you can also pass +a list mapping output names to data. |
+
batch_size | +Number of samples per gradient update. |
+
verbose | +Verbosity mode (0 = silent, 1 = verbose, 2 = one log line per +epoch). |
+
sample_weight | +Optional array of the same length as x, containing
+weights to apply to the model's loss for each sample. In the case of
+temporal data, you can pass a 2D array with shape (samples,
+sequence_length), to apply a different weight to every timestep of every
+sample. In this case you should make sure to specify
+sample_weight_mode="temporal" in |
+
Named list of model test loss (or losses for models with multiple outputs) +and model metrics.
+ +Other model functions: compile
,
+ evaluate_generator
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
The generator should return the same kind of data as accepted by
+test_on_batch()
.
evaluate_generator(object, generator, steps, max_queue_size = 10)+ +
object | +Model object to evaluate |
+
---|---|
generator | +Generator yielding lists (inputs, targets) or (inputs, +targets, sample_weights) |
+
steps | +Total number of steps (batches of samples) to yield from
+ |
+
max_queue_size | +maximum size for the generator queue |
+
Named list of model test loss (or losses for models with multiple outputs) +and model metrics.
+ +Other model functions: compile
,
+ evaluate
, fit_generator
,
+ fit
, get_config
,
+ get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Trains the model for a fixed number of epochs (iterations on a dataset).
+ + +fit(object, x, y, batch_size = 32, epochs = 10, verbose = 1, + callbacks = NULL, view_metrics = getOption("keras.view_metrics", default = + "auto"), validation_split = 0, validation_data = NULL, shuffle = TRUE, + class_weight = NULL, sample_weight = NULL, initial_epoch = 0, ...)+ +
object | +Model to train. |
+
---|---|
x | +Vector, matrix, or array of training data (or list if the model has +multiple inputs). If all inputs in the model are named, you can also pass a +list mapping input names to data. |
+
y | +Vector, matrix, or array of target data (or list if the model has +multiple outputs). If all outputs in the model are named, you can also pass +a list mapping output names to data. |
+
batch_size | +Number of samples per gradient update. |
+
epochs | +Number of times to iterate over the training data arrays. |
+
verbose | +Verbosity mode (0 = silent, 1 = verbose, 2 = one log line per +epoch). |
+
callbacks | +List of callbacks to be called during training. |
+
view_metrics | +View realtime plot of training metrics (by epoch). The
+default ( |
+
validation_split | +Float between 0 and 1: fraction of the training data +to be used as validation data. The model will set apart this fraction of +the training data, will not train on it, and will evaluate the loss and any +model metrics on this data at the end of each epoch. |
+
validation_data | +Data on which to evaluate the loss and any model +metrics at the end of each epoch. The model will not be trained on this +data. This could be a list (x_val, y_val) or a list (x_val, y_val, +val_sample_weights). |
+
shuffle | +
|
+
class_weight | +Optional named list mapping indices (integers) to a +weight (float) to apply to the model's loss for the samples from this class +during training. This can be useful to tell the model to "pay more +attention" to samples from an under-represented class. |
+
sample_weight | +Optional array of the same length as x, containing
+weights to apply to the model's loss for each sample. In the case of
+temporal data, you can pass a 2D array with shape (samples,
+sequence_length), to apply a different weight to every timestep of every
+sample. In this case you should make sure to specify
+sample_weight_mode="temporal" in |
+
initial_epoch | +epoch at which to start training (useful for resuming a +previous training run). |
+
... | +Unused |
+
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, get_config
,
+ get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
The generator is run in parallel to the model, for efficiency. For instance, +this allows you to do real-time data augmentation on images on CPU in +parallel to training your model on GPU.
+ + +fit_generator(object, generator, steps_per_epoch, epochs = 1, verbose = 1, + callbacks = NULL, view_metrics = getOption("keras.view_metrics", default = + "auto"), validation_data = NULL, validation_steps = NULL, + class_weight = NULL, max_queue_size = 10, initial_epoch = 0)+ +
object | +Keras model object |
+
---|---|
generator | +A generator (e.g. like the one provided by
+ The output of the generator must be a list of one of these forms: - (inputs, targets) + - (inputs, targets, sample_weights) ++ + Note that the generator should call the All arrays should contain the same number of samples. The generator is expected
+to loop over its data indefinitely. An epoch finishes when |
+
steps_per_epoch | +Total number of steps (batches of samples) to yield
+from |
+
epochs | +integer, total number of iterations on the data. |
+
verbose | +Verbosity mode (0 = silent, 1 = verbose, 2 = one log line per +epoch). |
+
callbacks | +list of callbacks to be called during training. |
+
view_metrics | +View realtime plot of training metrics (by epoch). The
+default ( |
+
validation_data | +this can be either:
|
+
validation_steps | +Only relevant if |
+
class_weight | +dictionary mapping class indices to a weight for the +class. |
+
max_queue_size | +maximum size for the generator queue |
+
initial_epoch | +epoch at which to start training (useful for resuming a +previous training run) |
+
Training history object (invisibly)
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit
, get_config
,
+ get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
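A hedged sketch of a typical pattern (the directory path, image size, and step counts are placeholders), using the image preprocessing generators documented elsewhere in this reference:

# generator that reads (and rescales) images from a directory of class subfolders
train_generator <- flow_images_from_directory(
  "data/train",
  generator = image_data_generator(rescale = 1/255),
  target_size = c(150, 150),
  batch_size = 32,
  class_mode = "binary"
)

model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,   # batches drawn from the generator per epoch
  epochs = 5
)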
Required for featurewise_center
, featurewise_std_normalization
+and zca_whitening
.
fit_image_data_generator(object, x, augment = FALSE, rounds = 1, + seed = NULL, ...)+ +
object | ++ |
---|---|
x | +array, the data to fit on (should have rank 4). In case of grayscale data, +the channels axis should have value 1, and in case of RGB data, it should have value 3. |
+
augment | +Whether to fit on randomly augmented samples |
+
rounds | +If |
+
seed | +random seed. |
+
... | +Unused |
+
Other image preprocessing: flow_images_from_data
,
+ flow_images_from_directory
,
+ image_load
, image_to_array
Update tokenizer internal vocabulary based on a list of texts or list of +sequences.
+ + +fit_text_tokenizer(object, x, ...)+ +
object | +Tokenizer returned by |
+
---|---|
x | +Vector/list of strings, or a generator of strings (for +memory-efficiency); Alternatively a list of "sequence" (a sequence is a +list of integer word indices). |
+
... | +Unused |
+
Required before using texts_to_sequences()
, texts_to_matrix()
, or
+sequences_to_matrix()
.
Other text tokenization: sequences_to_matrix
,
+ text_tokenizer
,
+ texts_to_matrix
,
+ texts_to_sequences_generator
,
+ texts_to_sequences
Generates batches of augmented/normalized data from image data and labels
+ + +flow_images_from_data(x, y = NULL, generator = image_data_generator(), + batch_size = 32, shuffle = TRUE, seed = NULL, save_to_dir = NULL, + save_prefix = "", save_format = "png")+ +
x | +data. Should have rank 4. In case of grayscale data, the channels +axis should have value 1, and in case of RGB data, it should have value 3. |
+
---|---|
y | +labels (can be |
+
generator | +Image data generator to use for augmenting/normalizing image +data. |
+
batch_size | +int (default: |
+
shuffle | +boolean (defaut: |
+
seed | +int (default: |
+
save_to_dir | +
|
+
save_prefix | +str (default: ''). Prefix to use for filenames of saved
+pictures (only relevant if |
+
save_format | +one of "png", "jpeg" (only relevant if save_to_dir is +set). Default: "png". |
+
Yields batches indefinitely, in an infinite loop.
+ +(x, y)
where x
is an array of image data and y
is a
+array of corresponding labels. The generator loops indefinitely.
Other image preprocessing: fit_image_data_generator
,
+ flow_images_from_directory
,
+ image_load
, image_to_array
Generates batches of data from images in a directory (with optional +augmented/normalized data)
+ + +flow_images_from_directory(directory, generator = image_data_generator(), + target_size = c(256, 256), color_mode = "rgb", classes = NULL, + class_mode = "categorical", batch_size = 32, shuffle = TRUE, + seed = NULL, save_to_dir = NULL, save_prefix = "", + save_format = "png", follow_links = FALSE)+ +
directory | +path to the target directory. It should contain one +subdirectory per class. Any PNG, JPG or BMP images inside each of the +subdirectories directory tree will be included in the generator. See thisscript +for more details. |
+
---|---|
generator | +Image data generator (default generator does no data +augmentation/normalization transformations) |
+
target_size | integer vector, default: |
+
color_mode | one of "grayscale", "rgb". Default: "rgb". Whether the images will be converted to have 1 or 3 color channels. |
+
classes | +optional list of class subdirectories (e.g. |
+
class_mode | +one of "categorical", "binary", "sparse" or |
+
batch_size | +int (default: |
+
shuffle | +boolean (defaut: |
+
seed | +int (default: |
+
save_to_dir | +
|
+
save_prefix | +str (default: ''). Prefix to use for filenames of saved
+pictures (only relevant if |
+
save_format | +one of "png", "jpeg" (only relevant if save_to_dir is +set). Default: "png". |
+
follow_links | +whether to follow symlinks inside class subdirectories
+(default: |
+
Yields batches indefinitely, in an infinite loop.
+ +(x, y)
where x
is an array of image data and y
is a
+array of corresponding labels. The generator loops indefinitely.
Other image preprocessing: fit_image_data_generator
,
+ flow_images_from_data
,
+ image_load
, image_to_array
A layer config is an object returned from get_config()
that contains the
+configuration of a layer or model. The same layer or model can be
+reinstantiated later (without its trained weights) from this configuration
+using from_config()
. The config does not include connectivity information,
+nor the class name (those are handled externally).
get_config(object) + +from_config(config)+ +
object | +Layer or model object |
+
---|---|
config | +Object with layer or model configuration |
+
get_config()
returns an object with the configuration,
+from_config()
returns a re-instantiation of the object.
Objects returned from get_config()
are not serializable. Therefore,
+if you want to save and restore a model across sessions, you can use the
+model_to_json()
or model_to_yaml()
functions (for model configuration
+only, not weights) or the save_model_hdf5()
function to save the model
+configuration and weights to a file.
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Other layer methods: count_params
,
+ get_input_at
, get_weights
,
+ reset_states
Passing the MD5 hash will verify the file after download as well as if it is +already present in the cache.
+ + +get_file(fname, origin, file_hash = NULL, cache_subdir = "datasets", + hash_algorithm = "auto", extract = FALSE, archive_format = "auto", + cache_dir = NULL)+ +
fname | +Name of the file. If an absolute path |
+
---|---|
origin | +Original URL of the file. |
+
file_hash | +The expected hash string of the file after download. The +sha256 and md5 hash algorithms are both supported. |
+
cache_subdir | +Subdirectory under the Keras cache dir where the file is
+saved. If an absolute path |
+
hash_algorithm | +Select the hash algorithm to verify the file. options +are 'md5', 'sha256', and 'auto'. The default 'auto' detects the hash +algorithm in use. |
+
extract | If TRUE, tries to extract the file as an archive (e.g. tar or zip). |
+
archive_format | Archive format to try for extracting the file. Options are 'auto', 'tar', 'zip', and NULL. 'tar' includes tar, tar.gz, and tar.bz files. The default 'auto' is c('tar', 'zip'). NULL or an empty list will return no matches found. |
+
cache_dir | +Location to store cached files, when |
+
Path to the downloaded file
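For instance (a sketch; the URL is illustrative only), a remote file can be downloaded once into the Keras cache and reused across sessions:

path <- get_file(
  fname = "iris.csv",
  origin = "https://example.com/iris.csv",   # illustrative URL
  cache_subdir = "datasets"
)
data <- read.csv(path)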
Whenever you call a layer on some input, you create a new tensor (the output of the layer) and you add a "node" to the layer, linking the input tensor to the output tensor. When you call the same layer multiple times, that layer owns multiple nodes indexed as 1, 2, 3. These functions enable you to retrieve various tensor properties of layers with multiple nodes.
+ + +get_input_at(object, node_index) + +get_output_at(object, node_index) + +get_input_shape_at(object, node_index) + +get_output_shape_at(object, node_index) + +get_input_mask_at(object, node_index) + +get_output_mask_at(object, node_index)+ +
object | +Layer or model object |
+
---|---|
node_index | +Integer, index of the node from which to retrieve the
+attribute. E.g. |
+
A tensor (or list of tensors if the layer has multiple inputs/outputs).
+ +Other layer methods: count_params
,
+ get_config
, get_weights
,
+ reset_states
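A minimal sketch of a shared layer with two nodes (shapes and sizes are arbitrary):

input_a <- layer_input(shape = c(32))
input_b <- layer_input(shape = c(32))
shared  <- layer_dense(units = 8)

output_a <- input_a %>% shared
output_b <- input_b %>% shared

# layer$output is ambiguous for a shared layer; ask for a specific node instead
out1 <- get_output_at(shared, 1)
out2 <- get_output_at(shared, 2)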
Indices are based on order of horizontal graph traversal (bottom-up) and +are 0-based.
+ + +get_layer(object, name = NULL, index = NULL)+ +
object | +Keras model object |
+
---|---|
name | +String, name of layer. |
+
index | +Integer, index of layer (0-based) |
+
A layer instance.
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
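For example (a sketch; the model object and the layer name "dense_1" are hypothetical):

layer_by_name  <- get_layer(model, name = "dense_1")
layer_by_index <- get_layer(model, index = 0)   # 0-based, per the argument above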
Layer/Model weights as R arrays
+ + +get_weights(object) + +set_weights(object, weights)+ +
object | +Layer or model object |
+
---|---|
weights | +Weights as R array |
+
Other model persistence: model_to_json
,
+ model_to_yaml
,
+ save_model_hdf5
,
+ save_model_weights_hdf5
,
+ serialize_model
Other layer methods: count_params
,
+ get_config
, get_input_at
,
+ reset_states
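A short sketch (model is assumed to be an existing Keras model): weights can be copied between two objects that share the same architecture.

w <- get_weights(model)                      # list of R arrays (kernels, biases, ...)
model2 <- from_config(get_config(model))     # same architecture, new random weights
set_weights(model2, w)                       # model2 now carries the same weights as model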
Representation of HDF5 dataset to be used instead of an R array
+ + +hdf5_matrix(datapath, dataset, start = 0, end = NULL, normalizer = NULL)+ +
datapath | +string, path to a HDF5 file |
+
---|---|
dataset | +string, name of the HDF5 dataset in the file specified in datapath |
+
start | +int, start of desired slice of the specified dataset |
+
end | +int, end of desired slice of the specified dataset |
+
normalizer | +function to be called on data when retrieved |
+
An array-like HDF5 dataset.
+ +Providing start
and end
allows use of a slice of the dataset.
Optionally, a normalizer function (or lambda) can be given. This will +be called on every slice of data retrieved.
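A hedged sketch (the file "data.h5" and the dataset name "features" are hypothetical):

x_train <- hdf5_matrix("data.h5", "features", start = 0, end = 10000)
# the resulting object can be passed to fit() in place of an in-memory R array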
+ + +Generate minibatches of image data with real-time data augmentation.
+ + +image_data_generator(featurewise_center = FALSE, samplewise_center = FALSE, + featurewise_std_normalization = FALSE, + samplewise_std_normalization = FALSE, zca_whitening = FALSE, + zca_epsilon = 1e-06, rotation_range = 0, width_shift_range = 0, + height_shift_range = 0, shear_range = 0, zoom_range = 0, + channel_shift_range = 0, fill_mode = "nearest", cval = 0, + horizontal_flip = FALSE, vertical_flip = FALSE, rescale = NULL, + preprocessing_function = NULL, data_format = NULL)+ +
featurewise_center | +set input mean to 0 over the dataset. |
+
---|---|
samplewise_center | +set each sample mean to 0. |
+
featurewise_std_normalization | +divide inputs by std of the dataset. |
+
samplewise_std_normalization | +divide each input by its std. |
+
zca_whitening | +apply ZCA whitening. |
+
zca_epsilon | +Epsilon for ZCA whitening. Default is 1e-6. |
+
rotation_range | +degrees (0 to 180). |
+
width_shift_range | +fraction of total width. |
+
height_shift_range | +fraction of total height. |
+
shear_range | +shear intensity (shear angle in radians). |
+
zoom_range | amount of zoom. If a scalar z, zoom will be randomly picked
+in the range |
+
channel_shift_range | +shift range for each channels. |
+
fill_mode | +points outside the boundaries are filled according to the +given mode ('constant', 'nearest', 'reflect' or 'wrap'). Default is +'nearest'. |
+
cval | +value used for points outside the boundaries when fill_mode is +'constant'. Default is 0. |
+
horizontal_flip | +whether to randomly flip images horizontally. |
+
vertical_flip | +whether to randomly flip images vertically. |
+
rescale | +rescaling factor. If NULL or 0, no rescaling is applied, +otherwise we multiply the data by the value provided (before applying any +other transformation). |
+
preprocessing_function | function that will be applied to each input. The function will run before any other modification on it. The function should take one argument: one image (tensor with rank 3), and should output a tensor with the same shape. |
+
data_format | +'channels_first' or 'channels_last'. In 'channels_first'
+mode, the channels dimension (the depth) is at index 1, in 'channels_last'
+mode it is at index 3. It defaults to the |
+
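A brief sketch of a generator with light augmentation plus featurewise normalization (x_train is a hypothetical array of training images):

datagen <- image_data_generator(
  featurewise_center = TRUE,
  featurewise_std_normalization = TRUE,
  rotation_range = 20,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  horizontal_flip = TRUE
)

# featurewise statistics must be computed from data before the generator is used
datagen %>% fit_image_data_generator(x_train)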
Loads an image into PIL format.
+ + +image_load(path, grayscale = FALSE, target_size = NULL)+ +
path | +Path to image file |
+
---|---|
grayscale | +Boolean, whether to load the image as grayscale. |
+
target_size | +Either |
+
A PIL Image instance.
+ +Other image preprocessing: fit_image_data_generator
,
+ flow_images_from_data
,
+ flow_images_from_directory
,
+ image_to_array
Converts a PIL Image instance to a 3d-array.
+ + +image_to_array(img, data_format = c("channels_last", "channels_first"))+ +
img | +PIL Image instance. |
+
---|---|
data_format | +Image data format ("channels_last" or "channels_first") |
+
A 3D array.
+ +Other image preprocessing: fit_image_data_generator
,
+ flow_images_from_data
,
+ flow_images_from_directory
,
+ image_load
Decodes the prediction of an ImageNet model.
+ + +imagenet_decode_predictions(preds, top = 5)+ +
preds | +Tensor encoding a batch of predictions. |
+
---|---|
top | +integer, how many top-guesses to return. |
+
List of data frames with variables class_name
, class_description
,
+and score
(one data frame per sample in batch input).
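A sketch of a typical prediction pipeline (the image path is illustrative; it assumes one of the pre-trained application models shipped with the package, e.g. application_resnet50()):

img <- image_load("elephant.jpg", target_size = c(224, 224))
x <- image_to_array(img)
x <- array(x, dim = c(1, dim(x)))       # add a batch dimension
x <- imagenet_preprocess_input(x)

model <- application_resnet50(weights = "imagenet")
preds <- model %>% predict(x)
imagenet_decode_predictions(preds, top = 3)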
Obtain a reference to the Python module used for the implementation of Keras.
+ + +implementation()
+
+ Reference to the Python module used for the implementation of Keras.
+ +There are currently two Python modules which implement Keras:
keras ("keras")
tensorflow.contrib.keras ("tensorflow")
This function returns a reference to the implementation being currently
+used by the keras package. The default implementation is "keras".
+You can override this by setting the KERAS_IMPLEMENTATION
environment
+variable to "tensorflow".
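For example (a sketch), the environment variable must be set before the package is loaded:

Sys.setenv(KERAS_IMPLEMENTATION = "tensorflow")
library(keras)
implementation()   # reference to the Python module now in use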
Initializer that generates tensors initialized to a constant value.
+ + +initializer_constant(value = 0)+ +
value | +float; the value of the generator tensors. |
+
---|
Other initializers: initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
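Initializers are passed to layers through their *_initializer arguments; a minimal sketch (layer sizes are arbitrary):

model <- keras_model_sequential()
model %>% layer_dense(
  units = 64,
  input_shape = c(20),
  kernel_initializer = initializer_constant(0.5),
  bias_initializer = initializer_zeros()
)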
It draws samples from a truncated normal distribution centered on 0
+with stddev = sqrt(2 / (fan_in + fan_out))
+where fan_in
is the number of input units in the weight tensor
+and fan_out
is the number of output units in the weight tensor.
initializer_glorot_normal(seed = NULL)+ +
seed | +Integer used to seed the random generator. |
+
---|
Glorot & Bengio, AISTATS 2010 http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
+ +Other initializers: initializer_constant
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
It draws samples from a uniform distribution within -limit, limit
+where limit
is sqrt(6 / (fan_in + fan_out))
+where fan_in
is the number of input units in the weight tensor
+and fan_out
is the number of output units in the weight tensor.
initializer_glorot_uniform(seed = NULL)+ +
seed | +Integer used to seed the random generator. |
+
---|
Glorot & Bengio, AISTATS 2010 http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
+ +Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
It draws samples from a truncated normal distribution centered on 0 with
+stddev = sqrt(2 / fan_in)
where fan_in
is the number of input units in
+the weight tensor.
initializer_he_normal(seed = NULL)+ +
seed | +Integer used to seed the random generator. |
+
---|
He et al., http://arxiv.org/abs/1502.01852
+ +Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
It draws samples from a uniform distribution within -limit, limit where limit is sqrt(6 / fan_in) where fan_in is the number of input units in the weight tensor.
initializer_he_uniform(seed = NULL)+ +
seed | +Integer used to seed the random generator. |
+
---|
He et al., http://arxiv.org/abs/1502.01852
+ +Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
Only use for square 2D matrices.
+ + +initializer_identity(gain = 1)+ +
gain | +Multiplicative factor to apply to the identity matrix |
+
---|
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
It draws samples from a truncated normal distribution centered on 0 with
stddev = sqrt(1 / fan_in) where fan_in is the number of input units in the weight tensor.
initializer_lecun_normal(seed = NULL)+ +
seed | +A Python integer. Used to seed the random generator. |
+
---|
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
It draws samples from a uniform distribution within -limit, limit
where
+limit
is sqrt(3 / fan_in)
where fan_in
is the number of input units in
+the weight tensor.
initializer_lecun_uniform(seed = NULL)+ +
seed | +Integer used to seed the random generator. |
+
---|
LeCun 98, Efficient Backprop, +http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
+ +Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
Initializer that generates tensors initialized to 1.
+ + +initializer_ones()
+
+ Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
Initializer that generates a random orthogonal matrix.
+ + +initializer_orthogonal(gain = 1, seed = NULL)+ +
gain | +Multiplicative factor to apply to the orthogonal matrix. |
+
---|---|
seed | +Integer used to seed the random generator. |
+
Saxe et al., http://arxiv.org/abs/1312.6120
+ +Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
Initializer that generates tensors with a normal distribution.
+ + +initializer_random_normal(mean = 0, stddev = 0.05, seed = NULL)+ +
mean | +Mean of the random values to generate. |
+
---|---|
stddev | +Standard deviation of the random values to generate. |
+
seed | +Integer used to seed the random generator. |
+
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
Initializer that generates tensors with a uniform distribution.
+ + +initializer_random_uniform(minval = -0.05, maxval = 0.05, seed = NULL)+ +
minval | +Lower bound of the range of random values to generate. |
+
---|---|
maxval | +Upper bound of the range of random values to generate. Defaults to 1 for float types. |
+
seed | Integer used to seed the random generator. |
+
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
,
+ initializer_zeros
These values are similar to values from an initializer_random_normal()
+except that values more than two standard deviations from the mean
+are discarded and re-drawn. This is the recommended initializer for
+neural network weights and filters.
initializer_truncated_normal(mean = 0, stddev = 0.05, seed = NULL)+ +
mean | +Mean of the random values to generate. |
+
---|---|
stddev | +Standard deviation of the random values to generate. |
+
seed | +Integer used to seed the random generator. |
+
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_variance_scaling
,
+ initializer_zeros
With distribution="normal"
, samples are drawn from a truncated normal
+distribution centered on zero, with stddev = sqrt(scale / n)
where n is:
number of input units in the weight tensor, if mode = "fan_in"
number of output units, if mode = "fan_out"
average of the numbers of input and output units, if mode = "fan_avg"
initializer_variance_scaling(scale = 1, mode = c("fan_in", "fan_out", + "fan_avg"), distribution = c("normal", "uniform"), seed = NULL)+ +
scale | +Scaling factor (positive float). |
+
---|---|
mode | +One of "fan_in", "fan_out", "fan_avg". |
+
distribution | +One of "normal", "uniform" |
+
seed | +Integer used to seed the random generator. |
+
With distribution="uniform"
, samples are drawn from a uniform distribution
+within -limit, limit
, with limit = sqrt(3 * scale / n)
.
Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_zeros
Initializer that generates tensors initialized to 0.
+ + +initializer_zeros()
+
+ Other initializers: initializer_constant
,
+ initializer_glorot_normal
,
+ initializer_glorot_uniform
,
+ initializer_he_normal
,
+ initializer_he_uniform
,
+ initializer_identity
,
+ initializer_lecun_normal
,
+ initializer_lecun_uniform
,
+ initializer_ones
,
+ initializer_orthogonal
,
+ initializer_random_normal
,
+ initializer_random_uniform
,
+ initializer_truncated_normal
,
+ initializer_variance_scaling
Keras and TensorFlow will be installed into an "r-tensorflow" virtual or conda +environment. Note that "virtualenv" is not available on Windows (as this isn't +supported by TensorFlow).
+ + +install_keras(method = c("virtualenv", "conda"), conda = "auto", + tensorflow = "default", extra_packages = NULL)+ +
method | +Installation method ("virtualenv" or "conda") |
+
---|---|
conda | +Path to conda executable (or "auto" to find conda using the PATH +and other conventional install locations). |
+
tensorflow | +TensorFlow version to install. Specify "default" to install +the CPU version of the latest release. Specify "gpu" to install the GPU +version of the latest release. +You can also provide a full major.minor.patch specification (e.g. "1.1.0"), +appending "-gpu" if you want the GPU version (e.g. "1.1.0-gpu"). +Alternatively, you can provide the full URL to an installer binary (e.g. +for a nightly binary). |
+
extra_packages | +Additional PyPI packages to install along with +Keras and TensorFlow. |
+
Keras and TensorFlow can be configured to run on either CPUs or GPUs. The CPU +version is much easier to install and configure so is the best starting place +especially when you are first learning how to use Keras. Here's the guidance +on CPU vs. GPU versions from the TensorFlow website:
TensorFlow with CPU support only. If your system does not have an NVIDIA® GPU, you must install this version. Note that this version of TensorFlow is typically much easier to install, so even if you have an NVIDIA® GPU, we recommend installing this version first.
TensorFlow with GPU support. TensorFlow programs typically run significantly faster on a GPU than on a CPU. Therefore, if your system has an NVIDIA® GPU meeting all prerequisites and you need to run performance-critical applications, you should ultimately install this version.
To install the GPU version:
Ensure that you have met all installation prerequisites, including installation of the CUDA and cuDNN libraries as described in TensorFlow GPU Prerequisites.
Pass tensorflow = "gpu"
to install_keras()
. For example:
install_keras(tensorflow = "gpu") +
The only supported installation method on Windows is "conda". This means that you +should install Anaconda 3.x for Windows prior to installing Keras.
+ +Installing Keras and TensorFlow using install_keras()
isn't required
+to use the Keras R package. You can do a custom installation of Keras (and
+desired backend) as described on the Keras website
+and the Keras R package will find and use that version.
See the documentation on custom installations for additional information on how versions of Keras and TensorFlow are located by the Keras package.
+ +If you wish to add additional PyPI packages to your Keras / TensorFlow environment you
+can either specify the packages in the extra_packages
argument of install_keras()
,
+or alternatively install them into an existing environment using the
+install_tensorflow_extras()
function.
# NOT RUN { +# default installation +library(keras) +install_keras() + +# install using a conda environment (default is virtualenv) +install_keras(method = "conda") + +# install with GPU version of TensorFlow +# (NOTE: only do this if you have an NVIDIA GPU + CUDA!) +install_keras(tensorflow = "gpu") + +# install a specific version of TensorFlow +install_keras(tensorflow = "1.2.1") +install_keras(tensorflow = "1.2.1-gpu") + +# }++
Probe to see whether the Keras python package is available in the current +system environment.
+ + +is_keras_available(version = NULL)+ +
version | +Minimum required version of Keras (defaults to |
+
---|
Logical indicating whether Keras (or the specified minimum version of +Keras) is available.
+ + +# NOT RUN { +# testthat utilty for skipping tests when Keras isn't available +skip_if_no_keras <- function(version = NULL) { + if (!is_keras_available(version)) + skip("Required keras version not available for testing") +} + +# use the function within a test +test_that("keras function works correctly", { + skip_if_no_keras() + # test code here +}) +# }++
A model is a directed acyclic graph of layers.
+ + +keras_model(inputs, outputs = NULL)+ +
inputs | +Input layer |
+
---|---|
outputs | +Output layer |
+
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
# NOT RUN { +library(keras) + +# input layer +inputs <- layer_input(shape = c(784)) + +# outputs compose input + dense layers +predictions <- inputs %>% + layer_dense(units = 64, activation = 'relu') %>% + layer_dense(units = 64, activation = 'relu') %>% + layer_dense(units = 10, activation = 'softmax') + +# create and compile model +model <- keras_model(inputs = inputs, outputs = predictions) +model %>% compile( + optimizer = 'rmsprop', + loss = 'categorical_crossentropy', + metrics = c('accuracy') +) +# }+
Keras Model composed of a linear stack of layers
+ + +keras_model_sequential(layers = NULL, name = NULL)+ +
layers | +List of layers to add to the model |
+
---|---|
name | +Name of model |
+
The first layer passed to a Sequential model should have a defined input
+shape. What that means is that it should have received an input_shape
or
+batch_input_shape
argument, or for some type of layers (recurrent,
+Dense...) an input_dim
argument.
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
# NOT RUN { + +library(keras) + +model <- keras_model_sequential() +model %>% + layer_dense(units = 32, input_shape = c(784)) %>% + layer_activation('relu') %>% + layer_dense(units = 10) %>% + layer_activation('softmax') + +model %>% compile( + optimizer = 'rmsprop', + loss = 'categorical_crossentropy', + metrics = c('accuracy') +) +# }+
Apply an activation function to an output.
+ + +layer_activation(object, activation, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
activation | +Name of activation function to use. If you don't specify +anything, no activation is applied (ie. "linear" activation: a(x) = x). |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Other core layers: layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_masking
,
+ layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
Other activation layers: layer_activation_elu
,
+ layer_activation_leaky_relu
,
+ layer_activation_parametric_relu
,
+ layer_activation_thresholded_relu
It follows: f(x) = alpha * (exp(x) - 1.0) for x < 0, f(x) = x for x >= 0.
+ + +layer_activation_elu(object, alpha = 1, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
alpha | +Scale for the negative factor. |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).
+Other activation layers: layer_activation_leaky_relu
,
+ layer_activation_parametric_relu
,
+ layer_activation_thresholded_relu
,
+ layer_activation
Allows a small gradient when the unit is not active: f(x) = alpha * x
for
+x < 0
, f(x) = x
for x >= 0
.
layer_activation_leaky_relu(object, alpha = 0.3, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
alpha | +float >= 0. Negative slope coefficient. |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Rectifier Nonlinearities Improve Neural Network Acoustic Models.
+Other activation layers: layer_activation_elu
,
+ layer_activation_parametric_relu
,
+ layer_activation_thresholded_relu
,
+ layer_activation
It follows: f(x) = alpha * x for x < 0, f(x) = x for x >= 0, where alpha is a learned array with the same shape as x.
layer_activation_parametric_relu(object, alpha_initializer = "zeros", + alpha_regularizer = NULL, alpha_constraint = NULL, shared_axes = NULL, + input_shape = NULL, batch_input_shape = NULL, batch_size = NULL, + dtype = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
alpha_initializer | +Initializer function for the weights. |
+
alpha_regularizer | +Regularizer for the weights. |
+
alpha_constraint | +Constraint for the weights. |
+
shared_axes | +The axes along which to share learnable parameters for the +activation function. For example, if the incoming feature maps are from a +2D convolution with output shape (batch, height, width, channels), and you +wish to share parameters across space so that each filter only has one set +of parameters, set shared_axes=c(1, 2). |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
+Other activation layers: layer_activation_elu
,
+ layer_activation_leaky_relu
,
+ layer_activation_thresholded_relu
,
+ layer_activation
It follows: f(x) = x
for x > theta
, f(x) = 0
otherwise.
layer_activation_thresholded_relu(object, theta = 1, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
theta | +float >= 0. Threshold location of activation. |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Zero-bias autoencoders and the benefits of co-adapting features.
+Other activation layers: layer_activation_elu
,
+ layer_activation_leaky_relu
,
+ layer_activation_parametric_relu
,
+ layer_activation
Layer that applies an update to the cost function based on input activity.
+ + +layer_activity_regularization(object, l1 = 0, l2 = 0, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
l1 | +L1 regularization factor (positive float). |
+
l2 | +L2 regularization factor (positive float). |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Arbitrary. Use the keyword argument input_shape
(list
+of integers, does not include the samples axis) when using this layer as
+the first layer in a model.
Same shape as input.
+ +Other core layers: layer_activation
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_masking
,
+ layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
It takes as input a list of tensors, all of the same shape, and returns a +single tensor (also of the same shape).
+ + +layer_add(inputs)+ +
inputs | +A list of input tensors (at least 2). |
+
---|
A tensor, the sum of the inputs.
+ +Other merge layers: layer_average
,
+ layer_concatenate
, layer_dot
,
+ layer_maximum
, layer_multiply
Alpha Dropout is a dropout that keeps mean and variance of inputs to their +original values, in order to ensure the self-normalizing property even after +this dropout.
+ + +layer_alpha_dropout(object, rate, noise_shape = NULL, seed = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float, drop probability (as with |
+
noise_shape | +Noise shape |
+
seed | +An integer to use as random seed. |
+
Alpha Dropout fits well to Scaled Exponential Linear Units by randomly +setting activations to the negative saturation value.
+ +Arbitrary. Use the keyword argument input_shape
(list
+of integers, does not include the samples axis) when using this layer as
+the first layer in a model.
Same shape as input.
+ +Other noise layers: layer_gaussian_dropout
,
+ layer_gaussian_noise
It takes as input a list of tensors, all of the same shape, and returns a +single tensor (also of the same shape).
+ + +layer_average(inputs)+ +
inputs | +A list of input tensors (at least 2). |
+
---|
A tensor, the average of the inputs.
+ +Other merge layers: layer_add
,
+ layer_concatenate
, layer_dot
,
+ layer_maximum
, layer_multiply
Average pooling for temporal data.
+ + +layer_average_pooling_1d(object, pool_size = 2L, strides = NULL, + padding = "valid", batch_size = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +Integer, size of the max pooling windows. |
+
strides | +Integer, or NULL. Factor by which to downscale. E.g. 2 will
+halve the input. If NULL, it will default to |
+
padding | +One of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, features)
.
3D tensor with shape: (batch_size, downsampled_steps, features)
.
Other pooling layers: layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Average pooling operation for spatial data.
+ + +layer_average_pooling_2d(object, pool_size = c(2L, 2L), strides = NULL, + padding = "valid", data_format = NULL, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +integer or list of 2 integers, factors by which to downscale +(vertical, horizontal). (2, 2) will halve the input in both spatial +dimension. If only one integer is specified, the same window length will be +used for both dimensions. |
+
strides | +Integer, list of 2 integers, or NULL. Strides values. If NULL,
+it will default to |
+
padding | +One of |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 4D tensor with shape: (batch_size, rows, cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, rows, cols)
If data_format='channels_last'
: 4D tensor with shape: (batch_size, pooled_rows, pooled_cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, pooled_rows, pooled_cols)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Average pooling operation for 3D data (spatial or spatio-temporal).
+ + +layer_average_pooling_3d(object, pool_size = c(2L, 2L, 2L), strides = NULL, + padding = "valid", data_format = NULL, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +list of 3 integers, factors by which to downscale (dim1, +dim2, dim3). (2, 2, 2) will halve the size of the 3D input in each +dimension. |
+
strides | +list of 3 integers, or NULL. Strides values. |
+
padding | +One of |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 5D tensor with shape: (batch_size, spatial_dim1, spatial_dim2, spatial_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, spatial_dim1, spatial_dim2, spatial_dim3)
If data_format='channels_last'
: 5D tensor with shape: (batch_size, pooled_dim1, pooled_dim2, pooled_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, pooled_dim1, pooled_dim2, pooled_dim3)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Normalize the activations of the previous layer at each batch, i.e. applies a +transformation that maintains the mean activation close to 0 and the +activation standard deviation close to 1.
+ + +layer_batch_normalization(object, axis = -1L, momentum = 0.99, + epsilon = 0.001, center = TRUE, scale = TRUE, + beta_initializer = "zeros", gamma_initializer = "ones", + moving_mean_initializer = "zeros", moving_variance_initializer = "ones", + beta_regularizer = NULL, gamma_regularizer = NULL, + beta_constraint = NULL, gamma_constraint = NULL, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
axis | +Integer, the axis that should be normalized (typically the
+features axis). For instance, after a |
+
momentum | +Momentum for the moving average. |
+
epsilon | +Small float added to variance to avoid dividing by zero. |
+
center | +If TRUE, add offset of |
+
scale | +If TRUE, multiply by |
+
beta_initializer | +Initializer for the beta weight. |
+
gamma_initializer | +Initializer for the gamma weight. |
+
moving_mean_initializer | +Initializer for the moving mean. |
+
moving_variance_initializer | +Initializer for the moving variance. |
+
beta_regularizer | +Optional regularizer for the beta weight. |
+
gamma_regularizer | +Optional regularizer for the gamma weight. |
+
beta_constraint | +Optional constraint for the beta weight. |
+
gamma_constraint | +Optional constraint for the gamma weight. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Arbitrary. Use the keyword argument input_shape
(list
+of integers, does not include the samples axis) when using this layer as
+the first layer in a model.
Same shape as input.
+ +It takes as input a list of tensors, all of the same shape expect for the +concatenation axis, and returns a single tensor, the concatenation of all +inputs.
+ + +layer_concatenate(inputs, axis = -1L)+ +
inputs | +A list of input tensors (at least 2). |
+
---|---|
axis | +Concatenation axis. |
+
A tensor, the concatenation of the inputs alongside axis axis
.
Other merge layers: layer_add
,
+ layer_average
, layer_dot
,
+ layer_maximum
, layer_multiply
This layer creates a convolution kernel that is convolved with the layer
+input over a single spatial (or temporal) dimension to produce a tensor of
+outputs. If use_bias
is TRUE, a bias vector is created and added to the
+outputs. Finally, if activation
is not NULL
, it is applied to the outputs
+as well. When using this layer as the first layer in a model, provide an
+input_shape
argument (list of integers or NULL
, e.g. (10, 128)
for
+sequences of 10 vectors of 128-dimensional vectors, or (NULL, 128)
for
+variable-length sequences of 128-dimensional vectors.
layer_conv_1d(object, filters, kernel_size, strides = 1L, padding = "valid", + dilation_rate = 1L, activation = NULL, use_bias = TRUE, + kernel_initializer = "glorot_uniform", bias_initializer = "zeros", + kernel_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + bias_constraint = NULL, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number output of filters in the convolution). |
+
kernel_size | +An integer or list of a single integer, specifying the +length of the 1D convolution window. |
+
strides | +An integer or list of a single integer, specifying the stride
+length of the convolution. Specifying any stride value != 1 is incompatible
+with specifying any |
+
padding | +One of |
+
dilation_rate | +an integer or list of a single integer, specifying the
+dilation rate to use for dilated convolution. Currently, specifying any
+ |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, input_dim)
3D tensor with shape: (batch_size, new_steps, filters)
steps
value might have changed due to padding or strides.
Other convolutional layers: layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
This layer creates a convolution kernel that is convolved with the layer
+input to produce a tensor of outputs. If use_bias
is TRUE, a bias vector is
+created and added to the outputs. Finally, if activation
is not NULL
, it
+is applied to the outputs as well. When using this layer as the first layer
+in a model, provide the keyword argument input_shape
(list of integers,
+does not include the sample axis), e.g. input_shape=c(128, 128, 3)
for
+128x128 RGB pictures in data_format="channels_last"
.
layer_conv_2d(object, filters, kernel_size, strides = c(1L, 1L), + padding = "valid", data_format = NULL, dilation_rate = c(1L, 1L), + activation = NULL, use_bias = TRUE, + kernel_initializer = "glorot_uniform", bias_initializer = "zeros", + kernel_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + bias_constraint = NULL, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number output of filters in the convolution). |
+
kernel_size | +An integer or list of 2 integers, specifying the width and +height of the 2D convolution window. Can be a single integer to specify the +same value for all spatial dimensions. |
+
strides | +An integer or list of 2 integers, specifying the strides of
+the convolution along the width and height. Can be a single integer to
+specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +one of |
+
data_format | +A string, one of |
+
dilation_rate | +an integer or list of 2 integers, specifying the
+dilation rate to use for dilated convolution. Can be a single integer to
+specify the same value for all spatial dimensions. Currently, specifying
+any |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape: (samples, channels, rows, cols)
+if data_format='channels_first' or 4D tensor with shape: (samples, rows, cols, channels)
if data_format='channels_last'.
4D tensor with shape: (samples, filters, new_rows, new_cols)
if data_format='channels_first' or 4D tensor with shape:
+(samples, new_rows, new_cols, filters)
if data_format='channels_last'.
+rows
and cols
values might have changed due to padding.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
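A minimal sketch of a small convolutional stack for 28x28 single-channel images (sizes are arbitrary):

model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = 'softmax')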
The need for transposed convolutions generally arises from the desire to use
+a transformation going in the opposite direction of a normal convolution,
+i.e., from something that has the shape of the output of some convolution to
+something that has the shape of its input while maintaining a connectivity
+pattern that is compatible with said convolution. When using this layer as
+the first layer in a model, provide the keyword argument input_shape
(list
+of integers, does not include the sample axis), e.g. input_shape=c(128L, 128L, 3L)
for 128x128 RGB pictures in data_format="channels_last"
.
layer_conv_2d_transpose(object, filters, kernel_size, strides = c(1L, 1L), + padding = "valid", data_format = NULL, activation = NULL, + use_bias = TRUE, kernel_initializer = "glorot_uniform", + bias_initializer = "zeros", kernel_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, bias_constraint = NULL, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number of output filters in the convolution). |
+
kernel_size | +An integer or list of 2 integers, specifying the width and +height of the 2D convolution window. Can be a single integer to specify the +same value for all spatial dimensions. |
+
strides | +An integer or list of 2 integers, specifying the strides of
+the convolution along the width and height. Can be a single integer to
+specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +one of |
+
data_format | +A string, one of |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape: (batch, channels, rows, cols)
+if data_format='channels_first' or 4D tensor with shape: (batch, rows, cols, channels)
if data_format='channels_last'.
4D tensor with shape: (batch, filters, new_rows, new_cols)
if data_format='channels_first' or 4D tensor with shape:
+(batch, new_rows, new_cols, filters)
if data_format='channels_last'.
+rows
and cols
values might have changed due to padding.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
This layer creates a convolution kernel that is convolved with the layer
+input to produce a tensor of outputs. If use_bias
is TRUE, a bias vector is
+created and added to the outputs. Finally, if activation
is not NULL
, it
+is applied to the outputs as well. When using this layer as the first layer
+in a model, provide the keyword argument input_shape
(list of integers,
+does not include the sample axis), e.g. input_shape=c(128L, 128L, 128L, 3L)
+for 128x128x128 volumes with a single channel, in
+data_format="channels_last"
.
layer_conv_3d(object, filters, kernel_size, strides = c(1L, 1L, 1L), + padding = "valid", data_format = NULL, dilation_rate = c(1L, 1L, 1L), + activation = NULL, use_bias = TRUE, + kernel_initializer = "glorot_uniform", bias_initializer = "zeros", + kernel_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + bias_constraint = NULL, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number output of filters in the convolution). |
+
kernel_size | +An integer or list of 3 integers, specifying the depth, +height, and width of the 3D convolution window. Can be a single integer +to specify the same value for all spatial dimensions. |
+
strides | +An integer or list of 3 integers, specifying the strides of
+the convolution along each spatial dimension. Can be a single integer to
+specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +one of |
+
data_format | +A string, one of |
+
dilation_rate | +an integer or list of 3 integers, specifying the
+dilation rate to use for dilated convolution. Can be a single integer to
+specify the same value for all spatial dimensions. Currently, specifying
+any |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
5D tensor with shape: (samples, channels, conv_dim1, conv_dim2, conv_dim3)
if data_format='channels_first' or 5D tensor with
+shape: (samples, conv_dim1, conv_dim2, conv_dim3, channels)
if
+data_format='channels_last'.
5D tensor with shape: (samples, filters, new_conv_dim1, new_conv_dim2, new_conv_dim3)
if
+data_format='channels_first' or 5D tensor with shape: (samples, new_conv_dim1, new_conv_dim2, new_conv_dim3, filters)
if
+data_format='channels_last'. new_conv_dim1
, new_conv_dim2
and
+new_conv_dim3
values might have changed due to padding.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
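As a quick, illustrative sketch (not taken from the reference itself), a 3D convolution is typically used as the first layer of a sequential model by supplying input_shape; the filter count and kernel size below are arbitrary choices:

library(keras)

# 16 filters over 3x3x3 windows of a single-channel 128x128x128 volume
model <- keras_model_sequential()
model %>%
  layer_conv_3d(filters = 16, kernel_size = c(3, 3, 3), activation = "relu",
                input_shape = c(128, 128, 128, 1)) %>%
  layer_max_pooling_3d(pool_size = c(2, 2, 2))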
The need for transposed convolutions generally arises from the desire to use +a transformation going in the opposite direction of a normal convolution, +i.e., from something that has the shape of the output of some convolution to +something that has the shape of its input while maintaining a connectivity +pattern that is compatible with said convolution.
+ + +layer_conv_3d_transpose(object, filters, kernel_size, strides = c(1, 1, 1), + padding = "valid", data_format = NULL, activation = NULL, + use_bias = TRUE, kernel_initializer = "glorot_uniform", + bias_initializer = "zeros", kernel_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, bias_constraint = NULL, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number of output filters in the convolution). |
+
kernel_size | +An integer or list of 3 integers, specifying the depth, +height, and width of the 3D convolution window. Can be a single integer +to specify the same value for all spatial dimensions. |
+
strides | +An integer or list of 3 integers, specifying the strides of
+the convolution along the depth, height and width. Can be a single integer
+to specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +one of |
+
data_format | +A string, one of |
+
activation | +Activation function to use. If you don't specify anything, no
+activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | +Regularizer function applied to the output of the +layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
When using this layer as the first layer in a model, provide the keyword argument
+input_shape
(list of integers, does not include the sample axis), e.g.
+input_shape = list(128, 128, 128, 3)
for a 128x128x128 volume with 3 channels if
+data_format="channels_last"
.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
, layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
It is similar to an LSTM layer, but the input transformations and recurrent +transformations are both convolutional.
+ + +layer_conv_lstm_2d(object, filters, kernel_size, strides = c(1L, 1L), + padding = "valid", data_format = NULL, dilation_rate = c(1L, 1L), + activation = "tanh", recurrent_activation = "hard_sigmoid", + use_bias = TRUE, kernel_initializer = "glorot_uniform", + recurrent_initializer = "orthogonal", bias_initializer = "zeros", + unit_forget_bias = TRUE, kernel_regularizer = NULL, + recurrent_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + recurrent_constraint = NULL, bias_constraint = NULL, + return_sequences = FALSE, go_backwards = FALSE, stateful = FALSE, + dropout = 0, recurrent_dropout = 0, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL, input_shape = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution). |
+
kernel_size | +An integer or list of n integers, specifying the +dimensions of the convolution window. |
+
strides | +An integer or list of n integers, specifying the strides of
+the convolution. Specifying any stride value != 1 is incompatible with
+specifying any |
+
padding | +One of |
+
data_format | +A string, one of |
+
dilation_rate | +An integer or list of n integers, specifying the
+dilation rate to use for dilated convolution. Currently, specifying any
+ |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
recurrent_activation | +Activation function to use for the recurrent +step. |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
recurrent_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
unit_forget_bias | +Boolean. If TRUE, add 1 to the bias of the forget
+gate at initialization. Use in combination with |
+
kernel_regularizer | +Regularizer function applied to the |
+
recurrent_regularizer | +Regularizer function applied to the
+ |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the |
+
recurrent_constraint | +Constraint function applied to the
+ |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
return_sequences | +Boolean. Whether to return the last output in the +output sequence, or the full sequence. |
+
go_backwards | Boolean (default FALSE). If TRUE, process the input sequence backwards. |
+
stateful | +Boolean (default FALSE). If TRUE, the last state for each +sample at index i in a batch will be used as initial state for the sample +of index i in the following batch. |
+
dropout | +Float between 0 and 1. Fraction of the units to drop for the +linear transformation of the inputs. |
+
recurrent_dropout | +Float between 0 and 1. Fraction of the units to drop +for the linear transformation of the recurrent state. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
if data_format='channels_first' 5D tensor with shape:
+(samples,time, channels, rows, cols)
if data_format='channels_last' 5D
+tensor with shape: (samples,time, rows, cols, channels)
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Note: the current implementation does not include the feedback loop on the cells' output.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
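As an illustrative sketch (shapes and filter counts are assumptions, not from the reference), stacked convolutional LSTM layers can be applied to a sequence of frames; the first layer returns the full sequence so that a second recurrent layer can consume it:

library(keras)

# Sequences of 10 single-channel 40x40 frames
model <- keras_model_sequential()
model %>%
  layer_conv_lstm_2d(filters = 40, kernel_size = c(3, 3), padding = "same",
                     return_sequences = TRUE, input_shape = c(10, 40, 40, 1)) %>%
  layer_conv_lstm_2d(filters = 40, kernel_size = c(3, 3), padding = "same")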
It crops along the time dimension (axis 1).
+ + +layer_cropping_1d(object, cropping = c(1L, 1L), batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
cropping | +int or list of int (length 2) How many units should be +trimmed off at the beginning and end of the cropping dimension (axis 1). If +a single int is provided, the same value will be used for both. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape (batch, axis_to_crop, features)
3D tensor with shape (batch, cropped_axis, features)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
It crops along spatial dimensions, i.e. width and height.
+ + +layer_cropping_2d(object, cropping = list(c(0L, 0L), c(0L, 0L)), + data_format = NULL, batch_size = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
cropping | +int, or list of 2 ints, or list of 2 lists of 2 ints.
|
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape:
If data_format
is "channels_last"
: (batch, rows, cols, channels)
If data_format
is "channels_first"
: (batch, channels, rows, cols)
4D tensor with shape:
If data_format
is "channels_last"
: (batch, cropped_rows, cropped_cols, channels)
If data_format
is "channels_first"
: (batch, channels, cropped_rows, cropped_cols)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
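A minimal sketch of the cropping arithmetic (the input size is an arbitrary assumption): cropping = list(c(2, 2), c(4, 4)) removes 2 rows from the top and bottom and 4 columns from the left and right:

library(keras)

inputs <- layer_input(shape = c(28, 28, 3))
cropped <- inputs %>%
  layer_cropping_2d(cropping = list(c(2, 2), c(4, 4)))
# With the default "channels_last" format the result has shape (batch, 24, 20, 3)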
Cropping layer for 3D data (e.g. spatial or spatio-temporal).
+ + +layer_cropping_3d(object, cropping = list(c(1L, 1L), c(1L, 1L), c(1L, 1L)), + data_format = NULL, batch_size = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
cropping | +int, or list of 3 ints, or list of 3 lists of 2 ints.
|
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
5D tensor with shape:
If data_format
is "channels_last"
: (batch, first_axis_to_crop, second_axis_to_crop, third_axis_to_crop, depth)
If data_format
is "channels_first"
:
+(batch, depth, first_axis_to_crop, second_axis_to_crop, third_axis_to_crop)
5D tensor with shape:
If data_format
is "channels_last"
: (batch, first_cropped_axis, second_cropped_axis, third_cropped_axis, depth)
If data_format
is "channels_first"
: (batch, depth, first_cropped_axis, second_cropped_axis, third_cropped_axis)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
Implements the operation: output = activation(dot(input, kernel) + bias)
+where activation
is the element-wise activation function passed as the
+activation
argument, kernel
is a weights matrix created by the layer, and
+bias
is a bias vector created by the layer (only applicable if use_bias
+is TRUE
). Note: if the input to the layer has a rank greater than 2, then
+it is flattened prior to the initial dot product with kernel
.
layer_dense(object, units, activation = NULL, use_bias = TRUE, + kernel_initializer = "glorot_uniform", bias_initializer = "zeros", + kernel_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + bias_constraint = NULL, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
units | +Positive integer, dimensionality of the output space. |
+
activation | +Name of activation function to use. If you don't specify +anything, no activation is applied (ie. "linear" activation: a(x) = x). |
+
use_bias | +Whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Input shape: nD tensor with shape: (batch_size, ..., input_dim)
. The most
+common situation would be a 2D input with shape (batch_size, input_dim)
.
Output shape: nD tensor with shape: (batch_size, ..., units)
. For
+instance, for a 2D input with shape (batch_size, input_dim)
, the output
+would have shape (batch_size, units)
.
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dropout
, layer_flatten
,
+ layer_input
, layer_lambda
,
+ layer_masking
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
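For illustration (a minimal sketch, not part of the reference), two dense layers with inline activations form a small classifier; the sizes are arbitrary:

library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(20)) %>%
  layer_dense(units = 10, activation = "softmax")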
Layer that computes a dot product between samples in two tensors.
+ + +layer_dot(inputs, axes, normalize = FALSE)+ +
inputs | +A list of input tensors (at least 2). |
+
---|---|
axes | +Integer or list of integers, axis or axes along which to take the dot product. |
+
normalize | Whether to L2-normalize samples along the dot product axis before taking the dot product. If set to TRUE, then the output of the dot product is the cosine proximity between the two samples. |
+
A tensor, the dot product of the samples from the inputs.
+ +Other merge layers: layer_add
,
+ layer_average
,
+ layer_concatenate
,
+ layer_maximum
, layer_multiply
Dropout consists in randomly setting a fraction rate
of input units to 0 at
+each update during training time, which helps prevent overfitting.
layer_dropout(object, rate, noise_shape = NULL, seed = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float between 0 and 1. Fraction of the input units to drop. |
+
noise_shape | +1D integer tensor representing the shape of the binary
+dropout mask that will be multiplied with the input. For instance, if your
+inputs have shape |
+
seed | +A Python integer to use as random seed. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_flatten
,
+ layer_input
, layer_lambda
,
+ layer_masking
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
Other dropout layers: layer_spatial_dropout_1d
,
+ layer_spatial_dropout_2d
,
+ layer_spatial_dropout_3d
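As a sketch of typical usage (layer sizes are assumptions), dropout is inserted between dense layers so that half of the activations are zeroed at each training update:

library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 128, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 10, activation = "softmax")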
Turns positive integers (indexes) into dense vectors of fixed size, for example list(4L, 20L) -> list(c(0.25, 0.1), c(0.6, -0.2)). This layer can only be used as the first layer in a model.
layer_embedding(object, input_dim, output_dim, + embeddings_initializer = "uniform", embeddings_regularizer = NULL, + activity_regularizer = NULL, embeddings_constraint = NULL, + mask_zero = FALSE, input_length = NULL, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
input_dim | +int > 0. Size of the vocabulary, i.e. maximum integer +index + 1. |
+
output_dim | +int >= 0. Dimension of the dense embedding. |
+
embeddings_initializer | +Initializer for the |
+
embeddings_regularizer | +Regularizer function applied to the
+ |
+
activity_regularizer | +activity_regularizer |
+
embeddings_constraint | +Constraint function applied to the |
+
mask_zero | +Whether or not the input value 0 is a special "padding"
+value that should be masked out. This is useful when using recurrent
+layers, which may take variable length inputs. If this is |
+
input_length | +Length of input sequences, when it is constant. This
+argument is required if you are going to connect |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
2D tensor with shape: (batch_size, sequence_length)
.
3D tensor with shape: (batch_size, sequence_length, output_dim)
.
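A minimal sketch (vocabulary size and dimensions are arbitrary assumptions): each integer index in a length-10 sequence is mapped to a 64-dimensional vector:

library(keras)

model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 1000, output_dim = 64, input_length = 10)
# Input:  (batch_size, 10) integer matrix with values in [0, 999]
# Output: (batch_size, 10, 64)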
Flattens the input. Does not affect the batch size.
+ + +layer_flatten(object, batch_size = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_input
, layer_lambda
,
+ layer_masking
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
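As an illustrative sketch (filter count and image size are assumptions), flattening bridges convolutional feature maps and a dense classification head:

library(keras)

model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")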
As it is a regularization layer, it is only active at training time.
+ + +layer_gaussian_dropout(object, rate, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float, drop probability (as with |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Arbitrary. Use the keyword argument input_shape
(list
+of integers, does not include the samples axis) when using this layer as
+the first layer in a model.
Same shape as input.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava, Hinton, et al., 2014)
Other noise layers: layer_alpha_dropout
,
+ layer_gaussian_noise
This is useful to mitigate overfitting (you could see it as a form of random +data augmentation). Gaussian Noise (GS) is a natural choice as corruption +process for real valued inputs. As it is a regularization layer, it is only +active at training time.
+ + +layer_gaussian_noise(object, stddev, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
stddev | +float, standard deviation of the noise distribution. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Arbitrary. Use the keyword argument input_shape
(list
+of integers, does not include the samples axis) when using this layer as
+the first layer in a model.
Same shape as input.
+ +Other noise layers: layer_alpha_dropout
,
+ layer_gaussian_dropout
Global average pooling operation for temporal data.
+ + +layer_global_average_pooling_1d(object, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, features)
.
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Global average pooling operation for spatial data.
+ + +layer_global_average_pooling_2d(object, data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 4D tensor with shape: (batch_size, rows, cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, rows, cols)
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
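A minimal sketch (the input size and filter count are assumptions): global average pooling collapses each feature map to a single value, a common alternative to flattening before a classifier:

library(keras)

inputs <- layer_input(shape = c(224, 224, 3))
features <- inputs %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_global_average_pooling_2d()
# 'features' has shape (batch_size, 64): one average per feature map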
Global average pooling operation for 3D data.
+ + +layer_global_average_pooling_3d(object, data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 5D tensor with shape: (batch_size, spatial_dim1, spatial_dim2, spatial_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, spatial_dim1, spatial_dim2, spatial_dim3)
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Global max pooling operation for temporal data.
+ + +layer_global_max_pooling_1d(object, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, features)
.
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Global max pooling operation for spatial data.
+ + +layer_global_max_pooling_2d(object, data_format = NULL, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 4D tensor with shape: (batch_size, rows, cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, rows, cols)
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Global max pooling operation for 3D data.
+ + +layer_global_max_pooling_3d(object, data_format = NULL, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 5D tensor with shape: (batch_size, spatial_dim1, spatial_dim2, spatial_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, spatial_dim1, spatial_dim2, spatial_dim3)
2D tensor with shape: (batch_size, channels)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Gated Recurrent Unit - Cho et al.
+ + +layer_gru(object, units, activation = "tanh", + recurrent_activation = "hard_sigmoid", use_bias = TRUE, + return_sequences = FALSE, return_state = FALSE, go_backwards = FALSE, + stateful = FALSE, unroll = FALSE, implementation = 0L, + kernel_initializer = "glorot_uniform", + recurrent_initializer = "orthogonal", bias_initializer = "zeros", + kernel_regularizer = NULL, recurrent_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, recurrent_constraint = NULL, + bias_constraint = NULL, dropout = 0, recurrent_dropout = 0, + input_shape = NULL, batch_input_shape = NULL, batch_size = NULL, + dtype = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
units | +Positive integer, dimensionality of the output space. |
+
activation | +Activation function to use. If you pass |
+
recurrent_activation | +Activation function to use for the recurrent +step. |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
return_sequences | +Boolean. Whether to return the last output in the +output sequence, or the full sequence. |
+
return_state | +Boolean (default FALSE). Whether to return the last state +in addition to the output. |
+
go_backwards | +Boolean (default FALSE). If TRUE, process the input +sequence backwards and return the reversed sequence. |
+
stateful | +Boolean (default FALSE). If TRUE, the last state for each +sample at index i in a batch will be used as initial state for the sample +of index i in the following batch. |
+
unroll | +Boolean (default FALSE). If TRUE, the network will be unrolled, +else a symbolic loop will be used. Unrolling can speed-up a RNN, although +it tends to be more memory-intensive. Unrolling is only suitable for short +sequences. |
+
implementation | +one of 0, 1, or 2. If set to 0, the RNN will use an +implementation that uses fewer, larger matrix products, thus running faster +on CPU but consuming more memory. If set to 1, the RNN will use more matrix +products, but smaller ones, thus running slower (may actually be faster on +GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will +combine the input gate, the forget gate and the output gate into a single +matrix, enabling more time-efficient parallelization on the GPU. |
+
kernel_initializer | +Initializer for the |
+
recurrent_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
recurrent_regularizer | +Regularizer function applied to the
+ |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the |
+
recurrent_constraint | +Constraint function applied to the
+ |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
dropout | +Float between 0 and 1. Fraction of the units to drop for the +linear transformation of the inputs. |
+
recurrent_dropout | +Float between 0 and 1. Fraction of the units to drop +for the linear transformation of the recurrent state. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape (batch_size, timesteps, input_dim)
,
+(Optional) 2D tensors with shape (batch_size, output_dim)
.
if return_state
: a list of tensors. The first tensor is
+the output. The remaining tensors are the last states,
+each with shape (batch_size, units)
.
if return_sequences
: 3D tensor with shape
+(batch_size, timesteps, units)
.
else, 2D tensor with shape (batch_size, units)
.
This layer supports masking for input data with a variable number
+of timesteps. To introduce masks to your data,
+use an embedding layer with the mask_zero
parameter
+set to TRUE
.
You can set RNN layers to be 'stateful', which means that the states +computed for the samples in one batch will be reused as initial states +for the samples in the next batch. This assumes a one-to-one mapping +between samples in different successive batches.
+To enable statefulness:
Specify stateful=TRUE
in the layer constructor.
Specify a fixed batch size for your model. For sequential models,
+pass batch_input_shape = c(...)
to the first layer in your model.
+For functional models with 1 or more Input layers, pass
+batch_shape = c(...)
to all the first layers in your model.
+This is the expected shape of your inputs including the batch size.
+It should be a vector of integers, e.g. c(32, 10, 100)
.
Specify shuffle = FALSE
when calling fit().
To reset the states of your model, call reset_states()
on either
+a specific layer, or on your entire model.
You can specify the initial state of RNN layers symbolically by calling
+them with the keyword argument initial_state
. The value of
+initial_state
should be a tensor or list of tensors representing
+the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by
+calling reset_states
with the keyword argument states
. The value of
+states
should be a numpy array or list of numpy arrays representing
+the initial state of the RNN layer.
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Other recurrent layers: layer_lstm
,
+ layer_simple_rnn
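As an illustrative sketch for sequence classification (vocabulary size, sequence length, and unit counts are assumptions, not part of the reference):

library(keras)

model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 10000, output_dim = 128, input_length = 100) %>%
  layer_gru(units = 32, dropout = 0.2, recurrent_dropout = 0.2) %>%
  layer_dense(units = 1, activation = "sigmoid")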
Layer to be used as an entry point into a graph.
+ + +layer_input(shape = NULL, batch_shape = NULL, name = NULL, dtype = NULL, + sparse = FALSE, tensor = NULL)+ +
shape | +Shape, not including the batch size. For instance,
+ |
+
---|---|
batch_shape | +Shapes, including the batch size. For instance,
+ |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
dtype | +The data type expected by the input, as a string ( |
+
sparse | +Boolean, whether the placeholder created is meant to be sparse. |
+
tensor | +Existing tensor to wrap into the |
+
A tensor
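For illustration (layer sizes are arbitrary), an input layer anchors a functional-API model: the returned tensor is piped through layers and the result is passed to keras_model():

library(keras)

inputs <- layer_input(shape = c(784))
outputs <- inputs %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
model <- keras_model(inputs = inputs, outputs = outputs)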
+ +It can either wrap an existing tensor (pass an input_tensor
+argument) or create a placeholder tensor (pass arguments input_shape
+or batch_input_shape
as well as input_dtype
).
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_lambda
,
+ layer_masking
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
Wraps an arbitrary expression as a layer
+ + +layer_lambda(object, f, output_shape = NULL, mask = NULL, + arguments = NULL, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
f | +The function to be evaluated. Takes input tensor as first +argument. |
+
output_shape | +Expected output shape from the function (not required +when using TensorFlow back-end). |
+
mask | +mask |
+
arguments | +optional named list of keyword arguments to be passed to the +function. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Arbitrary. Use the keyword argument input_shape (list +of integers, does not include the samples axis) when using this layer as +the first layer in a model.
+ +Arbitrary (based on tensor returned from the function)
+ +Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_masking
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
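A minimal sketch (the squaring operation is an arbitrary example; it relies on the tensorflow package's arithmetic overloading for tensors):

library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 32, input_shape = c(16)) %>%
  layer_lambda(f = function(x) x^2)  # element-wise square of the previous layer's output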
layer_locally_connected_1d()
works similarly to layer_conv_1d()
, except
+that weights are unshared, that is, a different set of filters is applied at
+each different patch of the input.
layer_locally_connected_1d(object, filters, kernel_size, strides = 1L, + padding = "valid", data_format = NULL, activation = NULL, + use_bias = TRUE, kernel_initializer = "glorot_uniform", + bias_initializer = "zeros", kernel_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, bias_constraint = NULL, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution). |
+
kernel_size | +An integer or list of a single integer, specifying the +length of the 1D convolution window. |
+
strides | +An integer or list of a single integer, specifying the stride
+length of the convolution. Specifying any stride value != 1 is incompatible
+with specifying any |
+
padding | +Currently only supports |
+
data_format | +A string, one of |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, input_dim)
3D tensor with shape: (batch_size, new_steps, filters)
steps
value might have changed due to padding or strides.
Other locally connected layers: layer_locally_connected_2d
layer_locally_connected_2d
works similarly to layer_conv_2d()
, except
+that weights are unshared, that is, a different set of filters is applied at
+each different patch of the input.
layer_locally_connected_2d(object, filters, kernel_size, strides = c(1L, 1L), + padding = "valid", data_format = NULL, activation = NULL, + use_bias = TRUE, kernel_initializer = "glorot_uniform", + bias_initializer = "zeros", kernel_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, bias_constraint = NULL, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution). |
+
kernel_size | +An integer or list of 2 integers, specifying the width and +height of the 2D convolution window. Can be a single integer to specify the +same value for all spatial dimensions. |
+
strides | +An integer or list of 2 integers, specifying the strides of
+the convolution along the width and height. Can be a single integer to
+specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +Currently only supports |
+
data_format | +A string, one of |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
kernel_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape: (samples, channels, rows, cols)
+if data_format='channels_first' or 4D tensor with shape: (samples, rows, cols, channels)
if data_format='channels_last'.
4D tensor with shape: (samples, filters, new_rows, new_cols)
if data_format='channels_first' or 4D tensor with shape:
+(samples, new_rows, new_cols, filters)
if data_format='channels_last'.
+rows
and cols
values might have changed due to padding.
Other locally connected layers: layer_locally_connected_1d
For a step-by-step description of the algorithm, see this tutorial.
+ + +layer_lstm(object, units, activation = "tanh", + recurrent_activation = "hard_sigmoid", use_bias = TRUE, + return_sequences = FALSE, return_state = FALSE, go_backwards = FALSE, + stateful = FALSE, unroll = FALSE, implementation = 0L, + kernel_initializer = "glorot_uniform", + recurrent_initializer = "orthogonal", bias_initializer = "zeros", + unit_forget_bias = TRUE, kernel_regularizer = NULL, + recurrent_regularizer = NULL, bias_regularizer = NULL, + activity_regularizer = NULL, kernel_constraint = NULL, + recurrent_constraint = NULL, bias_constraint = NULL, dropout = 0, + recurrent_dropout = 0, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
units | +Positive integer, dimensionality of the output space. |
+
activation | +Activation function to use. If you pass |
+
recurrent_activation | +Activation function to use for the recurrent +step. |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
return_sequences | +Boolean. Whether to return the last output in the +output sequence, or the full sequence. |
+
return_state | +Boolean (default FALSE). Whether to return the last state +in addition to the output. |
+
go_backwards | +Boolean (default FALSE). If TRUE, process the input +sequence backwards and return the reversed sequence. |
+
stateful | +Boolean (default FALSE). If TRUE, the last state for each +sample at index i in a batch will be used as initial state for the sample +of index i in the following batch. |
+
unroll | +Boolean (default FALSE). If TRUE, the network will be unrolled, +else a symbolic loop will be used. Unrolling can speed-up a RNN, although +it tends to be more memory-intensive. Unrolling is only suitable for short +sequences. |
+
implementation | +one of 0, 1, or 2. If set to 0, the RNN will use an +implementation that uses fewer, larger matrix products, thus running faster +on CPU but consuming more memory. If set to 1, the RNN will use more matrix +products, but smaller ones, thus running slower (may actually be faster on +GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will +combine the input gate, the forget gate and the output gate into a single +matrix, enabling more time-efficient parallelization on the GPU. |
+
kernel_initializer | +Initializer for the |
+
recurrent_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
unit_forget_bias | +Boolean. If TRUE, add 1 to the bias of the forget
+gate at initialization. Setting it to true will also force
+ |
+
kernel_regularizer | +Regularizer function applied to the |
+
recurrent_regularizer | +Regularizer function applied to the
+ |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the |
+
recurrent_constraint | +Constraint function applied to the
+ |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
dropout | +Float between 0 and 1. Fraction of the units to drop for the +linear transformation of the inputs. |
+
recurrent_dropout | +Float between 0 and 1. Fraction of the units to drop +for the linear transformation of the recurrent state. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape (batch_size, timesteps, input_dim)
,
+(Optional) 2D tensors with shape (batch_size, output_dim)
.
if return_state
: a list of tensors. The first tensor is
+the output. The remaining tensors are the last states,
+each with shape (batch_size, units)
.
if return_sequences
: 3D tensor with shape
+(batch_size, timesteps, units)
.
else, 2D tensor with shape (batch_size, units)
.
This layer supports masking for input data with a variable number
+of timesteps. To introduce masks to your data,
+use an embedding layer with the mask_zero
parameter
+set to TRUE
.
You can set RNN layers to be 'stateful', which means that the states +computed for the samples in one batch will be reused as initial states +for the samples in the next batch. This assumes a one-to-one mapping +between samples in different successive batches.
+To enable statefulness:
Specify stateful=TRUE
in the layer constructor.
Specify a fixed batch size for your model. For sequential models,
+pass batch_input_shape = c(...)
to the first layer in your model.
+For functional models with 1 or more Input layers, pass
+batch_shape = c(...)
to all the first layers in your model.
+This is the expected shape of your inputs including the batch size.
+It should be a vector of integers, e.g. c(32, 10, 100)
.
Specify shuffle = FALSE
when calling fit().
To reset the states of your model, call reset_states()
on either
+a specific layer, or on your entire model.
You can specify the initial state of RNN layers symbolically by calling
+them with the keyword argument initial_state
. The value of
+initial_state
should be a tensor or list of tensors representing
+the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by
+calling reset_states
with the keyword argument states
. The value of
+states
should be a numpy array or list of numpy arrays representing
+the initial state of the RNN layer.
Long short-term memory (original 1997 paper)
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Other recurrent layers: layer_gru
,
+ layer_simple_rnn
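As an illustrative sketch (sizes are assumptions), stacked LSTMs require return_sequences = TRUE on every layer except the last so that each subsequent layer receives a full sequence:

library(keras)

model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 20000, output_dim = 128, input_length = 80) %>%
  layer_lstm(units = 64, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")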
For each timestep in the input tensor (dimension #1 in the tensor), if all
+values in the input tensor at that timestep are equal to mask_value
, then
+the timestep will be masked (skipped) in all downstream layers (as long as
+they support masking). If any downstream layer does not support masking yet
+receives such an input mask, an exception will be raised.
layer_masking(object, mask_value = 0, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
mask_value | +float, mask value |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_permute
,
+ layer_repeat_vector
,
+ layer_reshape
Max pooling operation for temporal data.
+ + +layer_max_pooling_1d(object, pool_size = 2L, strides = NULL, + padding = "valid", batch_size = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +Integer, size of the max pooling windows. |
+
strides | +Integer, or NULL. Factor by which to downscale. E.g. 2 will
+halve the input. If NULL, it will default to |
+
padding | +One of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch_size, steps, features)
.
3D tensor with shape: (batch_size, downsampled_steps, features)
.
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_2d
,
+ layer_max_pooling_3d
Max pooling operation for spatial data.
+ + +layer_max_pooling_2d(object, pool_size = c(2L, 2L), strides = NULL, + padding = "valid", data_format = NULL, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +integer or list of 2 integers, factors by which to downscale +(vertical, horizontal). (2, 2) will halve the input in both spatial +dimension. If only one integer is specified, the same window length will be +used for both dimensions. |
+
strides | +Integer, list of 2 integers, or NULL. Strides values. If NULL,
+it will default to |
+
padding | +One of |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 4D tensor with shape: (batch_size, rows, cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, rows, cols)
If data_format='channels_last'
: 4D tensor with shape: (batch_size, pooled_rows, pooled_cols, channels)
If data_format='channels_first'
: 4D tensor with shape: (batch_size, channels, pooled_rows, pooled_cols)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_3d
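A minimal sketch of the shape arithmetic (the input size and filter count are assumptions):

library(keras)

model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(32, 32, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2))
# 32x32 inputs give 30x30 feature maps under "valid" convolution padding,
# which 2x2 max pooling then reduces to 15x15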
Max pooling operation for 3D data (spatial or spatio-temporal).
+ + +layer_max_pooling_3d(object, pool_size = c(2L, 2L, 2L), strides = NULL, + padding = "valid", data_format = NULL, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
pool_size | +list of 3 integers, factors by which to downscale (dim1, +dim2, dim3). (2, 2, 2) will halve the size of the 3D input in each +dimension. |
+
strides | +list of 3 integers, or NULL. Strides values. |
+
padding | +One of |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
If data_format='channels_last'
: 5D tensor with shape: (batch_size, spatial_dim1, spatial_dim2, spatial_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, spatial_dim1, spatial_dim2, spatial_dim3)
If data_format='channels_last'
: 5D tensor with shape: (batch_size, pooled_dim1, pooled_dim2, pooled_dim3, channels)
If data_format='channels_first'
: 5D tensor with shape: (batch_size, channels, pooled_dim1, pooled_dim2, pooled_dim3)
Other pooling layers: layer_average_pooling_1d
,
+ layer_average_pooling_2d
,
+ layer_average_pooling_3d
,
+ layer_global_average_pooling_1d
,
+ layer_global_average_pooling_2d
,
+ layer_global_average_pooling_3d
,
+ layer_global_max_pooling_1d
,
+ layer_global_max_pooling_2d
,
+ layer_global_max_pooling_3d
,
+ layer_max_pooling_1d
,
+ layer_max_pooling_2d
It takes as input a list of tensors, all of the same shape, and returns a +single tensor (also of the same shape).
+ + +layer_maximum(inputs)+ +
inputs | +A list of input tensors (at least 2). |
+
---|
A tensor, the element-wise maximum of the inputs.
+ +Other merge layers: layer_add
,
+ layer_average
,
+ layer_concatenate
, layer_dot
,
+ layer_multiply
It takes as input a list of tensors, all of the same shape, and returns a +single tensor (also of the same shape).
+ + +layer_multiply(inputs)+ +
inputs | +A list of input tensors (at least 2). |
+
---|
A tensor, the element-wise product of the inputs.
+ +Other merge layers: layer_add
,
+ layer_average
,
+ layer_concatenate
, layer_dot
,
+ layer_maximum
Permute the dimensions of an input according to a given pattern
+ + +layer_permute(object, dims, input_shape = NULL, batch_input_shape = NULL, + batch_size = NULL, dtype = NULL, name = NULL, trainable = NULL, + weights = NULL)+ +
object | +Model or layer object |
+
---|---|
dims | +List of integers. Permutation pattern, does not include the
+samples dimension. Indexing starts at 1. For instance, |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Useful for e.g. connecting RNNs and convnets together.
+ +Input shape: Arbitrary
+Output shape: Same as the input shape, but with the dimensions re-ordered +according to the specified pattern.
+ +Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_masking
,
+ layer_repeat_vector
,
+ layer_reshape
Repeats the input n times.
+ + +layer_repeat_vector(object, n, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
n | +integer, repetition factor. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
2D tensor of shape (num_samples, features)
.
3D tensor of shape (num_samples, n, features)
.
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_masking
,
+ layer_permute
, layer_reshape
Reshapes an output to a certain shape.
+ + +layer_reshape(object, target_shape, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
target_shape | +List of integers, does not include the samples dimension +(batch size). |
+
input_shape | +Input shape (list of integers, does not include the +samples axis) which is required when using this layer as the first layer in +a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Input shape: Arbitrary, although all dimensions in the input shape must be +fixed.
+Output shape: (batch_size,) + target_shape
.
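A minimal sketch (the 784 and 28 x 28 shapes are illustrative assumptions, not part of the API): reshape a flat input into a grid and back.

library(keras)

model <- keras_model_sequential()
model %>%
  # (batch_size, 784) -> (batch_size, 28, 28)
  layer_reshape(target_shape = c(28, 28), input_shape = c(784)) %>%
  # (batch_size, 28, 28) -> (batch_size, 784)
  layer_reshape(target_shape = c(784))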
Other core layers: layer_activation
,
+ layer_activity_regularization
,
+ layer_dense
, layer_dropout
,
+ layer_flatten
, layer_input
,
+ layer_lambda
, layer_masking
,
+ layer_permute
,
+ layer_repeat_vector
Separable convolutions consist in first performing a depthwise spatial
+convolution (which acts on each input channel separately) followed by a
+pointwise convolution which mixes together the resulting output channels. The
+depth_multiplier
argument controls how many output channels are generated
+per input channel in the depthwise step. Intuitively, separable convolutions
+can be understood as a way to factorize a convolution kernel into two smaller
+kernels, or as an extreme version of an Inception block.
layer_separable_conv_2d(object, filters, kernel_size, strides = c(1L, 1L), + padding = "valid", data_format = NULL, depth_multiplier = 1L, + activation = NULL, use_bias = TRUE, + depthwise_initializer = "glorot_uniform", + pointwise_initializer = "glorot_uniform", bias_initializer = "zeros", + depthwise_regularizer = NULL, pointwise_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + depthwise_constraint = NULL, pointwise_constraint = NULL, + bias_constraint = NULL, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
filters | +Integer, the dimensionality of the output space (i.e. the +number of output filters in the convolution). |
+
kernel_size | +An integer or list of 2 integers, specifying the width and +height of the 2D convolution window. Can be a single integer to specify the +same value for all spatial dimensions. |
+
strides | +An integer or list of 2 integers, specifying the strides of
+the convolution along the width and height. Can be a single integer to
+specify the same value for all spatial dimensions. Specifying any stride
+value != 1 is incompatible with specifying any |
+
padding | +one of |
+
data_format | +A string, one of |
+
depth_multiplier | +The number of depthwise convolution output channels
+for each input channel. The total number of depthwise convolution output
+channels will be equal to |
+
activation | +Activation function to use. If you don't specify anything,
+no activation is applied (ie. "linear" activation: |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
depthwise_initializer | +Initializer for the depthwise kernel matrix. |
+
pointwise_initializer | +Initializer for the pointwise kernel matrix. |
+
bias_initializer | +Initializer for the bias vector. |
+
depthwise_regularizer | +Regularizer function applied to the depthwise +kernel matrix. |
+
pointwise_regularizer | +Regularizer function applied to the pointwise +kernel matrix. |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | +Regularizer function applied to the output of the +layer (its "activation"). |
+
depthwise_constraint | +Constraint function applied to the depthwise +kernel matrix. |
+
pointwise_constraint | +Constraint function applied to the pointwise +kernel matrix. |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape: (batch, channels, rows, cols)
+if data_format='channels_first' or 4D tensor with shape: (batch, rows, cols, channels)
if data_format='channels_last'.
4D tensor with shape: (batch, filters, new_rows, new_cols)
if data_format='channels_first' or 4D tensor with shape:
+(batch, new_rows, new_cols, filters)
if data_format='channels_last'.
+rows
and cols
values might have changed due to padding.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
Fully-connected RNN where the output is to be fed back to input.
+ + +layer_simple_rnn(object, units, activation = "tanh", use_bias = TRUE, + return_sequences = FALSE, return_state = FALSE, go_backwards = FALSE, + stateful = FALSE, unroll = FALSE, implementation = 0L, + kernel_initializer = "glorot_uniform", + recurrent_initializer = "orthogonal", bias_initializer = "zeros", + kernel_regularizer = NULL, recurrent_regularizer = NULL, + bias_regularizer = NULL, activity_regularizer = NULL, + kernel_constraint = NULL, recurrent_constraint = NULL, + bias_constraint = NULL, dropout = 0, recurrent_dropout = 0, + input_shape = NULL, batch_input_shape = NULL, batch_size = NULL, + dtype = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
units | +Positive integer, dimensionality of the output space. |
+
activation | +Activation function to use. If you pass |
+
use_bias | +Boolean, whether the layer uses a bias vector. |
+
return_sequences | +Boolean. Whether to return the last output in the +output sequence, or the full sequence. |
+
return_state | +Boolean (default FALSE). Whether to return the last state +in addition to the output. |
+
go_backwards | +Boolean (default FALSE). If TRUE, process the input +sequence backwards and return the reversed sequence. |
+
stateful | +Boolean (default FALSE). If TRUE, the last state for each +sample at index i in a batch will be used as initial state for the sample +of index i in the following batch. |
+
unroll | +Boolean (default FALSE). If TRUE, the network will be unrolled, +else a symbolic loop will be used. Unrolling can speed-up a RNN, although +it tends to be more memory-intensive. Unrolling is only suitable for short +sequences. |
+
implementation | +one of 0, 1, or 2. If set to 0, the RNN will use an +implementation that uses fewer, larger matrix products, thus running faster +on CPU but consuming more memory. If set to 1, the RNN will use more matrix +products, but smaller ones, thus running slower (may actually be faster on +GPU) while consuming less memory. If set to 2 (LSTM/GRU only), the RNN will +combine the input gate, the forget gate and the output gate into a single +matrix, enabling more time-efficient parallelization on the GPU. |
+
kernel_initializer | +Initializer for the |
+
recurrent_initializer | +Initializer for the |
+
bias_initializer | +Initializer for the bias vector. |
+
kernel_regularizer | +Regularizer function applied to the |
+
recurrent_regularizer | +Regularizer function applied to the
+ |
+
bias_regularizer | +Regularizer function applied to the bias vector. |
+
activity_regularizer | +Regularizer function applied to the output of the +layer (its "activation"). |
+
kernel_constraint | +Constraint function applied to the |
+
recurrent_constraint | +Constraint function applied to the
+ |
+
bias_constraint | +Constraint function applied to the bias vector. |
+
dropout | +Float between 0 and 1. Fraction of the units to drop for the +linear transformation of the inputs. |
+
recurrent_dropout | +Float between 0 and 1. Fraction of the units to drop +for the linear transformation of the recurrent state. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape (batch_size, timesteps, input_dim)
,
+(Optional) 2D tensors with shape (batch_size, output_dim)
.
if return_state
: a list of tensors. The first tensor is
+the output. The remaining tensors are the last states,
+each with shape (batch_size, units)
.
if return_sequences
: 3D tensor with shape
+(batch_size, timesteps, units)
.
else, 2D tensor with shape (batch_size, units)
.
This layer supports masking for input data with a variable number
+of timesteps. To introduce masks to your data,
+use an embedding layer with the mask_zero
parameter
+set to TRUE
.
You can set RNN layers to be 'stateful', which means that the states +computed for the samples in one batch will be reused as initial states +for the samples in the next batch. This assumes a one-to-one mapping +between samples in different successive batches.
+To enable statefulness:
Specify stateful=TRUE
in the layer constructor.
Specify a fixed batch size for your model. For sequential models,
+pass batch_input_shape = c(...)
to the first layer in your model.
+For functional models with 1 or more Input layers, pass
+batch_shape = c(...)
to all the first layers in your model.
+This is the expected shape of your inputs including the batch size.
+It should be a vector of integers, e.g. c(32, 10, 100)
.
Specify shuffle = FALSE
when calling fit().
To reset the states of your model, call reset_states()
on either
+a specific layer, or on your entire model.
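A minimal sketch of the recipe above, assuming illustrative layer sizes and pre-existing x_train/y_train arrays whose first dimension is a multiple of the batch size:

library(keras)

model <- keras_model_sequential()
model %>%
  # the fixed batch size (32) is part of the input specification
  layer_simple_rnn(units = 32, stateful = TRUE,
                   batch_input_shape = c(32, 10, 16)) %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "rmsprop", loss = "mse")

# shuffle = FALSE preserves the one-to-one mapping between successive batches
model %>% fit(x_train, y_train, batch_size = 32, epochs = 5, shuffle = FALSE)

# clear the accumulated states before feeding an unrelated sequence
model %>% reset_states()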
You can specify the initial state of RNN layers symbolically by calling
+them with the keyword argument initial_state
. The value of
+initial_state
should be a tensor or list of tensors representing
+the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by
+calling reset_states
with the keyword argument states
. The value of
+states
should be a numpy array or list of numpy arrays representing
+the initial state of the RNN layer.
Other recurrent layers: layer_gru
,
+ layer_lstm
This version performs the same function as Dropout, however it drops entire
+1D feature maps instead of individual elements. If adjacent frames within
+feature maps are strongly correlated (as is normally the case in early
+convolution layers) then regular dropout will not regularize the activations
+and will otherwise just result in an effective learning rate decrease. In
+this case, layer_spatial_dropout_1d
will help promote independence between
+feature maps and should be used instead.
layer_spatial_dropout_1d(object, rate, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float between 0 and 1. Fraction of the input units to drop. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (samples, timesteps, channels)
Same as input
+ +- Efficient Object Localization Using Convolutional Networks
+ +Other dropout layers: layer_dropout
,
+ layer_spatial_dropout_2d
,
+ layer_spatial_dropout_3d
This version performs the same function as Dropout, however it drops entire
+2D feature maps instead of individual elements. If adjacent pixels within
+feature maps are strongly correlated (as is normally the case in early
+convolution layers) then regular dropout will not regularize the activations
+and will otherwise just result in an effective learning rate decrease. In
+this case, layer_spatial_dropout_2d
will help promote independence between
+feature maps and should be used instead.
layer_spatial_dropout_2d(object, rate, data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float between 0 and 1. Fraction of the input units to drop. |
+
data_format | +'channels_first' or 'channels_last'. In 'channels_first'
+mode, the channels dimension (the depth) is at index 1, in 'channels_last'
+mode it is at index 3. It defaults to the |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape: (samples, channels, rows, cols)
+if data_format='channels_first' or 4D tensor with shape: (samples, rows, cols, channels)
if data_format='channels_last'.
Same as input
+ +- Efficient Object Localization Using Convolutional Networks
+ +Other dropout layers: layer_dropout
,
+ layer_spatial_dropout_1d
,
+ layer_spatial_dropout_3d
This version performs the same function as Dropout, however it drops entire
+3D feature maps instead of individual elements. If adjacent voxels within
+feature maps are strongly correlated (as is normally the case in early
+convolution layers) then regular dropout will not regularize the activations
+and will otherwise just result in an effective learning rate decrease. In
+this case, layer_spatial_dropout_3d
will help promote independence between
+feature maps and should be used instead.
layer_spatial_dropout_3d(object, rate, data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
rate | +float between 0 and 1. Fraction of the input units to drop. |
+
data_format | +'channels_first' or 'channels_last'. In 'channels_first'
+mode, the channels dimension (the depth) is at index 1, in 'channels_last'
+mode it is at index 4. It defaults to the |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
5D tensor with shape: (samples, channels, dim1, dim2, dim3)
if data_format='channels_first' or 5D tensor with shape: (samples, dim1, dim2, dim3, channels)
if data_format='channels_last'.
Same as input
+ +- Efficient Object Localization Using Convolutional Networks
+ +Other dropout layers: layer_dropout
,
+ layer_spatial_dropout_1d
,
+ layer_spatial_dropout_2d
Repeats each temporal step size
times along the time axis.
layer_upsampling_1d(object, size = 2L, batch_size = NULL, name = NULL, + trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
size | +integer. Upsampling factor. |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape: (batch, steps, features)
.
3D tensor with shape: (batch, upsampled_steps, features)
.
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
Repeats the rows and columns of the data by size[[1]] and size[[2]] respectively.
+ + +layer_upsampling_2d(object, size = c(2L, 2L), data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
size | +int, or list of 2 integers. The upsampling factors for rows and +columns. |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape:
If data_format
is "channels_last"
: (batch, rows, cols, channels)
If data_format
is "channels_first"
: (batch, channels, rows, cols)
4D tensor with shape:
If data_format
is "channels_last"
: (batch, upsampled_rows, upsampled_cols, channels)
If data_format
is "channels_first"
: (batch, channels, upsampled_rows, upsampled_cols)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
Repeats the 1st, 2nd and 3rd dimensions of the data by size[[1]], size[[2]] and size[[3]] respectively.
+ + +layer_upsampling_3d(object, size = c(2L, 2L, 2L), data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
size | +int, or list of 3 integers. The upsampling factors for dim1, dim2 +and dim3. |
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
5D tensor with shape:
If data_format
is "channels_last"
: (batch, dim1, dim2, dim3, channels)
If data_format
is "channels_first"
: (batch, channels, dim1, dim2, dim3)
5D tensor with shape:
If data_format
is "channels_last"
: (batch, upsampled_dim1, upsampled_dim2, upsampled_dim3, channels)
If data_format
is "channels_first"
: (batch, channels, upsampled_dim1, upsampled_dim2, upsampled_dim3)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
Zero-padding layer for 1D input (e.g. temporal sequence).
+ + +layer_zero_padding_1d(object, padding = 1L, batch_size = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
padding | +int, or list of int (length 2)
|
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
3D tensor with shape (batch, axis_to_pad, features)
3D tensor with shape (batch, padded_axis, features)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_2d
,
+ layer_zero_padding_3d
This layer can add rows and columns of zeros at the top, bottom, left and +right side of an image tensor.
+ + +layer_zero_padding_2d(object, padding = c(1L, 1L), data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
padding | +int, or list of 2 ints, or list of 2 lists of 2 ints.
|
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
4D tensor with shape:
If data_format
is "channels_last"
: (batch, rows, cols, channels)
If data_format
is "channels_first"
: (batch, channels, rows, cols)
4D tensor with shape:
If data_format
is "channels_last"
: (batch, padded_rows, padded_cols, channels)
If data_format
is "channels_first"
: (batch, channels, padded_rows, padded_cols)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_3d
Zero-padding layer for 3D data (spatial or spatio-temporal).
+ + +layer_zero_padding_3d(object, padding = c(1L, 1L, 1L), data_format = NULL, + batch_size = NULL, name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
padding | +int, or list of 3 ints, or list of 3 lists of 2 ints.
|
+
data_format | +A string, one of |
+
batch_size | +Fixed batch size for layer |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
5D tensor with shape:
If data_format
is "channels_last"
: (batch, first_axis_to_pad, second_axis_to_pad, third_axis_to_pad, depth)
If data_format
is "channels_first"
: (batch, depth, first_axis_to_pad, second_axis_to_pad, third_axis_to_pad)
5D tensor with shape:
If data_format
is "channels_last"
: (batch, first_padded_axis, second_padded_axis, third_padded_axis, depth)
If data_format
is "channels_first"
: (batch, depth, first_padded_axis, second_padded_axis, third_padded_axis)
Other convolutional layers: layer_conv_1d
,
+ layer_conv_2d_transpose
,
+ layer_conv_2d
,
+ layer_conv_3d_transpose
,
+ layer_conv_3d
,
+ layer_conv_lstm_2d
,
+ layer_cropping_1d
,
+ layer_cropping_2d
,
+ layer_cropping_3d
,
+ layer_separable_conv_2d
,
+ layer_upsampling_1d
,
+ layer_upsampling_2d
,
+ layer_upsampling_3d
,
+ layer_zero_padding_1d
,
+ layer_zero_padding_2d
Model loss functions
+ + +loss_mean_squared_error(y_true, y_pred) + +loss_mean_absolute_error(y_true, y_pred) + +loss_mean_absolute_percentage_error(y_true, y_pred) + +loss_mean_squared_logarithmic_error(y_true, y_pred) + +loss_squared_hinge(y_true, y_pred) + +loss_hinge(y_true, y_pred) + +loss_categorical_hinge(y_true, y_pred) + +loss_logcosh(y_true, y_pred) + +loss_categorical_crossentropy(y_true, y_pred) + +loss_sparse_categorical_crossentropy(y_true, y_pred) + +loss_binary_crossentropy(y_true, y_pred) + +loss_kullback_leibler_divergence(y_true, y_pred) + +loss_poisson(y_true, y_pred) + +loss_cosine_proximity(y_true, y_pred)+ +
y_true | +True labels (Tensor) |
+
---|---|
y_pred | +Predictions (Tensor of the same shape as |
+
Loss functions are to be supplied in the loss
parameter of the
+compile()
function.
Loss functions can be specified either using the name of a built-in loss +function (e.g. 'loss = binary_crossentropy'), a reference to a built-in loss +function (e.g. 'loss = loss_binary_crossentropy()') or by passing an +arbitrary function that returns a scalar for each data-point and takes the +following two arguments:
y_true
True labels (Tensor)
y_pred
Predictions (Tensor of the same shape as y_true
)
The actual optimized objective is the mean of the output array across all +datapoints.
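For instance, a hand-written mean squared error loss can be defined with backend tensor functions and passed to compile() (a sketch; the model object is assumed to already exist):

K <- backend()
loss_custom_mse <- function(y_true, y_pred) {
  # average the squared errors over the last axis, yielding one value per data-point
  K$mean(K$square(y_pred - y_true), axis = -1L)
}

model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = loss_custom_mse
)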
+ +When using the categorical_crossentropy loss, your targets should be in
+categorical format (e.g. if you have 10 classes, the target for each sample
+should be a 10-dimensional vector that is all-zeros except for a 1 at the
+index corresponding to the class of the sample). In order to convert
+integer targets into categorical targets, you can use the Keras utility
+function to_categorical()
:
+ categorical_labels <- to_categorical(int_labels, num_classes = NULL)
This generates an array where the ith element is the probability that a word +of rank i would be sampled, according to the sampling distribution used in +word2vec. The word2vec formula is: p(word) = min(1, +sqrt(word.frequency/sampling_factor) / (word.frequency/sampling_factor)) We +assume that the word frequencies follow Zipf's law (s=1) to derive a +numerical approximation of frequency(rank): frequency(rank) ~ 1/(rank * +(log(rank) + gamma) + 1/2 - 1/(12*rank)) where gamma is the Euler-Mascheroni +constant.
+ + +make_sampling_table(size, sampling_factor = 1e-05)+ +
size | +int, number of possible words to sample. |
+
---|---|
sampling_factor | +the sampling factor in the word2vec formula. |
+
An array of length size
where the ith entry is the
+probability that a word of rank i should be sampled.
The word2vec formula is: p(word) = min(1, +sqrt(word.frequency/sampling_factor) / (word.frequency/sampling_factor))
+ +Other text preprocessing: pad_sequences
,
+ skipgrams
,
+ text_hashing_trick
,
+ text_one_hot
,
+ text_to_word_sequence
Model performance metrics
+ + +metric_binary_accuracy(y_true, y_pred) + +metric_binary_crossentropy(y_true, y_pred) + +metric_categorical_accuracy(y_true, y_pred) + +metric_categorical_crossentropy(y_true, y_pred) + +metric_cosine_proximity(y_true, y_pred) + +metric_hinge(y_true, y_pred) + +metric_kullback_leibler_divergence(y_true, y_pred) + +metric_mean_absolute_error(y_true, y_pred) + +metric_mean_absolute_percentage_error(y_true, y_pred) + +metric_mean_squared_error(y_true, y_pred) + +metric_mean_squared_logarithmic_error(y_true, y_pred) + +metric_poisson(y_true, y_pred) + +metric_sparse_categorical_crossentropy(y_true, y_pred) + +metric_squared_hinge(y_true, y_pred) + +metric_top_k_categorical_accuracy(y_true, y_pred, k = 5) + +metric_sparse_top_k_categorical_accuracy(y_true, y_pred, k = 5)+ +
y_true | +True labels (tensor) |
+
---|---|
y_pred | +Predictions (tensor of the same shape as y_true). |
+
k | +An integer, number of top elements to consider. |
+
Metric functions are to be supplied in the metrics
parameter of the
+compile()
function.
You can provide an arbitrary R function as a custom metric. Note that
+the y_true
and y_pred
parameters are tensors, so computations on
+them should use backend tensor functions. For example:
# create metric using backend tensor functions
K <- backend()
metric_mean_pred <- function(y_true, y_pred) {
  K$mean(y_pred)
}

model %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = loss_binary_crossentropy,
  metrics = c('accuracy',
              'mean_pred' = metric_mean_pred)
)
Note that a name ('mean_pred') is provided for the custom metric +function. This name is used within training progress output.
+Documentation on the available backend tensor functions can be +found at https://rstudio.github.io/keras/articles/backend.html#backend-functions.
+ + +Save and re-load models configurations as JSON. Note that the representation +does not include the weights, only the architecture.
+ + +model_to_json(object) + +model_from_json(json, custom_objects = NULL)+ +
object | +Model object to save |
+
---|---|
json | +JSON with model configuration |
+
custom_objects | +Optional named list mapping names to custom classes or +functions to be considered during deserialization. |
+
Other model persistence: get_weights
,
+ model_to_yaml
,
+ save_model_hdf5
,
+ save_model_weights_hdf5
,
+ serialize_model
Save and re-load models configurations as YAML Note that the representation +does not include the weights, only the architecture.
+ + +model_to_yaml(object) + +model_from_yaml(yaml, custom_objects = NULL)+ +
object | +Model object to save |
+
---|---|
yaml | +YAML with model configuration |
+
custom_objects | +Optional named list mapping names to custom classes or +functions to be considered during deserialization. |
+
Other model persistence: get_weights
,
+ model_to_json
,
+ save_model_hdf5
,
+ save_model_weights_hdf5
,
+ serialize_model
Normalize a matrix or nd-array
+ + +normalize(x, axis = -1, order = 2)+ +
x | +Matrix or array to normalize |
+
---|---|
axis | +Axis along which to normalize |
+
order | +Normalization order (e.g. 2 for L2 norm) |
+
A normalized copy of the array.
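A minimal sketch: with the defaults, each row of a matrix is scaled to unit L2 norm (the values below are illustrative).

x <- matrix(c(3, 4, 0, 5, 12, 0), nrow = 2, byrow = TRUE)
x_norm <- normalize(x, axis = -1, order = 2)
rowSums(x_norm ^ 2)  # each row now has (approximately) unit squared norm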
Adadelta optimizer as described in ADADELTA: An Adaptive Learning Rate Method.
+ + +optimizer_adadelta(lr = 1, rho = 0.95, epsilon = 1e-08, decay = 0, + clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
rho | +float >= 0. Decay factor. |
+
epsilon | +float >= 0. Fuzz factor. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
It is recommended to leave the parameters of this optimizer at their +default values.
+ +Other optimizers: optimizer_adagrad
,
+ optimizer_adamax
,
+ optimizer_adam
,
+ optimizer_nadam
,
+ optimizer_rmsprop
,
+ optimizer_sgd
Adagrad optimizer as described in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
+ + +optimizer_adagrad(lr = 0.01, epsilon = 1e-08, decay = 0, + clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
epsilon | +float >= 0. Fuzz factor. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
It is recommended to leave the parameters of this optimizer at their +default values.
+ +Other optimizers: optimizer_adadelta
,
+ optimizer_adamax
,
+ optimizer_adam
,
+ optimizer_nadam
,
+ optimizer_rmsprop
,
+ optimizer_sgd
Adam optimizer as described in Adam - A Method for Stochastic Optimization.
+ + +optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999, + epsilon = 1e-08, decay = 0, clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
beta_1 | +The exponential decay rate for the 1st moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
beta_2 | +The exponential decay rate for the 2nd moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
epsilon | +float >= 0. Fuzz factor. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
Default parameters follow those provided in the original paper.
+ +Other optimizers: optimizer_adadelta
,
+ optimizer_adagrad
,
+ optimizer_adamax
,
+ optimizer_nadam
,
+ optimizer_rmsprop
,
+ optimizer_sgd
Adamax optimizer from Section 7 of the Adam paper. +It is a variant of Adam based on the infinity norm.
+ + +optimizer_adamax(lr = 0.002, beta_1 = 0.9, beta_2 = 0.999, + epsilon = 1e-08, decay = 0, clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
beta_1 | +The exponential decay rate for the 1st moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
beta_2 | +The exponential decay rate for the 2nd moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
epsilon | +float >= 0. Fuzz factor. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
Other optimizers: optimizer_adadelta
,
+ optimizer_adagrad
,
+ optimizer_adam
,
+ optimizer_nadam
,
+ optimizer_rmsprop
,
+ optimizer_sgd
Much like Adam is essentially RMSprop with momentum, Nadam is Adam +with Nesterov momentum. See Incorporating Nesterov Momentum into Adam.
+ + +optimizer_nadam(lr = 0.002, beta_1 = 0.9, beta_2 = 0.999, + epsilon = 1e-08, schedule_decay = 0.004, clipnorm = NULL, + clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
beta_1 | +The exponential decay rate for the 1st moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
beta_2 | +The exponential decay rate for the 2nd moment estimates. float, +0 < beta < 1. Generally close to 1. |
+
epsilon | +float >= 0. Fuzz factor. |
+
schedule_decay | +Schedule decay. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
Default parameters follow those provided in the paper. It is +recommended to leave the parameters of this optimizer at their default +values.
+ +On the importance of initialization and momentum in deep learning.
+Other optimizers: optimizer_adadelta
,
+ optimizer_adagrad
,
+ optimizer_adamax
,
+ optimizer_adam
,
+ optimizer_rmsprop
,
+ optimizer_sgd
RMSProp optimizer
+ + +optimizer_rmsprop(lr = 0.001, rho = 0.9, epsilon = 1e-08, decay = 0, + clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
rho | +float >= 0. Decay factor. |
+
epsilon | +float >= 0. Fuzz factor. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
It is recommended to leave the parameters of this optimizer at their +default values (except the learning rate, which can be freely tuned).
+This optimizer is usually a good choice for recurrent neural networks.
+ +Other optimizers: optimizer_adadelta
,
+ optimizer_adagrad
,
+ optimizer_adamax
,
+ optimizer_adam
,
+ optimizer_nadam
,
+ optimizer_sgd
Stochastic gradient descent optimizer with support for momentum, learning +rate decay, and Nesterov momentum.
+ + +optimizer_sgd(lr = 0.01, momentum = 0, decay = 0, nesterov = FALSE, + clipnorm = NULL, clipvalue = NULL)+ +
lr | +float >= 0. Learning rate. |
+
---|---|
momentum | +float >= 0. Parameter updates momentum. |
+
decay | +float >= 0. Learning rate decay over each update. |
+
nesterov | +boolean. Whether to apply Nesterov momentum. |
+
clipnorm | +Gradients will be clipped when their L2 norm exceeds this +value. |
+
clipvalue | +Gradients will be clipped when their absolute value exceeds +this value. |
+
Optimizer for use with compile
.
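A minimal sketch, assuming a model has already been defined:

model %>% compile(
  optimizer = optimizer_sgd(lr = 0.01, momentum = 0.9, nesterov = TRUE),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)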
Other optimizers: optimizer_adadelta
,
+ optimizer_adagrad
,
+ optimizer_adamax
,
+ optimizer_adam
,
+ optimizer_nadam
,
+ optimizer_rmsprop
Pads each sequence to the same length (length of the longest sequence).
+ + +pad_sequences(sequences, maxlen = NULL, dtype = "int32", padding = "pre", + truncating = "pre", value = 0)+ +
sequences | +List of lists where each element is a sequence |
+
---|---|
maxlen | +int, maximum length |
+
dtype | +type to cast the resulting sequence. |
+
padding | +'pre' or 'post', pad either before or after each sequence. |
+
truncating | +'pre' or 'post', remove values from sequences larger than maxlen either in the beginning or in the end of the sequence |
+
value | +float, padding value used to fill the sequences. |
+
Array with dimensions (number_of_sequences, maxlen)
+ +If maxlen is provided, any sequence longer than maxlen is truncated to maxlen. +Truncation happens off either the beginning (default) or +the end of the sequence. Supports post-padding and pre-padding (default).
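A minimal sketch with two short sequences (values chosen purely for illustration):

seqs <- list(c(1, 2, 3), c(4, 5))
pad_sequences(seqs, maxlen = 4)
#      [,1] [,2] [,3] [,4]
# [1,]    0    1    2    3
# [2,]    0    0    4    5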
+ +Other text preprocessing: make_sampling_table
,
+ skipgrams
,
+ text_hashing_trick
,
+ text_one_hot
,
+ text_to_word_sequence
See %>%
for more details.
lhs %>% rhs+ + +
Plots metrics recorded during training.
+ + +# S3 method for keras_training_history +plot(x, y, metrics = NULL, + method = c("auto", "ggplot2", "base"), smooth = TRUE, ...)+ +
x | +Training history object returned from |
+
---|---|
y | +Unused. |
+
metrics | +One or more metrics to plot (e.g. |
+
method | +Method to use for plotting. The default "auto" will use +ggplot2 if available, and otherwise will use base graphics. |
+
smooth | +Whether a loess smooth should be added to the plot, only
+available for the |
+
... | +Additional parameters to pass to the |
+
Remove the last layer in a model
+ + +pop_layer(object)+ +
object | +Keras model object |
+
---|
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Generates output predictions for the input samples, processing the samples in +a batched way.
+ + +# S3 method for keras.engine.training.Model +predict(object, x, batch_size = 32, + verbose = 0, ...)+ +
object | +Keras model |
+
---|---|
x | +Input data (vector, matrix, or array) |
+
batch_size | +Integer |
+
verbose | +Verbosity mode, 0 or 1. |
+
... | +Unused |
+
vector, matrix, or array of predictions
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
The generator should return the same kind of data as accepted by
+predict_on_batch()
.
predict_generator(object, generator, steps, max_queue_size = 10, + verbose = 0)+ +
object | +Keras model object |
+
---|---|
generator | +Generator yielding batches of input samples. |
+
steps | +Total number of steps (batches of samples) to yield from
+ |
+
max_queue_size | +Maximum size for the generator queue. |
+
verbose | +verbosity mode, 0 or 1. |
+
Numpy array(s) of predictions.
+ +ValueError: In case the generator yields data in an invalid +format.
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Returns predictions for a single batch of samples.
+ + +predict_on_batch(object, x)+ +
object | +Keras model object |
+
---|---|
x | +Input data (vector, matrix, or array) |
+
array of predictions.
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_proba
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
Generates probability or class probability predictions for the input samples.
+ + +predict_proba(object, x, batch_size = 32, verbose = 0) + +predict_classes(object, x, batch_size = 32, verbose = 0)+ +
object | +Keras model object |
+
---|---|
x | +Input data (vector, matrix, or array) |
+
batch_size | +Integer |
+
verbose | +Verbosity mode, 0 or 1. |
+
The input samples are processed batch by batch.
+ +Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ summary.keras.engine.training.Model
,
+ train_on_batch
These objects are imported from other packages. Follow the links +below to see their documentation.
flags
, flag_numeric
, flag_integer
, flag_string
, flag_boolean
, run_dir
Reset the states for a layer
+ + +reset_states(object)+ +
object | +Model or layer object |
+
---|
Other layer methods: count_params
,
+ get_config
, get_input_at
,
+ get_weights
Save/Load models using HDF5 files
+ + +save_model_hdf5(object, filepath, overwrite = TRUE, + include_optimizer = TRUE) + +load_model_hdf5(filepath, custom_objects = NULL, compile = TRUE)+ +
object | +Model object to save |
+
---|---|
filepath | +File path |
+
overwrite | +Overwrite existing file if necessary |
+
include_optimizer | +If |
+
custom_objects | +Mapping class names (or function names) of custom +(non-Keras) objects to class/functions |
+
compile | +Whether to compile the model after loading. |
+
The following components of the model are saved:
The model architecture, allowing to re-instantiate the model.
The model weights.
The state of the optimizer, allowing to resume training exactly where you +left off. +This allows you to save the entirety of the state of a model +in a single file.
Saved models can be reinstantiated via load_model_hdf5()
. The model returned by
+load_model_hdf5()
is a compiled model ready to be used (unless the saved model
+was never compiled in the first place or compile = FALSE
is specified.)
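A minimal sketch (the file name is an illustrative assumption):

# persist a trained model, then restore it in a later session
save_model_hdf5(model, "my_model.h5")
model <- load_model_hdf5("my_model.h5")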
The serialize_model()
function enables saving Keras models to
+R objects that can be persisted across R sessions.
Other model persistence: get_weights
,
+ model_to_json
, model_to_yaml
,
+ save_model_weights_hdf5
,
+ serialize_model
Save/Load model weights using HDF5 files
+ + +save_model_weights_hdf5(object, filepath, overwrite = TRUE) + +load_model_weights_hdf5(object, filepath, by_name = FALSE)+ +
object | +Model object to save/load |
+
---|---|
filepath | +Path to the file |
+
overwrite | +Whether to silently overwrite any existing +file at the target location |
+
by_name | +Whether to load weights by name or by topological order. |
+
The weight file has:
layer_names
(attribute), a list of strings (ordered names of model layers).
For every layer, a group
named layer.name
For every such layer group, a group attribute weight_names
, a list of strings
+(ordered names of weights tensor of the layer).
For every weight in the layer, a dataset storing the weight value, named after +the weight tensor.
For load_model_weights()
, if by_name
is FALSE
(default) weights are
+loaded based on the network's topology, meaning the architecture should be
+the same as when the weights were saved. Note that layers that don't have
+weights are not taken into account in the topological ordering, so adding
+or removing layers is fine as long as they don't have weights.
If by_name
is TRUE
, weights are loaded into layers only if they share
+the same name. This is useful for fine-tuning or transfer-learning models
+where some of the layers have changed.
Other model persistence: get_weights
,
+ model_to_json
, model_to_yaml
,
+ save_model_hdf5
,
+ serialize_model
Convert a list of sequences into a matrix.
+ + +sequences_to_matrix(tokenizer, sequences, mode = c("binary", "count", "tfidf", + "freq"))+ +
tokenizer | +Tokenizer |
+
---|---|
sequences | +List of sequences (a sequence is a list of integer word indices). |
+
mode | +one of "binary", "count", "tfidf", "freq". |
+
A matrix
+ +Other text tokenization: fit_text_tokenizer
,
+ text_tokenizer
,
+ texts_to_matrix
,
+ texts_to_sequences_generator
,
+ texts_to_sequences
Model objects are external references to Keras objects which cannot be saved
+and restored across R sessions. The serialize_model()
and
+unserialize_model()
functions provide facilities to convert Keras models to
+R objects for persistence within R data files.
serialize_model(model, include_optimizer = TRUE) + +unserialize_model(model, custom_objects = NULL, compile = TRUE)+ +
model | +Keras model or R "raw" object containing serialized Keras model. |
+
---|---|
include_optimizer | +If |
+
custom_objects | +Mapping class names (or function names) of custom +(non-Keras) objects to class/functions |
+
compile | +Whether to compile the model after loading. |
+
serialize_model()
returns an R "raw" object containing an hdf5
+version of the Keras model. unserialize_model()
returns a Keras model.
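A minimal sketch (the .rds file name is an illustrative assumption):

# convert the model to an R raw vector and store it with ordinary R tooling
model_raw <- serialize_model(model)
saveRDS(model_raw, "model.rds")

# later, possibly in another R session
model <- unserialize_model(readRDS("model.rds"))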
The save_model_hdf5()
function enables saving Keras models to
+external hdf5 files.
Other model persistence: get_weights
,
+ model_to_json
, model_to_yaml
,
+ save_model_hdf5
,
+ save_model_weights_hdf5
Takes a sequence (list of indexes of words), returns list of couples
(word_index,
+other_word index) and labels
(1s or 0s), where label = 1 if 'other_word'
+belongs to the context of 'word', and label=0 if 'other_word' is randomly
+sampled
skipgrams(sequence, vocabulary_size, window_size = 4, negative_samples = 1, + shuffle = TRUE, categorical = FALSE, sampling_table = NULL)+ +
sequence | +a word sequence (sentence), encoded as a list of word indices
+(integers). If using a |
+
---|---|
vocabulary_size | +int. maximum possible word index + 1 |
+
window_size | +int. actually half-window. The window of a word wi will be
+ |
+
negative_samples | +float >= 0. 0 for no negative (=random) samples. 1 +for same number as positive samples. etc. |
+
shuffle | +whether to shuffle the word couples before returning them. |
+
categorical | +bool. If FALSE, labels will be integers (e.g. [0, 1, 1, ...]); if TRUE, labels +will be categorical, e.g. [[1,0], [0,1], [0,1], ...]. |
+
sampling_table | +1D array of size |
+
List of couples
, labels
where:
couples
is a list of 2-element integer vectors: [word_index, other_word_index]
.
labels
is an integer vector of 0 and 1, where 1 indicates that other_word_index
+was found in the same window as word_index
, and 0 indicates that other_word_index
+was random.
if categorical
is set to TRUE
, the labels are categorical, ie. 1 becomes [0,1]
,
+and 0 becomes [1, 0]
.
Other text preprocessing: make_sampling_table
,
+ pad_sequences
,
+ text_hashing_trick
,
+ text_one_hot
,
+ text_to_word_sequence
Print a summary of a Keras model
+ + +# S3 method for keras.engine.training.Model +summary(object, + line_length = getOption("width"), positions = NULL, ...)+ +
object | +Keras model instance |
+
---|---|
line_length | +Total length of printed lines |
+
positions | +Relative or absolute positions of log elements in each line.
+If not provided, defaults to |
+
... | +Unused |
+
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
, train_on_batch
Converts a text to a sequence of indexes in a fixed-size hashing space.
+ + +text_hashing_trick(text, n, hash_function = NULL, + filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, + split = " ")+ +
text | +Input text (string). |
+
---|---|
n | +Dimension of the hashing space. |
+
hash_function | +if |
+
filters | +Sequence of characters to filter out. |
+
lower | +Whether to convert the input to lowercase. |
+
split | +Sentence split marker (string). |
+
A list of integer word indices (unicity non-guaranteed).
+ +Two or more words may be assigned to the same index, due to possible +collisions by the hashing function.
+ +Other text preprocessing: make_sampling_table
,
+ pad_sequences
, skipgrams
,
+ text_one_hot
,
+ text_to_word_sequence
One-hot encode a text into a list of word indexes in a vocabulary of size n.
+ + +text_one_hot(text, n, filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", + lower = TRUE, split = " ")+ +
text | +Input text (string). |
+
---|---|
n | +Size of vocabulary (integer) |
+
filters | +Sequence of characters to filter out. |
+
lower | +Whether to convert the input to lowercase. |
+
split | +Sentence split marker (string). |
+
List of integers in [1, n]
. Each integer encodes a word (unicity
+non-guaranteed).
Other text preprocessing: make_sampling_table
,
+ pad_sequences
, skipgrams
,
+ text_hashing_trick
,
+ text_to_word_sequence
Convert text to a sequence of words (or tokens).
+ + +text_to_word_sequence(text, + filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, + split = " ")+ +
text | +Input text (string). |
+
---|---|
filters | +Sequence of characters to filter out. |
+
lower | +Whether to convert the input to lowercase. |
+
split | +Sentence split marker (string). |
+
Words (or tokens)
+ +Other text preprocessing: make_sampling_table
,
+ pad_sequences
, skipgrams
,
+ text_hashing_trick
,
+ text_one_hot
Vectorize a text corpus, by turning each text into either a sequence of +integers (each integer being the index of a token in a dictionary) or into a +vector where the coefficient for each token could be binary, based on word +count, based on tf-idf...
+ + +text_tokenizer(num_words = NULL, + filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, + split = " ", char_level = FALSE)+ +
num_words | +the maximum number of words to keep, based on word
+frequency. Only the most common |
+
---|---|
filters | +a string where each element is a character that will be +filtered from the texts. The default is all punctuation, plus tabs and line +breaks, minus the ' character. |
+
lower | +boolean. Whether to convert the texts to lowercase. |
+
split | +character or string to use for token splitting. |
+
char_level | +if |
+
By default, all punctuation is removed, turning the texts into
+space-separated sequences of words (words may include the ' character).
+These sequences are then split into lists of tokens. They will then be
+indexed or vectorized. 0
is a reserved index that won't be assigned to any
+word.
The tokenizer object has the following attributes:
word_counts
--- named list mapping words to the number of times they appeared
+during fit. Only set after fit_text_tokenizer()
is called on the tokenizer.
word_docs
--- named list mapping words to the number of documents/texts in which they
+appeared during fit. Only set after fit_text_tokenizer()
is called on the tokenizer.
word_index
--- named list mapping words to their rank/index (int). Only set
+after fit_text_tokenizer()
is called on the tokenizer.
document_count
--- int. Number of documents (texts/sequences) the tokenizer
+was trained on. Only set after fit_text_tokenizer()
is called on the tokenizer.
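A minimal sketch of a typical workflow (the texts vector is an illustrative assumption); note that the tokenizer is modified in place by fit_text_tokenizer():

texts <- c("the cat sat on the mat", "the dog ate my homework")

tokenizer <- text_tokenizer(num_words = 1000)
tokenizer %>% fit_text_tokenizer(texts)

sequences <- texts_to_sequences(tokenizer, texts)
x <- texts_to_matrix(tokenizer, texts, mode = "tfidf")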
Other text tokenization: fit_text_tokenizer
,
+ sequences_to_matrix
,
+ texts_to_matrix
,
+ texts_to_sequences_generator
,
+ texts_to_sequences
Convert a list of texts to a matrix.
+ + +texts_to_matrix(tokenizer, texts, mode = c("binary", "count", "tfidf", + "freq"))+ +
tokenizer | +Tokenizer |
+
---|---|
texts | +Vector/list of texts (strings). |
+
mode | +one of "binary", "count", "tfidf", "freq". |
+
A matrix
+ +Other text tokenization: fit_text_tokenizer
,
+ sequences_to_matrix
,
+ text_tokenizer
,
+ texts_to_sequences_generator
,
+ texts_to_sequences
Only top "num_words" most frequent words will be taken into account. +Only words known by the tokenizer will be taken into account.
+ + +texts_to_sequences(tokenizer, texts)+ +
tokenizer | +Tokenizer |
+
---|---|
texts | +Vector/list of texts (strings). |
+
Other text tokenization: fit_text_tokenizer
,
+ sequences_to_matrix
,
+ text_tokenizer
,
+ texts_to_matrix
,
+ texts_to_sequences_generator
Only top "num_words" most frequent words will be taken into account. +Only words known by the tokenizer will be taken into account.
+ + +texts_to_sequences_generator(tokenizer, texts)+ +
tokenizer | +Tokenizer |
+
---|---|
texts | +Vector/list of texts (strings). |
+
Generator which yields individual sequences
+ +Other text tokenization: fit_text_tokenizer
,
+ sequences_to_matrix
,
+ text_tokenizer
,
+ texts_to_matrix
,
+ texts_to_sequences
The input should be at least 3D, and the dimension of index one will be +considered to be the temporal dimension.
+ + +time_distributed(object, layer, input_shape = NULL, + batch_input_shape = NULL, batch_size = NULL, dtype = NULL, + name = NULL, trainable = NULL, weights = NULL)+ +
object | +Model or layer object |
+
---|---|
layer | +A layer instance. |
+
input_shape | +Dimensionality of the input (integer) not including the +samples axis. This argument is required when using this layer as the first +layer in a model. |
+
batch_input_shape | +Shapes, including the batch size. For instance,
+ |
+
batch_size | +Fixed batch size for layer |
+
dtype | +The data type expected by the input, as a string ( |
+
name | +An optional name string for the layer. Should be unique in a +model (do not reuse the same name twice). It will be autogenerated if it +isn't provided. |
+
trainable | +Whether the layer weights will be updated during training. |
+
weights | +Initial weights for layer. |
+
Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch
+input shape of the layer is then (32, 10, 16)
, and the input_shape
, not
+including the samples dimension, is (10, 16)
. You can then use
+time_distributed
to apply a layer_dense
to each of the 10 timesteps,
+independently.
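A minimal sketch of that example (8 output units is an illustrative choice):

model <- keras_model_sequential()
model %>%
  # apply the same dense layer to each of the 10 timesteps independently;
  # the output shape becomes (batch_size, 10, 8)
  time_distributed(layer_dense(units = 8), input_shape = c(10, 16))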
Other layer wrappers: bidirectional
Converts a class vector (integers) to binary class matrix.
+ + +to_categorical(y, num_classes = NULL)+ +
y | +Class vector to be converted into a matrix (integers from 0 to num_classes). |
+
---|---|
num_classes | +Total number of classes. |
+
A binary matrix representation of the input.
+ +E.g. for use with loss_categorical_crossentropy()
.
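A minimal sketch:

labels <- c(0, 2, 1, 1)           # integer classes in 0..2
to_categorical(labels, num_classes = 3)
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    0    1
# [3,]    0    1    0
# [4,]    0    1    0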
Convert an object to a NumPy array which has the optimal in-memory layout and +floating point data type for the current Keras backend.
+ + +to_numpy_array(x, dtype = NULL, order = "C")+ +
x | +Object or list of objects to convert |
+
---|---|
dtype | +NumPy data type (e.g. float32, float64). If this is unspecified +then R doubles will be converted to the default floating point type for the +current Keras backend. |
+
order | +In-memory order ('C' or 'F'). Defaults to 'C', which is the +optimal order in nearly every case for Keras backends. |
+
NumPy array with the specified type and order (or list of NumPy
+arrays if a list was passed for x
).
Single gradient update or model evaluation over one batch of samples.
+ + +train_on_batch(object, x, y, class_weight = NULL, sample_weight = NULL) + +test_on_batch(object, x, y, sample_weight = NULL)+ +
object | +Keras model object |
+
---|---|
x | +input data, as an array or list of arrays (if the model has multiple +inputs). |
+
y | +labels, as an array. |
+
class_weight | +named list mapping classes to a weight value, used for +scaling the loss function (during training only). |
+
sample_weight | +sample weights, as an array. |
+
Scalar training or test loss (if the model has no metrics) or list of scalars
+(if the model computes other metrics). The property model$metrics_names
+will give you the display labels for the scalar outputs.
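A minimal sketch, assuming x_batch and y_batch already hold a single batch of data:

# one gradient update on the batch
train_loss <- model %>% train_on_batch(x_batch, y_batch)

# evaluate the same batch without updating the weights
test_loss <- model %>% test_on_batch(x_batch, y_batch)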
Other model functions: compile
,
+ evaluate_generator
, evaluate
,
+ fit_generator
, fit
,
+ get_config
, get_layer
,
+ keras_model_sequential
,
+ keras_model
, pop_layer
,
+ predict.keras.engine.training.Model
,
+ predict_generator
,
+ predict_on_batch
,
+ predict_proba
,
+ summary.keras.engine.training.Model