Release notes from komputation (https://github.com/sekwiatkowski/komputation/releases)

v0.12.5 (2018-01-15)
<ul>
<li>Switched CUDA C development to CLion</li>
<li>Used the <strong>JETBRAINS_IDE</strong> macro to declare CUDA's language extensions (see the sketch after this list)</li>
<li>Header include paths are now relative to the given source file</li>
<li>For real-time compilation with nvrtc, all include directives in the source code are replaced with a sequence of directives that use paths relative to the CUDA resource base directory.</li>
<li>Header files are now inferred from the source code and no longer have to be specified in kernel instructions.</li>
<li>Fixed comparisons in the binary testing kernel</li>
<li>Replaced double constants with floats</li>
<li>Removed the unused numberEntries parameter from the kernel that replaces NaNs</li>
<li>Removed unused parameter from functions used for backpropagation kernels of recurrent layers</li>
<li>Resolved a name conflict in the max-pooling kernel</li>
<li>Simplified the definition of the stack of convolutional layers in the embedding toy demo with two filter widths</li>
</ul>
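Defining CUDA's qualifiers away behind the JETBRAINS_IDE macro lets CLion's C++ parser accept kernel sources without tripping over the language extensions. A minimal sketch of the idea with assumed contents (the repository's actual header may differ); the macro is defined only for the IDE, never for nvcc or nvrtc, so compilation is unaffected:

```cuda
// Only CLion ever sees these fallback definitions: JETBRAINS_IDE is defined in
// the IDE's resolve context, not on the compiler command line.
#ifdef JETBRAINS_IDE
    #define __global__
    #define __device__
    #define __host__
    #define __shared__
    #define __constant__

    // Stand-ins for the built-in index variables so the IDE can resolve them:
    struct BuiltInIndex { unsigned int x, y, z; };
    extern BuiltInIndex threadIdx, blockIdx, blockDim, gridDim;

    void __syncthreads();
#endif
```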

v0.12.3 (2018-01-09)
<ul>
<li>Finished implementing experimental support for (fixed-length, left-to-right, vanilla) GPU-accelerated recurrent neural networks</li>
<li>Fixed the allocation of memory for the propagation result in CudaSquaredLoss</li>
<li>Added a helper function to access and print arrays on the device</li>
<li>Implemented a SumKernel to add up accumulated gradients for parameters that are used in each instance</li>
<li>Added CUDA helper functions to cooperatively copy an array and add up two arrays (see the sketch after this list)</li>
<li>Moved the entrywise CUDA activation functions to header files</li>
<li>Removed unused array fill kernels</li>
<li>Added a pointer to the maximum number of input columns in BaseCudaContinuation</li>
<li>The shared parameter is passed directly to the CPU-specific ParameterizedSeries instruction. This makes it possible to use the same entries for the CPU and CUDA implementations.</li>
<li>Removed the CUDA IDs from the ResultExtraction enumeration</li>
<li>Set the device activity function IDs to be constant</li>
<li>Added a CUDA version of the increment demo</li>
<li>Mentioned the demo in the README</li>
<li>Replaced kotlin-stdlib-jre8 with kotlin-stdlib-jdk8</li>
</ul>
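The cooperative helpers mentioned in the list above let all threads of a block share the work on a single array. A minimal sketch with assumed names and signatures (not the repository's actual helpers):

```cuda
// Each thread handles every blockDim.x-th entry; together the block covers the array.
__device__ void copyCooperatively(const float* source, float* destination, int numberEntries) {
    for (int index = threadIdx.x; index < numberEntries; index += blockDim.x) {
        destination[index] = source[index];
    }
}

// Block-cooperative element-wise addition of two arrays into a result array.
__device__ void addCooperatively(const float* a, const float* b, float* result, int numberEntries) {
    for (int index = threadIdx.x; index < numberEntries; index += blockDim.x) {
        result[index] = a[index] + b[index];
    }
}
```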

v0.12.2 (2018-01-07)
<p>Removed the projection of a zero initial state vector in the first step in CpuRecurrent</p>

v0.12.1 (2018-01-05)
<ul>
<li>The summation of gradients based on the parameter index in CudaLookup is now deterministic.</li>
<li>Removed the hash table kernel</li>
<li>Replaced the use of the hash table with a pointer to the parameter indices</li>
<li>Rewrote the group sum kernel based on information about the indices of the first occurrence of a parameter and its remaining occurrences</li>
<li>Added a kernel to add up two arrays (see the sketch after this list)</li>
<li>Fixed backward propagation in CudaStack by replacing the cuBLAS axpy operation with the use of the addition kernel</li>
<li>The input memory can now store information about duplicate occurrences.</li>
<li>Improved the name of the setters in InputMemory</li>
<li>The optimizer kernels now check if the count is strictly positive.</li>
<li>Moved reusable batch size and output entries members to BaseCudaEntryPoint</li>
<li>Increased the batch size to 16 and changed hyperparameters in the TREC demos with two filter widths.</li>
<li>Mentioned the CUDA TREC demo with two filters in the README</li>
</ul>
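The addition kernel referenced in the list above is, at its core, an element-wise sum over two arrays of equal length. A minimal sketch with assumed names (the repository's kernel and launch configuration may differ):

```cuda
// One thread per entry; the grid is sized so that every entry is covered.
__global__ void addKernel(int numberEntries, const float* a, const float* b, float* result) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    if (index < numberEntries) {
        result[index] = a[index] + b[index];
    }
}
```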

v0.12.0 (2017-12-24)
<ul>
<li>Simplified the specification of networks</li>
<li>The input dimensions over the continuations of the network are computed automatically.</li>
<li>Removed the Layer suffix from instruction factory functions</li>
<li>Overloaded the instruction factory function to simplify the specification of initialization strategies</li>
<li>Renamed Direction.Forward/Backward to Direction.LeftToRight/RightToLeft</li>
<li>Shortened "ActivationFunction" to "Activation" and "ActivationLayer" to "Activation"</li>
<li>Generalized BaseCudaEntrywiseActivationLayer to BaseCudaEntrywiseLayer</li>
<li>The specification of the minimum length is required in the lookup instruction and optional in the input instruction.</li>
<li>TREC categories are indexed based on all available training data.</li>
<li>Renamed "forward" layer to "continuation" and shortened "combination layer" to "combination"</li>
<li>Moved the architecture-specific interfaces from the general package to the respective architecture-specific packages</li>
<li>Improved the names used in SparseAccumulator and SparseUpdate</li>
<li>The series is passed on to the method of the ResultExtractionStrategy interface.</li>
<li>Introduced CpuCombinationSeries to implement the addition of the weighted previous state and the weighted current input.</li>
<li>Added the Cpu prefix to Series and ParameterizedSeries in preparation of the CUDA implementation of recurrent neural networks</li>
<li>Optimized the performance of the RNN implementation by adding the bias to the input rather than adding it at each step</li>
<li>Fixed the specification of the number of rows in CpuLogisticLoss</li>
<li>Renamed the "Negation" demo to "Not"</li>
<li>Stopped experimenting with dynamic parallelism</li>
<li>CudaIdentity now implements CudaActivation.</li>
<li>Introduced a base class for higher-order layers</li>
<li>Differentiated the CUDA continuation base class into one class for layers that change the number of columns and one class for layers that don't.</li>
<li>Reused the code for the computation of launch configurations in CudaHashing and CudaGroupSum</li>
<li>Fixed the sparse update in CudaLookup</li>
<li>Added a "copy" helper function that encapsulates System.arraycopy</li>
<li>Added a setter to InputMemory that caches all possible data</li>
<li>Clarified references to the hash table in CUDA optimizers</li>
<li>CUDA layers pass a pointer to the length of the input data and the maximum length within the batch.</li>
<li>Unified the activation instruction factory functions over the two architectures</li>
<li>Moved the concatenation layer to a separate package</li>
<li>Added an instruction for weightings with shared parameters that is separate from the instruction for the weighting layer that uses a dedicated parameter</li>
<li>The two weighting instructions inherit from the new BaseWeighting class.</li>
<li>Added instructions for the three series types: Series, ParameterizedSeries and CombinationSeries</li>
<li>Refactored the CPU RNN factory function based on the instructions</li>
<li>Continuation instructions implement HasOutputDimensions and CanSetInputDimensions, while entry point instructions only implement HasOutputDimensions.</li>
<li>Inlined some CUDA C helper functions</li>
<li>Moved the division by 2 in the squared loss function from the host to the device (see the sketch after this list)</li>
<li>Added the missing scaling of gradients in some of the optimization kernels</li>
<li>Refactored the for loops used to update entries in optimization kernels</li>
<li>Temporarily removed the CUDA forward layer tests</li>
<li>Updated the links in the README</li>
<li>Upgraded to Kotlin 1.2.10</li>
</ul>
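Applying the factor of 1/2 per entry on the device, as noted in the list above, avoids an extra host-side scaling of the reduced loss. A minimal sketch with assumed names (not the repository's actual kernel):

```cuda
// Computes 0.5 * (prediction - target)^2 per entry; a separate reduction sums the losses.
__global__ void squaredLossKernel(int numberEntries, const float* predictions, const float* targets, float* losses) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    if (index < numberEntries) {
        float difference = predictions[index] - targets[index];
        losses[index] = 0.5f * difference * difference;
    }
}
```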

v0.11.3 (2017-12-12)
<ul>
<li>Added an instruction for bidirectional recurrent layers</li>
<li>Rearranged the parameters in the factory functions of the recurrent layer and the dropout layer instruction</li>
<li>Overloaded the dropout layer instruction factory function for the case of vectorial input</li>
<li>Mentioned the bidirectional recurrent layer and the new running total demos in the README</li>
<li>Updated the TREC sample code in the README</li>
</ul>

v0.11.2 (2017-12-10)
<ul>
<li>The recurrent layer can now emit either all steps or the last step.</li>
<li>Added demos that compute the total of fixed-length and variable-length input</li>
<li>Mentioned the new recurrent layer implementation in the README</li>
<li>Included links to the demos in the README</li>
</ul>

v0.11.1 (2017-10-31)
<ul>
<li>Implemented testing support for multi-class and binary classification problems</li>
<li>Constructors of optimization instructions are now internal.</li>
<li>Removed AttentiveDecoder and the reverse demo based on that decoder</li>
<li>Removed the decoder's specific dependencies: column repetition, row summation and transposition</li>
</ul>

v0.11.0 (2017-10-27)
<ul>
<li>Implemented and tested Adam optimization for CUDA</li>
<li>Set a delta in the equality assertions of CUDA optimization tests</li>
</ul>

v0.10.6 (2017-10-27)
<ul>
<li>Fixed compilation errors in the kernels for SGD and Momentum</li>
<li>Implemented and tested Adadelta optimization for CUDA (see the sketch below)</li>
</ul>
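For reference, Adadelta keeps decaying averages of squared gradients and squared updates per parameter entry and scales each step by their ratio. A minimal sketch with assumed names and parameter layout (not the repository's actual kernel):

```cuda
// Adadelta update for one parameter entry:
//   E[g^2]  <- decay * E[g^2]  + (1 - decay) * g^2
//   step     = -sqrt(E[dx^2] + epsilon) / sqrt(E[g^2] + epsilon) * g
//   E[dx^2] <- decay * E[dx^2] + (1 - decay) * step^2
__global__ void adadeltaKernel(
    int numberEntries,
    float decay,
    float epsilon,
    float* parameters,
    const float* gradients,
    float* gradientAccumulation,
    float* updateAccumulation) {

    int index = blockIdx.x * blockDim.x + threadIdx.x;

    if (index < numberEntries) {
        float gradient = gradients[index];

        float newGradientAccumulation = decay * gradientAccumulation[index] + (1.0f - decay) * gradient * gradient;
        gradientAccumulation[index] = newGradientAccumulation;

        float step = -sqrtf(updateAccumulation[index] + epsilon) / sqrtf(newGradientAccumulation + epsilon) * gradient;

        updateAccumulation[index] = decay * updateAccumulation[index] + (1.0f - decay) * step * step;

        parameters[index] += step;
    }
}
```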