Comparing changes

base repository: modern-fortran/neural-fortran
base: 90d80d2
head repository: modern-fortran/neural-fortran
compare: 73799bd
  • 14 commits
  • 26 files changed
  • 4 contributors

Commits on Feb 21, 2025

  1. Dropout layer (#194)

    * First stab at dropout; conflict with base type TODO
    
    * Partial dropout integration
    
    * Test uninitialized dropout layer
    
    * Test dropout state that follows an input layer
    
    * Enable forward pass for dropout; backward pass TODO
    
    * Version bump and add dropout to the features table
    
    * Add dropout to CMake
    
    * Enable preprocessing in fpm.toml (needed with recent versions of fpm)
    
    * Small change in scale implementation
    
    * Integration of backward pass for dropout
    
    * Reduce tolerance in conv2d convergence tests
    
    * Fix bug in dropout scaling
    
    Co-authored-by: Ricardo Orsi <@ricor07>
    
    * disable dropout in inference mode (net % predict); TODO enable in net % train
    
    * Set dropout's training mode to true in net % train(); add tests
    
    * WIP dropout tests
    
    * Dropout layers are always in training mode, except when predict is called, in which case they are in inference mode
    
    * Update the layers table
    
    * Ensure the actual dropout rate == requested dropout rate in most cases
    
    * Accumulate the gradient in dropout % backward and flush in network % update
    
    * Guard against bad dropout rate
    
    * Connect the backward pass; expand tests
    
    * Expand tests
    
    * Use the reference scaling in dropout; don't accumulate gradients because it's not needed (see the scaling sketch at the end of this commit entry)
    
    * Add dropout to MNIST example; small model changes
    
    * Add reference
    
    * Update print_info dropout
    
    * Update print_info
    
    * Compute scale once in dropout constructor
    
    * dropout % backward() doesn't need input from the previous layer
    
    * Timing info of dropout
    
    ---------
    
    Co-authored-by: Vandenplas, Jeremie <jeremie.vandenplas@wur.nl>
    milancurcic and Vandenplas, Jeremie authored Feb 21, 2025
    Full SHA: 039638d
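
    A minimal, self-contained sketch of the inverted ("reference") dropout scaling described in this commit, written independently of the neural-fortran API; the program and variable names below are illustrative assumptions, not the library's. In training mode the surviving activations are scaled by 1 / (1 - rate) so the expected output matches inference mode, where dropout is a no-op (as when net % predict is called).

      program dropout_sketch
        implicit none
        real, parameter :: rate = 0.5   ! requested dropout rate
        real :: x(8), r(8), mask(8), y(8)
        logical :: training

        call random_number(x)
        training = .true.

        if (training) then
          ! Zero each element with probability `rate` and scale the survivors
          ! by 1 / (1 - rate) so the expected activation is unchanged.
          call random_number(r)
          mask = merge(1.0, 0.0, r > rate)
          y = x * mask / (1 - rate)
        else
          ! Inference mode: dropout passes the input through unchanged.
          y = x
        end if

        print *, y
      end program dropout_sketch
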
  2. Multihead attention (#199)

    * linear2d_layer forward implementation
    
    * linear2d_layer: temporarily remove api
    
    * Don't expose the concrete layer type via nf
    
    * Plumbing of linear2d with input2d and linear2d
    
    * linear2d_layer: add flatten2d layer
    
    * linear2d_layer: make linear2d layer work with input2d and flatten2d
    
    * update cmake
    
    * linear2d_layer: remove flatten2d layer
    
    * linear2d_layer: remove public api
    
    * linear2d_layer: update cmakelists
    
    * Add linear2d example
    
    * linear2d_layer: remove redundant constructor args
    
    * linear2d_layer: make example converge
    
    * linear2d_layer: add loss stopping and more iterations
    
    * start implementing MultiHeadAttention
    
    * scaled dot product attention (see the attention sketch at the end of this commit entry)
    
    * combine attention heads
    
    * forward (not working)
    
    * rearrange attention dimensions in a more efficient way
    
    * initial forward implementation for multi-head attention
    
    * tests for multihead_attention%forward
    
    * multihead_attention: move most logic to subroutines (performance)
    
    * multihead_attention: update tests
    
    * multihead_attention: concurrency
    
    * multihead_attention: proof of concept backward (works, but not mathematically correct)
    
    * multihead_attention: fix minor scaling issue
    
    * multihead_attention: complete backward implementation
    
    * multihead_attention: add comments for forward prop
    
    * multihead_attention: add tests for backward
    
    * multihead_attention: adjust expected test values for updated scaling
    
    * multihead_attention: calculate scaling factor only once
    
    * multihead_attention: use heap-allocated arrays during back prop
    
    * multihead_attention: use heap-allocated arrays in forward
    
    * multihead_attention: set values from correct shape to tests
    
    * multihead_attention: fix issues with shapes (softmax prime became even more monstrous)
    
    * multihead_attention: minor refactoring and optimization
    
    * multihead_attention: fix comments
    
    * multihead_attention: tests, add checks for attention weights
    
    * multihead_attention: remove some of the copy-paste comments
    
    * multihead_attention: optimize shapes
    
    * multihead_attention: params api
    
    * multihead_attention: fix incorrect dw bug
    
    * multihead_attention: tests for updated parameters
    
    * multihead_attention: remove reshape crutches
    
    * multihead_attention: rename common forward and backward calls
    
    * multihead_attention: tidy mha up
    
    * multihead_attention: self attention
    
    * multihead_attention: add cross attention
    
    * multihead_attention: add more comments
    
    * multihead_attention: arrange attention into submodule
    
    * multihead_attention: update cmakelists
    
    * multihead_attention: update attention in accordance with linear2d
    
    * multihead_attention: remove redundant constructor args for attention layers
    
    * multihead_attention: use pure and elemental where necessary
    
    * multihead_attention: plumbing
    
    * multihead_attention: add reference
    
    * multihead_attention: remove rebase artifact
    
    * multihead_attention: remove redundant args
    
    * multihead_attention: update tests
    
    * multihead_attention: add the most important lines to tests
    
    * multihead_attention: simple MHA example
    
    * multihead_attention: update cmake
    
    * multihead_attention: remove debug line from tests
    
    * multihead_attention: set slightly higher margin for fp imprecision (due to IEEE_DENORMAL)
    
    * Rename mha_simple example
    
    * Update src/nf/nf_multihead_attention.f90
    
    Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
    
    * Update src/nf/nf_multihead_attention.f90
    
    Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
    
    * Update src/nf/nf_multihead_attention.f90
    
    Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
    
    * Update src/nf/nf_multihead_attention.f90
    
    Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
    
    * Tidy up
    
    * Add self_attention to the layers table
    
    ---------
    
    Co-authored-by: milancurcic <caomaco@gmail.com>
    Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
    3 people authored Feb 21, 2025
    Full SHA: ed8b340
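
    A minimal, self-contained sketch of single-head scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, the core operation each head of the multi-head layer applies; it is written independently of the neural-fortran API and every name below is an illustrative assumption.

      program attention_sketch
        implicit none
        integer, parameter :: seq_len = 4, d_model = 8
        real :: q(seq_len, d_model), k(seq_len, d_model), v(seq_len, d_model)
        real :: scores(seq_len, seq_len), weights(seq_len, seq_len)
        real :: output(seq_len, d_model)
        integer :: i

        call random_number(q)
        call random_number(k)
        call random_number(v)

        ! Raw attention scores, scaled once by 1 / sqrt(d_model).
        scores = matmul(q, transpose(k)) / sqrt(real(d_model))

        ! Row-wise softmax turns scores into attention weights.
        do i = 1, seq_len
          weights(i, :) = exp(scores(i, :) - maxval(scores(i, :)))
          weights(i, :) = weights(i, :) / sum(weights(i, :))
        end do

        ! Each output row is a weighted sum of the value rows.
        output = matmul(weights, v)

        print *, output(1, :)
      end program attention_sketch

    In the standard multi-head formulation, q, k, and v are first produced by learned per-token linear projections (the role linear2d plays here), the heads are computed independently, and their outputs are concatenated and projected once more; self-attention feeds the same input to all three projections, while cross-attention takes k and v from a different sequence.
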

Commits on Feb 23, 2025

  1. Commit 1c54cf0
  2. Commit d4731a1
  3. Commit e6b54de
  4. Commit 48efd07
  5. embedding_layer: plumbing

    OneAdder committed Feb 23, 2025
    Full SHA: 4cdd2e5
  6. Commit 6bfea21
  7. Commit f1b414c
  8. Commit 10e54d0
  9. Commit 0165642
  10. embedding_layer: pr fixes

    OneAdder committed Feb 23, 2025
    Full SHA: dd0ab31
  11. Commit 074bcd1
  12. Commit 73799bd