Proposal: Value Type Kernels

### Prerequisites
- [x] I have written a descriptive issue title
- [x] I have verified that I am running the latest version of ImageSharp
- [x] I have searched [open](https://github.com/JimBobSquarePants/ImageSharp/issues) and [closed](https://github.com/JimBobSquarePants/ImageSharp/issues?q=is%3Aissue+is%3Aclosed) issues to ensure it has not already been reported

### The problem
- There are many places in the library where we operate on relatively small matrices known at compile time using a dynamically allocated heap array (usually wrapped in `Fast2DArray`):
  - `ConvolutionProcessor`-s 
  - `PatternBrush`-es
  - `ResizeProcessor` weight windows
- This approach is easy to code, but has implicit overheads with a significant negative effect on performance:
  - Indexing heap arrays increases the probability of cache misses
  - The JIT cannot keep the elements of a dynamically allocated array in registers, so the data is always accessed through "read from memory" (`mov`?) CPU instructions
  - Calculations iterate with nested `for` loops instead of evaluating a single simple expression
  - All in all: the generated machine code is more complex, longer, and slower to execute than it should be
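
To make the pattern above concrete, here is a minimal sketch of a heap-array kernel applied with nested loops. It is not ImageSharp code: `HeapKernelExample` and `ConvolveAt` are hypothetical names, and a plain `float[,]` stands in for `Fast2DArray<float>`.

```csharp
using System;

// Hypothetical illustration only; none of these names exist in ImageSharp.
public static class HeapKernelExample
{
    // Computes the weighted sum of the neighbourhood centred on (x, y).
    // The kernel lives in a heap-allocated array, so its size is only known
    // at run time and every weight is loaded through an array index.
    // The caller must keep (x, y) far enough from the borders of the source.
    public static float ConvolveAt(float[,] kernel, float[,] source, int x, int y)
    {
        int rows = kernel.GetLength(0);
        int cols = kernel.GetLength(1);
        int radiusY = rows / 2;
        int radiusX = cols / 2;

        float sum = 0f;
        for (int ky = 0; ky < rows; ky++)
        {
            for (int kx = 0; kx < cols; kx++)
            {
                sum += kernel[ky, kx] * source[y + ky - radiusY, x + kx - radiusX];
            }
        }

        return sum;
    }
}
```

The loop bounds, the two-dimensional indexing and the bounds checks are all resolved at run time, which is the overhead the list above describes.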

### The solution
- Define `KernelMatrix{N}x{M}` structures for common kernel sizes (like `KernelMatrix9x3`, `KernelMatrix4x1`, etc.)
- Use specific kernel matrix structures (known at compile time!) as generic arguments in implementation-related classes (`ConvolutionProcessor<TColor, TKernelMatrix>`!)
  - Using `KernelMatrix9x1`-like constructs as resampler windows could bring significant performance improvements for `ResizeProcessor` (see #139)
- Implement all kernel-related calculations as simple expressions without `for` loops (like the implementations of the `Matrix4x4` methods)
- Use a general matrix for unknown sizes (something similar to `Fast2DArray<float>`)
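
A rough sketch of how the proposed shape could look, under stated assumptions: the `IKernelMatrix` interface, its `Radius` and `Convolve` members, and the `ValueKernelConvolution.Apply` helper are hypothetical names invented for this illustration, and plain `float[,]` buffers replace the real pixel types. Only the `KernelMatrix{N}x{M}` naming comes from the proposal itself.

```csharp
using System;

// Hypothetical contract for a fixed-size kernel; not an existing ImageSharp type.
public interface IKernelMatrix
{
    // Number of pixels the kernel extends from its centre.
    int Radius { get; }

    // Weighted sum of the neighbourhood centred on (x, y).
    float Convolve(float[,] source, int x, int y);
}

// A 3x3 kernel stored as plain fields, in the spirit of Matrix4x4.
public struct KernelMatrix3x3 : IKernelMatrix
{
    public float M11, M12, M13;
    public float M21, M22, M23;
    public float M31, M32, M33;

    public int Radius => 1;

    // One unrolled expression instead of nested loops.
    public float Convolve(float[,] source, int x, int y)
    {
        return (M11 * source[y - 1, x - 1]) + (M12 * source[y - 1, x]) + (M13 * source[y - 1, x + 1])
             + (M21 * source[y, x - 1]) + (M22 * source[y, x]) + (M23 * source[y, x + 1])
             + (M31 * source[y + 1, x - 1]) + (M32 * source[y + 1, x]) + (M33 * source[y + 1, x + 1]);
    }
}

public static class ValueKernelConvolution
{
    // Because TKernel is constrained to a struct, the JIT compiles a separate
    // version of this method for each kernel type and can inline Convolve.
    // Border pixels within the kernel radius are left untouched in this sketch.
    public static void Apply<TKernel>(ref TKernel kernel, float[,] source, float[,] destination)
        where TKernel : struct, IKernelMatrix
    {
        int height = source.GetLength(0);
        int width = source.GetLength(1);
        int radius = kernel.Radius;

        for (int y = radius; y < height - radius; y++)
        {
            for (int x = radius; x < width - radius; x++)
            {
                destination[y, x] = kernel.Convolve(source, x, y);
            }
        }
    }
}
```

A caller would fill in the fields of a `KernelMatrix3x3` and pass it by reference, for example `ValueKernelConvolution.Apply(ref kernel, source, destination)`. Whether this is actually faster than the heap-array version is exactly what the benchmark in the TODO list is meant to establish.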

### TODO
- [ ] Implement a prototype/emulation benchmark comparing 2 Convolution implementations: heap matrix VS value matrix
- [ ] Design and implement the value type kernels based on the benchmark's conclusions

### Remarks
The concept is very similar to the idea behind [`Block8x8F`](https://github.com/JimBobSquarePants/ImageSharp/blob/master/src/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs) in the JPEG codec. Utilizing information that is known at compile time is a [very efficient technique](https://twitter.com/CodeWisdom/status/841720334130765824) to improve performance.

Proposal: Value Type Kernels #142