v3
- Use FMA (fused multiply-add) instructions in some functions. Speeds up the float paths a bit.
- Add another SIMD function for a little more speed with 16-bit input.
- Automatically select the best functions if opt=True; use only the C functions if opt=False.
- Don't embed the weights into the DLLs. The file nnedi3_weights.bin needs to be in the same folder as the DLL.
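
For illustration only, the opt behaviour and the weights lookup described above could be sketched roughly as follows. This is not the plugin's actual code. The kernel names ("fma3", "sse2", "c"), the feature-set argument and both helper functions are hypothetical, and the file name nnedi3_weights.bin is assumed.

```python
from pathlib import Path

# Hypothetical kernels, most preferred first. Each entry pairs a kernel
# name with the CPU feature it would require. These names are made up.
KERNELS = [("fma3", "fma3"), ("sse2", "sse2")]


def select_kernel(opt: bool, cpu_features: set) -> str:
    """Pick a kernel name: plain C if opt is False, else the best supported one."""
    if not opt:
        return "c"
    for name, required in KERNELS:
        if required in cpu_features:
            return name
    # No supported SIMD kernel found, so fall back to plain C.
    return "c"


def weights_path(plugin_path: str) -> Path:
    """Expected location of the weights file: next to the plugin binary."""
    return Path(plugin_path).parent / "nnedi3_weights.bin"
```

With these assumptions, select_kernel(False, ...) always returns "c", and weights_path() of a plugin file returns a path in that file's own folder.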