v24.09

developer-compute released this 27 Sep 13:56

v24.09 Public Major Release

Feat

  • Provide a wrapper class to expose cpu::CpuSoftmaxGeneric

  • Detect number of cores in Windows®

  • Add optimized SME kernel for QASYMM8_SIGNED elementwise addition operation

Fix

  • LogSoftmax Int8/UInt8 mismatches in Cpu

  • Rounding of negative integers in pooling 2d/3d gpu kernels

  • OpenMP® linker error on Windows®

  • Rounding of negative integers in pooling 2d/3d kernels

  • Patches linker failure for cpu::CpuSoftmaxGeneric in partial builds

  • Cpu/Gpu Reverse data type support

  • QSYMM16 broadcasted subtraction failures

  • CpuMulKernel validation when there is x-broadcasting for some types

  • Data type validation in depthwise op in Cpu

  • Update macOS® build instructions

  • Validation tests compute reference and target on each iteration

  • Reset permuted input and weights on configure in NEDepthwiseConvolutionLayer

  • Selectively enable CL job chaining
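
The two pooling fixes above concern rounding of negative integers. As a general illustration of that class of bug (a minimal sketch, not the library's kernel code; the function names `avg_truncate` and `avg_round_nearest` are hypothetical), C++ integer division truncates toward zero, so an average computed over negative quantized values rounds differently from one over positive values unless the sign is handled explicitly:

```cpp
#include <cstdint>

// Plain integer division: C++ truncates toward zero,
// so -7 / 4 yields -1 even though the exact value is -1.75.
inline int32_t avg_truncate(int32_t sum, int32_t count)
{
    return sum / count;
}

// Round-to-nearest division (ties away from zero), assuming count > 0.
// The bias is added for non-negative sums and subtracted for negative ones;
// always adding it would round negative results in the wrong direction.
inline int32_t avg_round_nearest(int32_t sum, int32_t count)
{
    const int32_t half = count / 2;
    return (sum >= 0) ? (sum + half) / count : (sum - half) / count;
}
```

With a sum of -7 over 4 elements, `avg_truncate` gives -1 while `avg_round_nearest` gives -2, which mirrors the result for +7 (2).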

Refactor

  • Generate only one shared library when building with CMake

  • Add BF16 LUT for Softmax Layer with tests

  • Move heuristic logic of activation kernel into separate class

  • Remove unused CommandBuffer

Perf

  • Allocate Persistent and Prepare tensors at start of prepare()

  • Use mws in OMPScheduler for better thread throttling

  • Enable FP16 Winograd in CpuConv2d for v8a multi_isa builds
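
To illustrate the idea behind thread throttling by workload size, here is a minimal sketch. It assumes "mws" denotes a minimum workload size per thread; the helper `throttled_thread_count` and its parameters are hypothetical and do not reflect the actual OMPScheduler code. The point is that a small workload should not be split across more threads than it can keep busy:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: choose a thread count so that each thread receives
// at least `mws` iterations, never exceeding `max_threads` and never
// returning fewer than one thread.
inline std::size_t throttled_thread_count(std::size_t total_iterations,
                                          std::size_t mws,
                                          std::size_t max_threads)
{
    // Treat a zero minimum workload as "no throttling" by clamping it to 1.
    const std::size_t safe_mws = std::max<std::size_t>(mws, 1);
    // Number of threads the workload can keep busy at the minimum size.
    const std::size_t by_work = std::max<std::size_t>(total_iterations / safe_mws, 1);
    return std::min(by_work, std::max<std::size_t>(max_threads, 1));
}
```

For example, 300 iterations with a minimum workload of 100 on an 8-thread pool would use 3 threads under this scheme, while 50 iterations would run on a single thread.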

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.09/index.xhtml