
v3.5

@vpirogov released this 11 Jun 23:26

Performance Optimizations

Intel Architecture Processors

  • Improved performance for 4th generation Intel Xeon Scalable processors (formerly Sapphire Rapids).
  • Improved performance for the future Intel Xeon Scalable processors (code-named Sierra Forest and Granite Rapids).
  • Improved performance of group normalization primitive.
  • Improved performance of matmul primitive with sum post-op for batched cases on processors with Intel AMX instruction set support.
  • Improved performance of the following subgraphs with Graph API:
    • Multi-Query Attention (MQA).
    • Scaled Dot Product Attention (SDPA), including the variant with select operation (see the Graph API sketch after this list).
    • LayerNorm + Multiply + Quantize produced by SmoothQuant algorithm.
    • Convolution + Sigmoid + Multiply with mixed precisions.
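
The sketch below illustrates, under assumed batch, head count, sequence length, and head size, how a plain SDPA pattern (MatMul, SoftMax, MatMul, without masking or scaling) can be expressed as a Graph API subgraph; partition compilation and execution are omitted.

```cpp
#include <cstdint>
#include <vector>
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // Hypothetical shapes: batch=1, heads=16, sequence=384, head size=64.
    std::vector<int64_t> qkv_shape {1, 16, 384, 64};
    std::vector<int64_t> score_shape {1, 16, 384, 384};

    logical_tensor query {0, dt::f16, qkv_shape, lt::strided};
    logical_tensor key   {1, dt::f16, qkv_shape, lt::strided};
    logical_tensor value {2, dt::f16, qkv_shape, lt::strided};
    logical_tensor score {3, dt::f16, score_shape, lt::strided};
    logical_tensor probs {4, dt::f16, score_shape, lt::strided};
    logical_tensor out   {5, dt::f16, qkv_shape, lt::strided};

    // Q x K^T
    op qk {0, op::kind::MatMul, {query, key}, {score}, "qk"};
    qk.set_attr<bool>(op::attr::transpose_b, true);

    // Softmax over the last dimension.
    op sm {1, op::kind::SoftMax, {score}, {probs}, "softmax"};
    sm.set_attr<int64_t>(op::attr::axis, -1);

    // Probabilities x V
    op pv {2, op::kind::MatMul, {probs, value}, {out}, "pv"};

    graph g {dnnl::engine::kind::cpu};
    g.add_op(qk);
    g.add_op(sm);
    g.add_op(pv);
    g.finalize();

    // If the SDPA pattern is matched, it comes back as a single fused partition.
    auto partitions = g.get_partitions();
    return partitions.empty() ? 1 : 0;
}
```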

Intel Graphics Products

  • Improved performance for Processor Graphics based on Xe2 architecture.
  • Improved performance for the Intel Data Center GPU Max Series (formerly Ponte Vecchio).
  • Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and the Intel Data Center GPU Flex Series (formerly Arctic Sound).
  • Improved RNN primitive performance for LSTM cell case.
  • Improved performance of f8_e4m3 data type emulation on Intel Data Center GPU Max Series (formerly Ponte Vecchio).

AArch64-based Processors

  • Improved convolution forward propagation, matmul, and softmax performance for processors with SVE support.
  • Improved performance of bf16 matmul, convolution, and reorder primitives with Arm Compute Library (ACL).
  • Improved eltwise primitive performance for the gelu_erf algorithm with ACL.

Functionality

  • Introduced sum and binary post-ops support for the layer normalization primitive. This functionality is currently implemented on CPUs only (see the sketch after this list).
  • Introduced support for int4 data type and extended quantization model with support for grouped scales and zero points.
  • Introduced fp64 matmul support. This functionality is currently implemented on Intel GPUs with hardware acceleration for fp64 math only.
  • Extended the floating point math mode API to support weight decompression scenarios. See the matmul weights decompression example to get started; a minimal sketch also follows this list. The new floating point math mode is supported in the following configurations:
    • bfloat16 matmul with int8 weights on Intel CPUs.
    • float16 and bfloat16 matmul with int8 or int4 weights on Intel GPUs.
  • [experimental] Introduced a microkernel API for Intel Architecture Processors. This API exposes internal mechanisms used in the matmul and convolution implementations to expert users.
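
A minimal sketch of the layer normalization post-op support on CPU, assuming hypothetical 32x768 f32 tensors and a fused elementwise multiply; scale/shift data and primitive execution are omitted.

```cpp
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng {engine::kind::cpu, 0};

    // Hypothetical shapes: 32 rows of 768 channels, normalized over the last axis.
    memory::desc src_md {{32, 768}, memory::data_type::f32, memory::format_tag::ab};
    memory::desc dst_md = src_md;
    memory::desc other_md = src_md; // second input of the binary post-op

    // Fuse an elementwise multiply into the layer normalization output.
    post_ops ops;
    ops.append_binary(algorithm::binary_mul, other_md);
    primitive_attr attr;
    attr.set_post_ops(ops);

    layer_normalization_forward::primitive_desc pd {
            eng, prop_kind::forward_inference, src_md, dst_md, 1e-5f,
            normalization_flags::use_scale | normalization_flags::use_shift,
            attr};

    layer_normalization_forward lnorm {pd};
    (void)lnorm; // execution with DNNL_ARG_* arguments omitted
    return 0;
}
```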

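And a minimal sketch of the weight decompression setup on CPU: f32 activations, int8 weights with grouped scales, and the extended floating point math mode applied to integer weights. The shapes, group size, and scales mask here are assumptions; the matmul weights decompression example in the repository is the authoritative reference. The same attribute setup applies to the int4 (s4/u4) weight configurations on GPUs.

```cpp
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng {engine::kind::cpu, 0};

    // Hypothetical GEMM sizes: M x K times K x N, with int8 weights
    // decompressed on the fly and scales shared by groups of 32 along K.
    const memory::dim M = 128, K = 4096, N = 4096, G = 32;

    memory::desc src_md {{M, K}, memory::data_type::f32, memory::format_tag::ab};
    memory::desc wei_md {{K, N}, memory::data_type::s8, memory::format_tag::ab};
    memory::desc dst_md {{M, N}, memory::data_type::f32, memory::format_tag::ab};

    primitive_attr attr;
    // Grouped scales on weights: one f32 scale per G x 1 block
    // (mask and group layout are assumptions for this sketch).
    attr.set_scales(DNNL_ARG_WEIGHTS, (1 << 0) | (1 << 1), {G, 1},
            memory::data_type::f32);
    // Relax f32 math to bf16 and allow it to apply to integer weights,
    // which enables the decompression path.
    attr.set_fpmath_mode(fpmath_mode::bf16, /*apply_to_int=*/true);

    matmul::primitive_desc pd {eng, src_md, wei_md, dst_md, attr};
    matmul mm {pd};
    (void)mm; // execution with DNNL_ARG_* arguments and scale memory omitted
    return 0;
}
```
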
Usability

  • Extended error messages for engine and memory object creation errors.
  • Extended verbose mode diagnostics with information on dispatching decisions for all primitives.
  • Introduced support for clang++ host compiler in SYCL builds.
  • Introduced API for tensor serialization and deserialization.
  • Extended verbose mode diagnostics for Graph API with information on pattern matcher decisions.
  • Introduced OpenCL runtime support for Graph API.
  • Added support for building oneDNN with installed Arm Compute Library (ACL).

Validation

  • Extended benchdnn with support for tensor tags in RNN primitive validation.

Breaking Changes

  • Updated the minimum supported ACL version to 24.04 (was 23.11).

Thanks to these Contributors

This release contains contributions from the project core team as well as Abdel @quickwritereader, @AngryLoki, Crefeda Rodrigues @cfRod, Daniel Richard G. @iskunk, David Svantesson @davsva01, @deepeshfujitsu, Dylan Angus @dylan-angus-codeplay, Emanuele Rocca @ema, Fadi Arafeh @fadara01, Hernan Martinez @hmartinez82, John Osorio @kala855, Jonathan Deakin @jondea, @kasturedeeksha, Kentaro Kawakami @kawakami-k, Nikita Shulga @malfet, Radu Salavat @Radu2k, Renato Barros Arantes @renato-arantes, Roman Zhukov @rozhukov, Ryo Suzuki @Ryo-not-rio, @Shreyas-fuj, Sunita Nadampalli @snadampal, Tadej Ciglarič @t4c1, Vineel Abhinav @vineelabhinav, @vishwascm. We would also like to thank everyone who asked questions and reported issues.