-
Environment variables & Python API
The default configuration of Intel® Extension for TensorFlow* generally provides good performance without any code changes. For advanced users, it also provides simple frontend Python APIs and utilities that can further optimize performance for different application scenarios with only minor code changes. Typically, you only need to add two or three lines to the original code.
-
Next Pluggable Device (NPD)
The Next Pluggable Device (NPD) is the next generation of the TensorFlow plugin mechanism. It lets new accelerator plugins register devices with TensorFlow without requiring modifications to the TensorFlow codebase, and it also serves as a conduit to OpenXLA via its PJRT plugin. This simplifies extending TensorFlow with new hardware accelerators.
-
Advanced auto mixed precision (AMP)
The low-precision data types bfloat16 and float16 are natively supported by the 3rd Generation Xeon® Scalable Processors, codenamed Cooper Lake, with the AVX512 instruction set, and by the Intel® Data Center GPU, which further boosts performance and uses less memory. The lower-precision data types supported by Advanced Auto Mixed Precision (AMP) are fully enabled in Intel® Extension for TensorFlow*.
-
Graph optimization
Intel® Extension for TensorFlow* provides graph optimization to fuse specific operator patterns to a new single operator for better performance, such as Conv2D+ReLU or Linear+ReLU. The benefits of the fusions are delivered to users in a transparent fashion.
-
CPU Thread Pool
Intel® Extension for TensorFlow* uses the OMP thread pool by default because it performs and scales better in most cases. For workloads with large inter-op concurrency, you can switch to the Eigen thread pool (the default in TensorFlow) by setting the environment variable ITEX_OMP_THREADPOOL=0.
-
Operator optimization
Intel® Extension for TensorFlow* also optimizes operators and implements several customized operators for a performance boost. The itex.ops namespace extends the implementation of TensorFlow public APIs for better performance.
-
GPU profiler
Intel® Extension for TensorFlow* provides support for TensorFlow Profiler. To enable the profiler, define three environment variables (export ZE_ENABLE_TRACING_LAYER=1, export UseCyclesPerSecondTimer=1, export ENABLE_TF_PROFILER=1).
-
INT8 quantization
Intel® Extension for TensorFlow* works with Intel® Neural Compressor to provide a compatible TensorFlow INT8 quantization solution with an equivalent user experience.
-
XPUAutoShard on GPU [Experimental]
Intel® Extension for TensorFlow* provides the XPUAutoShard feature to automatically shard the input data and the TensorFlow graph, placing these data and graph shards on GPU devices to maximize hardware usage.
-
OpenXLA
Intel® Extension for TensorFlow* adopts PJRT, a uniform device API, as its supported device plugin mechanism to implement the Intel GPU backend for OpenXLA support on the TensorFlow frontend.
-
Keras 3
Keras 3 with TensorFlow enables Just-In-Time (JIT) compilation by default. This feature uses the XLA (Accelerated Linear Algebra) compiler to optimize TensorFlow computations. See Keras 3 to avoid possible performance issues and errors.
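
The environment variables named under CPU Thread Pool and GPU profiler can also be set from Python rather than with shell export commands. The following is a minimal sketch, not part of the extension itself: the helper name configure_itex_env and its parameters are hypothetical, and only the variable names and values come from the list above. It assumes the settings have to be in place before TensorFlow and the extension are loaded, so the helper should be called before any TensorFlow import.

```python
import os

# Variable names and values are taken from the feature list above.
THREADPOOL_VAR = "ITEX_OMP_THREADPOOL"
PROFILER_VARS = {
    "ZE_ENABLE_TRACING_LAYER": "1",
    "UseCyclesPerSecondTimer": "1",
    "ENABLE_TF_PROFILER": "1",
}


def configure_itex_env(use_eigen_threadpool=False, enable_gpu_profiler=False, env=None):
    """Hypothetical helper: write the documented settings into `env`.

    `env` defaults to os.environ; pass a plain dict to preview the settings
    without touching the real process environment.
    Returns a dict containing only the variables that were set.
    """
    if env is None:
        env = os.environ
    applied = {}
    if use_eigen_threadpool:
        # 0 switches from the default OMP thread pool to the Eigen thread pool.
        applied[THREADPOOL_VAR] = "0"
    if enable_gpu_profiler:
        # All three variables are listed together for enabling the profiler.
        applied.update(PROFILER_VARS)
    env.update(applied)
    return applied
```

A call such as configure_itex_env(use_eigen_threadpool=True, enable_gpu_profiler=True) would be placed at the very top of the script, ahead of the TensorFlow import. Passing a plain dict as env is a design choice that makes it possible to inspect the resulting settings without modifying the running process.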