[intel-npu] Adding NPU_TURBO option to plugin (#25646)
### Details:
 - Adding npu_turbo option for intel-npu plugin
 - updating documentation with turbo and other missing properties

Master backport of
#25603

### Tickets:
 - [*ticket-id*](https://jira.devtools.intel.com/browse/CVS-147038)
csoka authored Aug 5, 2024
1 parent 64c5f67 commit e35acf9
Showing 18 changed files with 182 additions and 34 deletions.
Original file line number Diff line number Diff line change
@@ -139,6 +139,7 @@ offer a limited set of supported OpenVINO features.
ov::enable_profiling
ov::workload_type
ov::intel_npu::compilation_mode_params
ov::intel_npu::turbo
.. tab-item:: Read-only properties

11 changes: 11 additions & 0 deletions src/inference/include/openvino/runtime/intel_npu/properties.hpp
@@ -61,5 +61,16 @@ static constexpr ov::Property<uint32_t, ov::PropertyMutability::RO> driver_versi
*/
static constexpr ov::Property<std::string> compilation_mode_params{"NPU_COMPILATION_MODE_PARAMS"};

/**
* @brief [Only for NPU plugin]
 * Type: bool
* Set turbo on or off. The turbo mode, where available, provides a hint to the system to maintain the
* maximum NPU frequency and memory throughput within the platform TDP limits.
* Turbo mode is not recommended for sustainable workloads due to higher power consumption and potential impact on other
* compute resources.
* @ingroup ov_runtime_npu_prop_cpp_api
*/
static constexpr ov::Property<bool> turbo{"NPU_TURBO"};

} // namespace intel_npu
} // namespace ov
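A minimal sketch of enabling the new property from application code (hedged: this assumes the standard OpenVINO 2.x C++ API; `model.xml` is a placeholder path, and whether `NPU_TURBO` is accepted depends on the installed driver):

```cpp
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_npu/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path
    // Request turbo mode; on drivers without the command-queue extension
    // this surfaces as "Turbo is not supported by the current driver".
    auto compiled = core.compile_model(model, "NPU", ov::intel_npu::turbo(true));
}
```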
40 changes: 40 additions & 0 deletions src/plugins/intel_npu/README.md
@@ -166,6 +166,14 @@ The following properties are supported:
| `ov::device::architecture`/</br>`DEVICE_ARCHITECTURE` | RO | Returns the platform information. | `N/A`| `N/A` |
| `ov::device::full_name`/</br>`FULL_DEVICE_NAME` | RO | Returns the full name of the NPU device. | `N/A`| `N/A` |
| `ov::internal::exclusive_async_requests`/</br>`EXCLUSIVE_ASYNC_REQUESTS` | RW | Allows using an exclusive task executor for asynchronous infer requests. | `YES`/ `NO`| `NO` |
| `ov::device::type`/</br>`DEVICE_TYPE` | RO | Returns the type of device, discrete or integrated. | `DISCRETE` /</br>`INTEGRATED` | `N/A` |
| `ov::device::gops`/</br>`DEVICE_GOPS` | RO | Returns the Giga OPS per second count (GFLOPS or GIOPS) for a set of precisions supported by specified device. | `N/A`| `N/A` |
| `ov::device::pci_info`/</br>`DEVICE_PCI_INFO` | RO | Returns the PCI bus information of device. See PCIInfo struct definition for details | `N/A`| `N/A` |
| `ov::intel_npu::device_alloc_mem_size`/</br>`NPU_DEVICE_ALLOC_MEM_SIZE` | RO | Size of already allocated NPU DDR memory (both for discrete/integrated NPU devices) | `N/A` | `N/A` |
| `ov::intel_npu::device_total_mem_size`/</br>`NPU_DEVICE_TOTAL_MEM_SIZE` | RO | Size of available NPU DDR memory (both for discrete/integrated NPU devices) | `N/A` | `N/A` |
| `ov::intel_npu::driver_version`/</br>`NPU_DRIVER_VERSION` | RO | NPU driver version (for both discrete/integrated NPU devices). | `N/A` | `N/A` |
| `ov::intel_npu::compilation_mode_params`/</br>`NPU_COMPILATION_MODE_PARAMS` | RW | Set various parameters supported by the NPU compiler. (See below.) | `<std::string>`| `N/A` |
| `ov::intel_npu::turbo`/</br>`NPU_TURBO` | RW | Set Turbo mode on/off | `YES`/ `NO`| `NO` |

&nbsp;
### Performance Hint: Default Number of DPU Groups / DMA Engines
@@ -192,6 +200,38 @@ The following table shows the optimal number of inference requests returned by t
| 3720 | 4 | 1 |
| 4000 | 8 | 1 |

&nbsp;
### Compilation mode parameters
``ov::intel_npu::compilation_mode_params`` is an NPU-specific property for controlling model compilation for the NPU.
Note: This functionality is currently experimental; it may be deprecated or replaced with a generic OpenVINO API in a future release.

The following configuration options are supported:

#### optimization-level
Defines a preset of optimization passes to be applied during compilation. Supported values:

| Value | Description |
| :--- | :--- |
| 0 | Reduced subset of optimization passes. Smaller compile time. |
| 1 | Default. Balanced performance/compile time. |
| 2 | Prioritize performance; compile time may increase. |

#### performance-hint-override
An extension of the LATENCY mode specified via ``ov::hint::performance_mode``.
It has no effect for other ``ov::hint::PerformanceMode`` hints.

Supported values:

| Value | Description |
| :--- | :--- |
| efficiency | Default. Balanced performance and power consumption. |
| latency | Prioritize performance over power efficiency. |

#### Usage example:
```cpp
ov::AnyMap config = {
    {ov::intel_npu::compilation_mode_params.name(),
     "optimization-level=1 performance-hint-override=latency"}};
auto compiled_model = core.compile_model(model, "NPU", config);
```

&nbsp;
## Stateful models
@@ -204,4 +204,21 @@ struct WORKLOAD_TYPE final : OptionBase<WORKLOAD_TYPE, ov::WorkloadType> {

static std::string toString(const ov::WorkloadType& val);
};

//
// TURBO
//
struct TURBO final : OptionBase<TURBO, bool> {
static std::string_view key() {
return ov::intel_npu::turbo.name();
}

static bool defaultValue() {
return false;
}

static OptionMode mode() {
return OptionMode::RunTime;
}
};
} // namespace intel_npu
2 changes: 1 addition & 1 deletion src/plugins/intel_npu/src/al/include/npu.hpp
@@ -38,7 +38,7 @@ class IEngineBackend : public std::enable_shared_from_this<IEngineBackend> {
/** @brief Backend has support for concurrency batching */
virtual bool isBatchingSupported() const = 0;
/** @brief Backend has support for workload type */
virtual bool isWorkloadTypeSupported() const = 0;
virtual bool isCommandQueueExtSupported() const = 0;
/** @brief Register backend-specific options */
virtual void registerOptions(OptionsDesc& options) const;
/** @brief Get Level Zero context*/
1 change: 1 addition & 0 deletions src/plugins/intel_npu/src/al/src/config/runtime.cpp
@@ -24,6 +24,7 @@ void intel_npu::registerRunTimeOptions(OptionsDesc& desc) {
desc.add<NUM_STREAMS>();
desc.add<ENABLE_CPU_PINNING>();
desc.add<WORKLOAD_TYPE>();
desc.add<TURBO>();
}

// Heuristically obtained number. Varies depending on the values of PLATFORM and PERFORMANCE_HINT
2 changes: 1 addition & 1 deletion src/plugins/intel_npu/src/backend/include/zero_backend.hpp
@@ -26,7 +26,7 @@ class ZeroEngineBackend final : public IEngineBackend {
uint32_t getDriverExtVersion() const override;

bool isBatchingSupported() const override;
bool isWorkloadTypeSupported() const override;
bool isCommandQueueExtSupported() const override;

void* getContext() const override;

2 changes: 1 addition & 1 deletion src/plugins/intel_npu/src/backend/src/zero_backend.cpp
@@ -34,7 +34,7 @@ bool ZeroEngineBackend::isBatchingSupported() const {
return _instance->getDriverExtVersion() >= ZE_GRAPH_EXT_VERSION_1_6;
}

bool ZeroEngineBackend::isWorkloadTypeSupported() const {
bool ZeroEngineBackend::isCommandQueueExtSupported() const {
return _instance->getCommandQueueDdiTable() != nullptr;
}

9 changes: 9 additions & 0 deletions src/plugins/intel_npu/src/backend/src/zero_wrappers.cpp
@@ -140,6 +140,15 @@ CommandQueue::CommandQueue(const ze_device_handle_t& device_handle,
_log("CommandQueue", config.get<LOG_LEVEL>()) {
ze_command_queue_desc_t queue_desc =
{ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC, nullptr, group_ordinal, 0, 0, ZE_COMMAND_QUEUE_MODE_DEFAULT, priority};
if (config.has<TURBO>()) {
if (_command_queue_npu_dditable_ext != nullptr) {
bool turbo = config.get<TURBO>();
ze_command_queue_desc_npu_ext_t turbo_cfg = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC_NPU_EXT, nullptr, turbo};
queue_desc.pNext = &turbo_cfg;
} else {
OPENVINO_THROW("Turbo is not supported by the current driver");
}
}
zeroUtils::throwOnFail("zeCommandQueueCreate",
zeCommandQueueCreate(_context, device_handle, &queue_desc, &_handle));
}
@@ -512,6 +512,10 @@ std::string LevelZeroCompilerInDriver<TableExtension>::serializeConfig(
std::ostringstream workloadtypestr;
workloadtypestr << ov::workload_type.name() << KEY_VALUE_SEPARATOR << VALUE_DELIMITER << "\\S+" << VALUE_DELIMITER;
content = std::regex_replace(content, std::regex(workloadtypestr.str()), "");
// Remove turbo property as it is not used by compiler
std::ostringstream turbostring;
turbostring << ov::intel_npu::turbo.name() << KEY_VALUE_SEPARATOR << VALUE_DELIMITER << "\\S+" << VALUE_DELIMITER;
content = std::regex_replace(content, std::regex(turbostring.str()), "");

// FINAL step to convert prefixes of remaining params, to ensure backwards compatibility
// From 5.0.0, driver compiler start to use NPU_ prefix, the old version uses VPU_ prefix
2 changes: 1 addition & 1 deletion src/plugins/intel_npu/src/plugin/include/backends.hpp
@@ -32,7 +32,7 @@ class NPUBackends final {
uint32_t getDriverVersion() const;
uint32_t getDriverExtVersion() const;
bool isBatchingSupported() const;
bool isWorkloadTypeSupported() const;
bool isCommandQueueExtSupported() const;
void registerOptions(OptionsDesc& options) const;
void* getContext() const;
std::string getCompilationPlatform(const std::string_view platform, const std::string& deviceId) const;
4 changes: 2 additions & 2 deletions src/plugins/intel_npu/src/plugin/src/backends.cpp
@@ -163,9 +163,9 @@ bool NPUBackends::isBatchingSupported() const {
return false;
}

bool NPUBackends::isWorkloadTypeSupported() const {
bool NPUBackends::isCommandQueueExtSupported() const {
if (_backend != nullptr) {
return _backend->isWorkloadTypeSupported();
return _backend->isCommandQueueExtSupported();
}

return false;
6 changes: 6 additions & 0 deletions src/plugins/intel_npu/src/plugin/src/compiled_model.cpp
@@ -328,6 +328,12 @@ void CompiledModel::initialize_properties() {
[](const Config& config) {
return config.get<COMPILATION_MODE_PARAMS>();
}}},
{ov::intel_npu::turbo.name(),
{isPropertySupported(ov::intel_npu::turbo.name()),
ov::PropertyMutability::RO,
[](const Config& config) {
return config.get<TURBO>();
}}},
// NPU Private
// =========
{ov::intel_npu::tiles.name(),
8 changes: 7 additions & 1 deletion src/plugins/intel_npu/src/plugin/src/plugin.cpp
@@ -299,7 +299,7 @@ Plugin::Plugin()
return _metrics->GetAvailableDevicesNames();
}}},
{ov::workload_type.name(),
{_backends->isWorkloadTypeSupported(),
{_backends->isCommandQueueExtSupported(),
ov::PropertyMutability::RW,
[](const Config& config) {
return config.get<WORKLOAD_TYPE>();
@@ -440,6 +440,12 @@ Plugin::Plugin()
[](const Config& config) {
return config.get<COMPILATION_MODE_PARAMS>();
}}},
{ov::intel_npu::turbo.name(),
{_backends->isCommandQueueExtSupported(),
ov::PropertyMutability::RW,
[](const Config& config) {
return config.get<TURBO>();
}}},
// NPU Private
// =========
{ov::intel_npu::dma_engines.name(),
@@ -0,0 +1,34 @@
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "overload/compile_and_infer.hpp"

#include <npu_private_properties.hpp>

#include "common/npu_test_env_cfg.hpp"
#include "common/utils.hpp"

namespace {

using namespace ov::test::behavior;

const std::vector<ov::AnyMap> configs = {{}};

INSTANTIATE_TEST_SUITE_P(compatibility_smoke_BehaviorTests,
OVCompileAndInferRequest,
::testing::Combine(::testing::Values(getConstantGraph(ov::element::f32)),
::testing::Values(ov::test::utils::DEVICE_NPU),
::testing::ValuesIn(configs)),
ov::test::utils::appendPlatformTypeTestName<OVCompileAndInferRequest>);

INSTANTIATE_TEST_SUITE_P(compatibility_smoke_BehaviorTests,
OVCompileAndInferRequestTurbo,
::testing::Combine(::testing::Values(getConstantGraph(ov::element::f32)),
::testing::Values(ov::test::utils::DEVICE_NPU),
::testing::ValuesIn(std::vector<ov::AnyMap>{
{ov::intel_npu::create_executor(0)},
{ov::intel_npu::create_executor(1)}})),
ov::test::utils::appendPlatformTypeTestName<OVCompileAndInferRequestTurbo>);

} // namespace
@@ -33,7 +33,7 @@ inline std::shared_ptr<ov::Model> getConstantGraph(element::Type type) {
return std::make_shared<Model>(results, params);
}

inline bool isWorkloadTypeSupported() {
inline bool isCommandQueueExtSupported() {
return std::make_shared<::intel_npu::ZeroInitStructsHolder>()->getCommandQueueDdiTable() != nullptr;
}

@@ -100,7 +100,7 @@ TEST_P(OVCompileAndInferRequest, PluginWorkloadType) {
return property == workload_type.name();
});

if (isWorkloadTypeSupported()) {
if (isCommandQueueExtSupported()) {
ASSERT_TRUE(workloadTypeSupported);
ov::InferRequest req;
OV_ASSERT_NO_THROW(execNet = core->compile_model(function, target_device, configuration));
@@ -137,7 +137,7 @@ TEST_P(OVCompileAndInferRequest, CompiledModelWorkloadType) {
return property == workload_type.name();
});

if (isWorkloadTypeSupported()) {
if (isCommandQueueExtSupported()) {
ASSERT_TRUE(workloadTypeSupported);
OV_ASSERT_NO_THROW(execNet.set_property(modelConfiguration));
ov::InferRequest req;
@@ -165,7 +165,7 @@ TEST_P(OVCompileAndInferRequest, CompiledModelWorkloadTypeDelayedExecutor) {
modelConfiguration[workload_type.name()] = WorkloadType::DEFAULT;
OV_ASSERT_NO_THROW(execNet.set_property(modelConfiguration));

if (isWorkloadTypeSupported()) {
if (isCommandQueueExtSupported()) {
ov::InferRequest req;
OV_ASSERT_NO_THROW(req = execNet.create_infer_request());
bool is_called = false;
@@ -183,6 +183,47 @@ }
}
}

using OVCompileAndInferRequestTurbo = OVCompileAndInferRequest;

TEST_P(OVCompileAndInferRequestTurbo, CompiledModelTurbo) {
configuration[intel_npu::turbo.name()] = true;

auto supportedProperties = core->get_property("NPU", supported_properties.name()).as<std::vector<PropertyName>>();
bool isTurboSupported =
std::any_of(supportedProperties.begin(), supportedProperties.end(), [](const PropertyName& property) {
return property == intel_npu::turbo.name();
});

if (isCommandQueueExtSupported()) {
ASSERT_TRUE(isTurboSupported);
OV_ASSERT_NO_THROW(execNet = core->compile_model(function, target_device, configuration));
auto turbosetting_compiled_model = execNet.get_property(intel_npu::turbo.name());
OV_ASSERT_NO_THROW(turbosetting_compiled_model = true);
ov::InferRequest req;
OV_ASSERT_NO_THROW(req = execNet.create_infer_request());
bool is_called = false;
OV_ASSERT_NO_THROW(req.set_callback([&](std::exception_ptr exception_ptr) {
ASSERT_EQ(exception_ptr, nullptr);
is_called = true;
}));
OV_ASSERT_NO_THROW(req.start_async());
OV_ASSERT_NO_THROW(req.wait());
ASSERT_TRUE(is_called);
} else {
auto cr_ex = configuration.find(intel_npu::create_executor.name());
if (cr_ex->second.as<int64_t>() == 1) {
OV_EXPECT_THROW_HAS_SUBSTRING(core->compile_model(function, target_device, configuration),
ov::Exception,
"Turbo is not supported by the current driver");
} else {
OV_ASSERT_NO_THROW(execNet = core->compile_model(function, target_device, configuration));
OV_EXPECT_THROW_HAS_SUBSTRING(execNet.create_infer_request(),
ov::Exception,
"Turbo is not supported by the current driver");
}
}
}

} // namespace behavior
} // namespace test
} // namespace ov

This file was deleted.

2 changes: 1 addition & 1 deletion src/plugins/intel_npu/thirdparty/level-zero-ext
