
RFC: Adding Pluggable Device For TensorFlow #262

Merged · 29 commits · Sep 29, 2020
Changes from 1 commit
Commits (29)
3cdeca6
Adding Pluggable Device For TensorFlow RFC
jzhoulon Jun 24, 2020
97d805f
Update RFC PR Number
jzhoulon Jun 24, 2020
72c4589
Update 20200624-pluggable-device-for-tensorflow.md
jzhoulon Jul 2, 2020
8fec775
update StreamExecutor C API
jzhoulon Jul 8, 2020
a60fb80
replace SE_RegisterPlatform->SE_InitializePlugin according to StreamE…
jzhoulon Jul 11, 2020
3317b4b
add user example
jzhoulon Jul 11, 2020
a4b9120
update time
jzhoulon Jul 14, 2020
ec2ff51
device_type attribute of PluggableDevice
jzhoulon Jul 14, 2020
f42a102
update PluggableBFCAllocator description
jzhoulon Jul 22, 2020
2bf7974
adding supported/unsupported scenero for pluggable device and add sub…
jzhoulon Jul 31, 2020
f1b0375
fix subdevice description
jzhoulon Aug 3, 2020
25ee309
fix title
jzhoulon Aug 3, 2020
f81cd14
fix title format
jzhoulon Aug 3, 2020
28209ff
fix format
jzhoulon Aug 4, 2020
beb2fc5
update with new StreamExecutor C API(SE_->SP_)
jzhoulon Aug 6, 2020
81481f8
fix typo
jzhoulon Aug 10, 2020
6c0fc36
update date
jzhoulon Aug 11, 2020
8257245
add front-end mirroring mechanism
jzhoulon Aug 12, 2020
43be4ab
update date
jzhoulon Aug 12, 2020
27d5b27
update scenario1 desc
jzhoulon Aug 12, 2020
18258c0
update scenario 3 desc
jzhoulon Aug 12, 2020
eb83a47
update front-end mirroring mechanisim
jzhoulon Aug 13, 2020
c4542bf
update date
jzhoulon Aug 13, 2020
5030d99
fix desc
jzhoulon Aug 13, 2020
a914a43
fix conflict
jzhoulon Aug 13, 2020
e9ed210
update front-end mirroring description
jzhoulon Aug 13, 2020
0b3feb3
front-end mirroring -> device mapping
jzhoulon Aug 13, 2020
fd35ab1
modify front-end usage for pluggable device according to the review m…
jzhoulon Aug 14, 2020
492ca0b
update MemAllocator
jzhoulon Sep 3, 2020
fix subdevice description
jzhoulon committed Aug 3, 2020
commit f1b0375a28f95413978b116cbdc4ade54df7cdcc
30 changes: 13 additions & 17 deletions rfcs/20200624-pluggable-device-for-tensorflow.md
@@ -79,18 +79,16 @@ This topic describes the user scenarios that are supported/unsupported in PluggableDevice.

Upon initialization of TensorFlow, it uses the platform-independent `LoadLibrary()` to load the plugin's dynamic library. The plugin library should be installed to the default plugin directory "…python_dir.../site-packages/tensorflow-plugins". The modular TensorFlow [RFC](https://github.com/tensorflow/community/pull/77) describes the process of loading plugins.
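As a minimal sketch of what such platform-independent loading amounts to on POSIX systems (the wrapper below is illustrative and assumes `dlopen`; it is not TensorFlow's actual `LoadLibrary()` implementation):
```cpp
#include <dlfcn.h>  // POSIX dynamic loading; Windows would use LoadLibraryA instead
#include <stdexcept>
#include <string>

// Illustrative stand-in for TensorFlow's platform-independent LoadLibrary().
void* LoadPluginLibrary(const std::string& path) {
  // RTLD_NOW resolves all symbols at load time, so a plugin with missing
  // symbols fails immediately rather than at first kernel launch.
  void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    throw std::runtime_error(std::string("Failed to load plugin: ") + dlerror());
  }
  return handle;
}
```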

During plugin library initialization, TensorFlow proper calls the `SE_InitializePlugin` API (part of the StreamExecutor C API) to retrieve the necessary information from the plugin to instantiate a StreamExecutor platform ([se::platform](https://github.com/tensorflow/tensorflow/blob/cb32cf0f0160d1f582787119d0480de3ba8b9b53/tensorflow/stream_executor/platform.h#L93) class) and register it with the global object [se::MultiPlatformManager](https://github.com/tensorflow/tensorflow/blob/cb32cf0f0160d1f582787119d0480de3ba8b9b53/tensorflow/stream_executor/multi_platform_manager.h#L82). TensorFlow proper gets a device type and a subdevice type from the plugin through `SE_InitializePlugin` and registers the `PluggableDeviceFactory` with the device type. The device type string is used to access the PluggableDevice with `tf.device()` in the Python layer. The subdevice type string is used for low-level specialization of the GPU device (kernels, StreamExecutor, common runtime, Grappler, placer, etc.). Users who care whether they are running on an Intel or NVIDIA GPU can call a Python API (such as `tf.config.list_physical_devices`) to get the subdevice type and identify the GPU. Plugin authors need to implement `SE_InitializePlugin` and provide the necessary information:
```cpp
void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  static const int32_t plugin_id_value = 123;
  SE_PlatformId id{ SE_PLATFORMID_STRUCT_SIZE };
  id.id = &plugin_id_value;
  int32_t visible_device_count = get_plugin_device_count();

  std::string name = "My_GPU";  // StreamExecutor platform name && subdevice type
  std::string type = "GPU";     // device type

  params->params.id = id;
  params->params.visible_device_count = visible_device_count;
  ...
  params->params.name_len = name.size();
  params->params.type = type.c_str();
  params->params.type_len = type.size();
}
```
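On the TensorFlow proper side, the flow described above can be sketched roughly as follows; `PluggableDevicePlatform`, the unpacking code, and the priority value are assumptions for illustration, not the actual implementation:
```cpp
// Rough sketch of TensorFlow proper consuming the plugin's registration
// parameters (names and the priority value are illustrative assumptions).
void RegisterPluggablePlatform(SE_PlatformRegistrationParams* params) {
  std::string name(params->params.name, params->params.name_len);  // "My_GPU"
  std::string type(params->params.type, params->params.type_len);  // "GPU"

  // Wrap the plugin's callbacks in a StreamExecutor platform; the platform
  // name doubles as the subdevice type string.
  se::MultiPlatformManager::RegisterPlatform(
      std::make_unique<PluggableDevicePlatform>(name, params->params));

  // Register the factory under the front-end device type, with a priority
  // higher than the default GPU device so "GPU" resolves to the plugin.
  DeviceFactory::Register(type, new PluggableDeviceFactory(type, name),
                          /*priority=*/220);
}
```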
`ListPhysicalDevices` encodes the subdevice type string into the physical device name:
```cpp
Status PluggableDeviceFactory::ListPhysicalDevices(std::vector<string>* devices) {
  se::Platform* platform = se::MultiPlatformManager::PlatformWithName(sub_device_type_);
  for (int i = 0; i < platform->VisibleDeviceCount(); i++) {
    const string device_name = strcat("/physical_device:", device_type_, "/", sub_device_type_, ":", i);
    devices->push_back(device_name);
  }
  ...
}
```
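For example, with device type "GPU" and subdevice type "INTEL_GPU", the first device is listed as `/physical_device:GPU/INTEL_GPU:0`.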
@@ -120,13 +116,13 @@

### Device Creation

`PluggableDeviceFactory` is introduced to create the `PluggableDevice`, following the [LocalDevice](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/local_device.h) design pattern. To let existing GPU programs run on a new device without users changing their code, plugin authors can register the "GPU" device type through `SE_InitializePlugin`, and TensorFlow proper will then register the `PluggableDeviceFactory` for the "GPU" type with higher priority than the default GPU device.
Plugin:
```cpp
void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  ...
  std::string type = "GPU";
  params->params.type = type.c_str();
  ...
}
```
@@ -144,7 +140,7 @@ For those vendors who don't want to use the "GPU" type, it's optional to register a…
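For illustration, such a vendor-specific type would be registered the same way; the "XPU" string below is purely hypothetical:
```cpp
void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  ...
  std::string type = "XPU";  // hypothetical vendor-specific device type
  params->params.type = type.c_str();
  ...
}
```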

When a session is created, `PluggableDeviceFactory` creates a `PluggableDevice` object for the plugin device. During the initialization of the `PluggableDevice`, the global object `se::MultiPlatformManager` finds the `se::platform` through the platform name / subdevice type registered by the plugin ("My_GPU"); the StreamExecutor platform (`se::platform`) then creates or finds a `StreamExecutor` object containing a `PluggableDeviceExecutor`, together with multiple stream objects (a computation stream and several memory-copy streams) backing the `StreamExecutor` object.
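A condensed sketch of that initialization path (member names such as `compute_stream_` and the use of `ValueOrDie()` are illustrative assumptions, not the exact implementation):
```cpp
// Illustrative sketch of PluggableDevice initialization.
Status PluggableDevice::Init(const SessionOptions& options) {
  // Look up the platform by the name the plugin registered ("My_GPU").
  se::Platform* platform =
      se::MultiPlatformManager::PlatformWithName("My_GPU").ValueOrDie();

  // Create (or fetch a cached) StreamExecutor whose implementation is the
  // plugin-backed PluggableDeviceExecutor for this device ordinal.
  se::StreamExecutor* executor =
      platform->ExecutorForDevice(device_ordinal_).ValueOrDie();

  // One computation stream plus dedicated memory-copy streams.
  compute_stream_ = absl::make_unique<se::Stream>(executor);
  compute_stream_->Init();
  host_to_device_stream_ = absl::make_unique<se::Stream>(executor);
  host_to_device_stream_->Init();
  return Status::OK();
}
```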

The section below shows some pseudo code introducing the extensions inside TensorFlow proper for pluggable device creation. The implementation is based on the StreamExecutor C API [RFC](https://github.com/tensorflow/community/pull/257).

@@ -236,19 +232,19 @@ Plugin authors need to provide the implementation of those C functions defined in the StreamExecutor C API.

This RFC shows an example of kernel registration for PluggableDevice. The kernel and op registration and implementation APIs are addressed in a separate [RFC](https://github.com/tensorflow/community/blob/master/rfcs/20190814-kernel-and-op-registration.md).

To avoid kernel registration conflicts with existing GPU (CUDA) kernels, plugin authors need to provide a device type (such as "GPU") as well as a subdevice type (such as "INTEL_GPU") to TensorFlow proper for kernel registration and dispatch. The device type indicates the device the kernel runs on; the subdevice type is for low-level specialization of the device.
```cpp
void SE_InitializePlugin(SE_PlatformRegistrationParams* params, TF_Status* status) {
  ...
  std::string type = "GPU";  // front-end visible device type
  params->params.type = type.c_str();
  std::string name = "INTEL_GPU";  // low-level specialization device type
  params->params.name = name.c_str();
  ...
}

void InitKernelPlugin() {
  TF_KernelBuilder* builder = TF_NewKernelBuilder(/*op_name*/"Convolution", "GPU",  // "GPU" is the device type
      "INTEL_GPU", &Conv_Create, &Conv_Compute, &Conv_Delete);  // "INTEL_GPU" is the subdevice type
  TF_Status* status = TF_NewStatus();
  TF_RegisterKernelBuilder(/*kernel_name*/"Convolution", builder, status);
  ...
}
```
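How `InitKernelPlugin` gets called is outside this snippet; one plausible wiring, assuming the modular-TensorFlow convention of an exported init symbol that TensorFlow proper looks up after loading the plugin library (the exact symbol name here is an assumption), is:
```cpp
// Hypothetical plugin entry point: TensorFlow proper would locate and call
// this exported symbol after dynamically loading the plugin library.
extern "C" void TF_InitKernel() {
  InitKernelPlugin();  // registers the "Convolution" kernel shown above
}
```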