How to support multi-devices in our new framework

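1. First, define the problem. Here "multi-devices" only means that, on a single machine, the different layers of one network can be split across different devices for execution.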

```
net = ['op1', 'op2', 'op3', 'op4']
```
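As shown above, op1 runs on the CPU, op2 runs on GPU 0, op3 runs on GPU 1, and op4 runs on the FPGA.

For the multi-machine case, we restrict multiple machines to being simple replicas of a single machine: only data parallelism is considered, and running one model across multiple machines is not supported.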
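One way to picture this placement is to attach a device to each op. The sketch below is only illustrative: `DeviceKind`, `OpPlacement`, `ExampleNet`, and `CountOpsOn` are hypothetical names, not Paddle types or functions.

```cpp
#include <string>
#include <vector>

// Hypothetical device tag for illustration; not a Paddle type.
enum class DeviceKind { kCPU, kGPU, kFPGA };

// Hypothetical record pairing an op name with the device it should run on.
struct OpPlacement {
  std::string op;
  DeviceKind kind;
  int device_id;  // index among devices of the same kind
};

// The placement described in the text:
// op1 on the CPU, op2 on GPU 0, op3 on GPU 1, op4 on the FPGA.
inline std::vector<OpPlacement> ExampleNet() {
  return {{"op1", DeviceKind::kCPU, 0},
          {"op2", DeviceKind::kGPU, 0},
          {"op3", DeviceKind::kGPU, 1},
          {"op4", DeviceKind::kFPGA, 0}};
}

// Counts how many ops in the net are placed on a given kind of device.
inline int CountOpsOn(const std::vector<OpPlacement>& net, DeviceKind kind) {
  int n = 0;
  for (const auto& p : net) {
    if (p.kind == kind) ++n;
  }
  return n;
}
```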

2. Paddle currently has two concepts that are closely tied to devices: Place and DeviceContext.

Place is defined as follows:
```
typedef boost::variant<GPUPlace, CPUPlace, FPGAPlace> Place;
```
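To illustrate how code can branch on the device held by such a variant, here is a minimal sketch. It uses `std::variant` (C++17) as a stand-in for `boost::variant`, and the `device` fields, `PlaceName`, `ToString`, and `IsGPUPlace` are assumptions made for this example, not Paddle's actual definitions.

```cpp
#include <string>
#include <variant>

// Simplified stand-ins for the place types; the fields are assumptions.
struct CPUPlace {};
struct GPUPlace { int device = 0; };
struct FPGAPlace { int device = 0; };

// std::variant is used here instead of boost::variant (requires C++17).
using Place = std::variant<GPUPlace, CPUPlace, FPGAPlace>;

// Visitor returning a readable name for whichever place the variant holds.
struct PlaceName {
  std::string operator()(const CPUPlace&) const { return "CPU"; }
  std::string operator()(const GPUPlace& p) const {
    return "GPU:" + std::to_string(p.device);
  }
  std::string operator()(const FPGAPlace& p) const {
    return "FPGA:" + std::to_string(p.device);
  }
};

inline std::string ToString(const Place& place) {
  return std::visit(PlaceName{}, place);
}

// True when the variant currently holds a GPUPlace.
inline bool IsGPUPlace(const Place& place) {
  return std::holds_alternative<GPUPlace>(place);
}
```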

DeviceContext is defined as follows:

```
class DeviceContext {};
class CPUDeviceContext : public DeviceContext {};
class CUDADeviceContext : public DeviceContext {};
class FPGADeviceContext : public DeviceContext {};
```

FPGAPlace and FPGADeviceContext appear here as an example of how to add support for a new device to Paddle.
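As a rough sketch of what adding a device could look like at the class level, the example below gives the base class one virtual method and adds an FPGA subclass. The `Name()` method, the `device_id` constructor argument, and `Describe` are assumptions for illustration; the real classes may expose a different interface.

```cpp
#include <string>

// Simplified base class; the virtual Name() method is an assumption.
class DeviceContext {
 public:
  virtual ~DeviceContext() = default;
  virtual std::string Name() const = 0;
};

class CPUDeviceContext : public DeviceContext {
 public:
  std::string Name() const override { return "CPU"; }
};

class CUDADeviceContext : public DeviceContext {
 public:
  explicit CUDADeviceContext(int device_id) : device_id_(device_id) {}
  std::string Name() const override {
    return "GPU:" + std::to_string(device_id_);
  }

 private:
  int device_id_;
};

// Supporting a new device means adding one more subclass
// (together with its Place type).
class FPGADeviceContext : public DeviceContext {
 public:
  std::string Name() const override { return "FPGA"; }
};

// Code written against the base class works unchanged for the new device.
inline std::string Describe(const DeviceContext& ctx) {
  return "running on " + ctx.Name();
}
```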


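3. An Operator describes an operation; the actual computation happens in an OpKernel. One Operator can correspond to multiple OpKernels. We can implement a different OpKernel for each device and register them all in a map.

The OpKernel is selected at run time, based on the Place information contained in the DeviceContext that is passed in.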

```
virtual void OperatorBase::Run(const Scope& scope, const platform::DeviceContext& dev_ctx) const = 0;
```
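The registration and run-time selection described in this item can be sketched as a map keyed by operator type and device. This is a minimal sketch under stated assumptions: `KernelRegistry`, `Kernel`, `DeviceKind`, and the reduced `DeviceContext` struct are hypothetical, and a kernel here only returns a string instead of computing anything.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical device tag standing in for the Place carried by a DeviceContext.
enum class DeviceKind { kCPU, kGPU, kFPGA };

// Simplified context: it only exposes the kind of device it represents.
struct DeviceContext {
  DeviceKind kind;
};

// A kernel is modelled as a callable that reports what ran.
using Kernel = std::function<std::string(const DeviceContext&)>;

// Hypothetical registry: (op type, device kind) -> kernel.
class KernelRegistry {
 public:
  void Register(const std::string& op_type, DeviceKind kind, Kernel kernel) {
    kernels_[std::make_pair(op_type, kind)] = std::move(kernel);
  }

  // Picks the kernel at run time from the device kind in the context.
  std::string Run(const std::string& op_type, const DeviceContext& ctx) const {
    auto it = kernels_.find(std::make_pair(op_type, ctx.kind));
    if (it == kernels_.end()) {
      throw std::runtime_error("no kernel registered for " + op_type);
    }
    return it->second(ctx);
  }

 private:
  std::map<std::pair<std::string, DeviceKind>, Kernel> kernels_;
};
```

With this shape, the same operator type can be run with a CPU context or a GPU context, and the registry resolves which kernel to call.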

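4. Consider how to execute a multi-devices network according to the user's configuration:
 - First, VarDesc and OpDesc must contain device information, so that we know which device each Op runs on.
 - A copy operator needs to be implemented to copy data between devices. When users configure a cross-device network, they must add the copy operator explicitly (later Paddle could insert it automatically to reduce the burden on users).
 - A DeviceContextManager needs to be implemented. It creates the required DeviceContexts in advance, based on the device information in the network configuration. Then, when each Operator is executed, its DeviceContext argument can be fetched from the DeviceContextManager.
 - So we still need a concept such as an Executor/Scheduler/etc. that, based on the device information configured by the user, fetches the corresponding DeviceContext from the DeviceContextManager and then runs each Operator.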
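The flow in the list above can be sketched as follows. Everything here is hypothetical and heavily simplified: `OpDesc` is reduced to a type plus device fields, `DeviceContextManager` and `Execute` are illustrative names, and running an op only appends to a log. The explicit copy operator is modelled as an ordinary op that the user places on the destination device, which is one possible choice rather than a settled design.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; they only illustrate the proposed flow.
enum class DeviceKind { kCPU, kGPU, kFPGA };

inline std::string KindName(DeviceKind kind) {
  switch (kind) {
    case DeviceKind::kCPU: return "CPU";
    case DeviceKind::kGPU: return "GPU";
    case DeviceKind::kFPGA: return "FPGA";
  }
  return "unknown";
}

struct DeviceContext {
  DeviceKind kind;
  int device_id;
};

// Minimal stand-in for an OpDesc that carries device information.
struct OpDesc {
  std::string type;
  DeviceKind kind;
  int device_id;
};

// Creates one DeviceContext per distinct device found in the net, up front.
class DeviceContextManager {
 public:
  explicit DeviceContextManager(const std::vector<OpDesc>& net) {
    for (const auto& op : net) {
      auto key = std::make_pair(op.kind, op.device_id);
      if (contexts_.count(key) == 0) {
        contexts_[key] = std::make_unique<DeviceContext>(
            DeviceContext{op.kind, op.device_id});
      }
    }
  }

  const DeviceContext& Get(DeviceKind kind, int device_id) const {
    return *contexts_.at(std::make_pair(kind, device_id));
  }

  std::size_t Size() const { return contexts_.size(); }

 private:
  std::map<std::pair<DeviceKind, int>, std::unique_ptr<DeviceContext>> contexts_;
};

// Walks the net in order and fetches the matching context for each op.
// Returns a log of "<op type>@<device>:<id>" entries instead of doing real work.
inline std::vector<std::string> Execute(const std::vector<OpDesc>& net,
                                        const DeviceContextManager& manager) {
  std::vector<std::string> log;
  for (const auto& op : net) {
    const DeviceContext& ctx = manager.Get(op.kind, op.device_id);
    log.push_back(op.type + "@" + KindName(ctx.kind) + ":" +
                  std::to_string(ctx.device_id));
  }
  return log;
}
```

A net such as `op1` on the CPU, `copy` and `op2` on GPU 0, then `copy` and `op3` on GPU 1 would need three contexts in this sketch, and `Execute` would visit the five ops in order.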



How to support multi-devices in our new framework #4031
