
Optimizer C_API #2168

Closed

@dzhwinter

Description

As mentioned in the previous discussion:

Model Optimization Using Gradients

There are two ways to perform model optimization using gradients:

  • On Client
    The client performs multiple steps of forward and backward updates. In each step, the gradients are calculated and a new model is generated. After some steps, the client computes the difference between the newest model and the model at step 0, and sends that difference to the parameter servers. The parameter servers simply apply the difference to the parameters, without any gradient-based optimization (such as Adam or L1 regularization). A minimal sketch of this scheme follows the list.
  • On Parameter Server
    The client sends accumulated gradients to the parameter servers, and the parameter servers perform the optimization using those gradients.

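For concreteness, here is a minimal C sketch (hypothetical names, not the actual trainer code) of the On Client scheme: local SGD steps, followed by a model diff that the server applies verbatim.

    #include <stddef.h>

    /* Run several plain SGD steps on the client. The gradient is held
     * fixed here for brevity; in reality it is recomputed each step. */
    void local_steps(float *param, const float *grad, size_t n,
                     float lr, int steps) {
      for (int s = 0; s < steps; ++s)
        for (size_t i = 0; i < n; ++i)
          param[i] -= lr * grad[i];
    }

    /* diff = newest model - snapshot of the model at step 0;
     * this diff is what gets sent to the parameter servers. */
    void compute_diff(const float *param, const float *snapshot,
                      float *diff, size_t n) {
      for (size_t i = 0; i < n; ++i)
        diff[i] = param[i] - snapshot[i];
    }

    /* The server just adds the diff: no Adam, no regularization. */
    void server_apply_diff(float *server_param, const float *diff, size_t n) {
      for (size_t i = 0; i < n; ++i)
        server_param[i] += diff[i];
    }
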
We plan to support both parameter-update methods. The current v1 only supports method 1 (On Client). Since both methods need the optimizer's update strategy, we chose to package the Optimizer as a standalone library.

The ParameterServer is implemented in Go, so it needs a C interface to the Optimizer, defined as follows:

    // Supported data types, consistent with @helin's client design doc.
    typedef enum {
      PADDLE_ELEMENT_TYPE_INT32   = 0,
      PADDLE_ELEMENT_TYPE_UINT32  = 1,
      PADDLE_ELEMENT_TYPE_INT64   = 2,
      PADDLE_ELEMENT_TYPE_UINT64  = 3,
      PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
      PADDLE_ELEMENT_TYPE_FLOAT64 = 5,
    } paddle_element_type;

    /*
     * @brief Update interface of the optimizer. It is used in the
     *        Trainer process, and in the ParameterServer process to
     *        support On Parameter Server optimization.
     * @param buffer         : array of parameters
     * @param datatype       : data type of the parameters and gradients
     * @param optimizer_name : algorithm id, e.g. "SGD" or "Adam"
     * @param gradient       : array of gradients to be applied to the parameters
     */
    void updateParameter(void *buffer, paddle_element_type datatype,
                         const char *optimizer_name, const void *gradient);
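
To make the intended use concrete, here is a self-contained sketch with a toy stand-in implementation (plain float32 SGD). Note that the declared interface carries no buffer length, so this sketch fixes the element count with a macro; a real implementation would need the length passed in or registered beforehand.

    #include <stdio.h>
    #include <string.h>

    typedef enum {
      PADDLE_ELEMENT_TYPE_FLOAT32 = 4,
    } paddle_element_type;

    #define NUM_ELEMENTS 4      /* stand-in for the missing length argument */
    #define LEARNING_RATE 0.1f  /* hard-coded for the sketch */

    /* Toy stand-in for updateParameter: handles only float32 + "SGD". */
    void updateParameter(void *buffer, paddle_element_type datatype,
                         const char *optimizer_name, const void *gradient) {
      if (datatype == PADDLE_ELEMENT_TYPE_FLOAT32 &&
          strcmp(optimizer_name, "SGD") == 0) {
        float *param = (float *)buffer;
        const float *grad = (const float *)gradient;
        for (int i = 0; i < NUM_ELEMENTS; ++i)
          param[i] -= LEARNING_RATE * grad[i];  /* w = w - lr * g */
      }
    }

    int main(void) {
      float param[NUM_ELEMENTS] = {1.0f, 2.0f, 3.0f, 4.0f};
      float grad[NUM_ELEMENTS]  = {0.5f, 0.5f, 0.5f, 0.5f};
      updateParameter(param, PADDLE_ELEMENT_TYPE_FLOAT32, "SGD", grad);
      for (int i = 0; i < NUM_ELEMENTS; ++i)
        printf("%.2f ", param[i]);  /* prints: 0.95 1.95 2.95 3.95 */
      printf("\n");
      return 0;
    }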

1. Can sparseUpdate and denseUpdate share this single interface?

SparseUpdate is stored as a SparseRowMatrix, so this interface can be reused. A rough illustration follows.
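
The layout below is hypothetical (not the actual SparseRowMatrix definition), but it shows why reuse works: a sparse update only touches a subset of rows, and each touched row is itself a dense buffer, so the same per-buffer update applies row by row.

    #include <stddef.h>

    /* Hypothetical sparse-row layout: only the rows listed in row_ids
     * carry gradient data. */
    typedef struct {
      size_t num_rows;   /* number of non-empty rows       */
      size_t width;      /* elements per row               */
      int   *row_ids;    /* global row indices             */
      float *rows;       /* num_rows * width dense payload */
    } sparse_rows;

    void sparse_sgd_update(float *param /* full dense table */,
                           const sparse_rows *g, float lr) {
      for (size_t r = 0; r < g->num_rows; ++r) {
        float *p = param + (size_t)g->row_ids[r] * g->width;
        const float *grad = g->rows + r * g->width;
        for (size_t c = 0; c < g->width; ++c)
          p[c] -= lr * grad[c];  /* each touched row is a dense update */
      }
    }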

2. Can the Regularizer also be packaged into this library?

The On Client update path is already coupled with communication, especially for SparseUpdate: since the update process is lazy and the client iterates locally many times, the Regularizer must record how many rounds have elapsed and trigger the pending update when a row is next read. We have not found a good way to decouple this communication state; a sketch of the lazy catch-up follows.
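
The following hypothetical sketch shows the state the Regularizer would have to carry: with lazy sparse updates, a row can skip several rounds, so each row remembers the last round it was touched and catches up on the missed regularization when it is read again.

    #include <stddef.h>

    /* Hypothetical lazy L2 decay over a sparse-row parameter table. */
    typedef struct {
      float *rows;        /* dense payload, num_rows * width          */
      int   *last_round;  /* round at which each row was last updated */
      size_t width;
      float  decay;       /* per-round multiplicative L2 decay factor */
    } lazy_table;

    /* Reading a row triggers the regularization it missed while idle. */
    float *read_row(lazy_table *t, size_t row, int current_round) {
      float *p = t->rows + row * t->width;
      int missed = current_round - t->last_round[row];
      for (int k = 0; k < missed; ++k)        /* catch up missed rounds  */
        for (size_t c = 0; c < t->width; ++c)
          p[c] *= (1.0f - t->decay);
      t->last_round[row] = current_round;     /* state that must persist */
      return p;
    }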

The Optimizer is planned to wrap the low-level operations such as applySGD in the math library; see the code at:

https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/math/TrainingAlgorithmOp.cu#L25

Later, when Majel is integrated, this part of the code can be migrated.
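
For reference, the kind of low-level operation being wrapped looks roughly like the following: SGD with momentum and weight decay written as a plain C loop. This is only an approximation of the shape of applySGD, not the actual signature in TrainingAlgorithmOp.cu (which runs as a vectorized/GPU op).

    #include <stddef.h>

    /* Approximate shape of an applySGD-style kernel. */
    void apply_sgd(float *value, float *momentum, const float *grad,
                   size_t n, float lr, float mom, float decay) {
      for (size_t i = 0; i < n; ++i) {
        momentum[i] = mom * momentum[i]
                    - lr * (grad[i] + decay * value[i]);  /* velocity   */
        value[i]   += momentum[i];                        /* param step */
      }
    }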
