Merge pull request #2136 from precedenceguo/master
add documentation for new operator interface in operator_util.h
tqchen committed May 15, 2016
2 parents 99f8eb6 + 403bd16 commit 7841253
Showing 4 changed files with 323 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/how_to/new_op.md
@@ -83,4 +83,4 @@ mlp = mx.symbol.Custom(data=fc3, name='softmax', op_type='softmax')
The complete code for this example can be found at `examples/numpy-ops/custom_softmax.py`

## C++/MShadow(CUDA)
Please refer to [Developer Guide - Operators](https://mxnet.readthedocs.org/en/latest/system/operator.html) for detail.
Please refer to [Developer Guide - SimpleOp](../system/operator_util.md) and [Developer Guide - Operators](https://mxnet.readthedocs.org/en/latest/system/operator.html) for detail.
3 changes: 3 additions & 0 deletions docs/system/index.md
@@ -33,6 +33,8 @@ other. The modules are
graph execution and optimization.
- [Operator](operator.md): Operators that define static forward and gradient
  calculation (backprop).
- [SimpleOp](operator_util.md): Operators that extend to NDArray operators and symbolic operators
in a unified fashion.
- Symbol Construction: Symbolic construction, which provides a way to construct
  a computation graph (net configuration)
- [KVStore](multi_node.md): Key-value store interface for easy parameter synchronizations.
@@ -75,6 +77,7 @@ Documents of Each Module
------------------------
* [Runtime Dependency Engine](engine.md)
* [Operators](operator.md)
* [SimpleOp](operator_util.md)
-

List of Other Resources
318 changes: 318 additions & 0 deletions docs/system/operator_util.md
@@ -0,0 +1,318 @@
# Unifying NDArray Operator and Symbolic Operator: How Does It Work
NDArray operations are similar to symbolic operations, except that sometimes we cannot
write in place to the operands without a complete dependency graph. However, the logic
underlying NDArray and symbolic operations is almost the same. Unifying the different
invoking processes and returning to the fundamental elements of operators is the purpose of
**SimpleOp**, a new unified operator API. Because most mathematical operators attend to one or two
operands, and because more operands make dependency-related optimization useful, the unified operators
are specially designed for unary and binary operations.

Consider the elements of an operation. Ideally, functions and derivatives are all we need to
describe an operation. Let us restrict that to the space of unary and binary operations. How
do we classify all operations to maximize the possibility of in-place write optimization? Note
that functions can be separated out by the number of operands. Derivatives are a bit more
complex: whether the output value, the input data, or neither is needed alongside the head gradient
is crucial to constructing a dependency graph. Gradient functions in the unified API are thus
differentiated by the types of operands they take for calculation.

Before we continue with the SimpleOp interface, it is recommended to take a look at the [mshadow
library guide](https://github.com/dmlc/mshadow/tree/master/guide), since the actual calculations
will be done on `mshadow::TBlob` structures.

In this example, we will create an operator functioning as a smooth l1 loss, which is a mixture
of l1 loss and l2 loss. The loss itself can be written as:
```
loss = outside_weight .* f(inside_weight .* (data - label))
grad = outside_weight .* inside_weight .* f'(inside_weight .* (data - label))
```
where `.*` stands for elementwise multiplication, and `f`, `f'` are the smooth l1 loss function
and its derivative, which we suppose we have in `mshadow` for now. At first glance, it seems
impossible to implement this particular loss as a unary or binary operator. But we have automatic
differentiation in symbolic execution, which simplifies the loss to `f` and `f'` directly. This way,
the loss is no more complex than a `sin` or an `abs` function, and can certainly be implemented as a
unary operator.
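
For reference, a piecewise form of `f` consistent with the `mshadow_op` mapper given at the end of
this document (with `sigma2` denoting the scalar parameter introduced later) is:
```
f(x) = 0.5 * x^2 * sigma2      if |x| < 1 / sigma2
     = |x| - 0.5 / sigma2      otherwise
```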

## SimpleOp: the Unified Operator API
### Define Shapes
The `mshadow` library requires explicit memory allocation. As a consequence, all data shapes
must be provided before any calculation. Before we proceed to define functions and gradients,
we would like to check input data shapes for consistency and provide the output shape.
```cpp
typedef TShape (*UnaryShapeFunction)(const TShape& src,
                                     const EnvArguments& env);
typedef TShape (*BinaryShapeFunction)(const TShape& lhs,
                                      const TShape& rhs,
                                      const EnvArguments& env);
```
We can use `mshadow::TShape` to check input data shapes and designate the output data shape.
When this function is not defined, the default output shape will be the same as the input shape.
In the case of a binary operator, the shapes of `lhs` and `rhs` are checked to be the same by default.
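
As an illustration, a hypothetical binary shape function that enforces matching operands and
propagates their shape might look like the following sketch (the function name is illustrative and
not part of MXNet; our smooth l1 example only needs the unary default):
```cpp
// Hypothetical binary shape function (sketch): require both operands
// to have the same shape and use that shape for the output.
inline TShape MatchingBinaryShape_(const TShape& lhs,
                                   const TShape& rhs,
                                   const EnvArguments& env) {
  CHECK(lhs == rhs) << "operands must have the same shape";
  return TShape(lhs);
}
```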

Shape functions can also be used to check whether any additional arguments and resources are present.
Refer to the additional usages of `EnvArguments` below to achieve this aim.

Before we start on our smooth l1 loss example, we define an `XPU` macro as `cpu` or `gpu` in the header
`smooth_l1_unary-inl.h` so that we can reuse the same code in `smooth_l1_unary.cc` and
`smooth_l1_unary.cu`.
```cpp
#include <mxnet/operator_util.h>
#if defined(__CUDACC__)
#define XPU gpu
#else
#define XPU cpu
#endif
```

In our smooth l1 loss example, the default behavior of keeping the output shape the same as the source is fine.
Written explicitly, it is:
```cpp
inline TShape SmoothL1Shape_(const TShape& src,
                             const EnvArguments& env) {
  return TShape(src);
}
```

### Define Functions
Create a unary or binary function with one output, an `mshadow::TBlob`.
```cpp
typedef void (*UnaryFunction)(const TBlob& src,
                              const EnvArguments& env,
                              TBlob* ret,
                              OpReqType req,
                              RunContext ctx);
typedef void (*BinaryFunction)(const TBlob& lhs,
                               const TBlob& rhs,
                               const EnvArguments& env,
                               TBlob* ret,
                               OpReqType req,
                               RunContext ctx);
```
* Functions are differentiated by the types of input arguments.
* `RunContext ctx` contains information needed at runtime for the actual execution.

```cpp
struct RunContext {
  void *stream;  // the stream of the device, can be NULL or Stream<gpu>* in GPU mode
  template<typename xpu> inline mshadow::Stream<xpu>* get_stream();  // get the mshadow stream from the context
};  // struct RunContext
```
`mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();` is an example of obtaining a stream from `ctx`.
* `OpReqType req` denotes how computation results are written into `ret`.

```cpp
enum OpReqType {
  kNullOp,        // no operation, do not write anything
  kWriteTo,       // write gradient to provided space
  kWriteInplace,  // perform an inplace write
  kAddTo          // add to the provided space
};
```
There is a macro defined in `operator_util.h` for simplified use of `OpReqType`:
`ASSIGN_DISPATCH(out, req, exp)` checks `req` and performs the assignment accordingly.
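Conceptually, the macro dispatches on `req` roughly as in the following sketch (a simplified
illustration, not the literal macro body in `operator_util.h`):
```cpp
// Simplified sketch of what ASSIGN_DISPATCH(out, req, exp) does:
switch (req) {
  case kNullOp:          // do not write anything
    break;
  case kWriteTo:
  case kWriteInplace:
    out = exp;           // overwrite the provided space
    break;
  case kAddTo:
    out += exp;          // accumulate into the provided space
    break;
  default:
    LOG(FATAL) << "unknown OpReqType";
}
```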

In our smooth l1 loss example, we use `UnaryFunction` to define the function of this operator.
```cpp
template<typename xpu>
void SmoothL1Forward_(const TBlob& src,
                      const EnvArguments& env,
                      TBlob *ret,
                      OpReqType req,
                      RunContext ctx) {
  using namespace mshadow;
  using namespace mshadow::expr;
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  real_t sigma2 = env.scalar * env.scalar;
  MSHADOW_TYPE_SWITCH(ret->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> out = ret->get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> in = src.get<xpu, 2, DType>(s);
    ASSIGN_DISPATCH(out, req,
                    F<mshadow_op::smooth_l1_loss>(in, ScalarExp<DType>(sigma2)));
  });
}
```
After obtaining `mshadow::Stream` from `RunContext`, we get `mshadow::Tensor` from `mshadow::TBlob`.
`mshadow::F` is a shortcut to initiate an `mshadow` expression. The macro `MSHADOW_TYPE_SWITCH(type, DType, ...)`
handles details on different types, and the macro `ASSIGN_DISPATCH(out, req, exp)` checks `OpReqType` and
performs actions accordingly. `sigma2` is a special parameter in this loss, which we will cover in the additional usages section below.
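
As a simplified illustration (not the literal macro body), `MSHADOW_TYPE_SWITCH(type, DType, { ... })`
can be thought of as a switch over the runtime type flag that aliases `DType` to the matching C++ type
for the enclosed block:
```cpp
// Conceptual sketch of MSHADOW_TYPE_SWITCH(type, DType, { /* user block */ }):
switch (type) {
  case mshadow::kFloat32: { typedef float DType;  /* user block */ } break;
  case mshadow::kFloat64: { typedef double DType; /* user block */ } break;
  // ... remaining supported type flags ...
  default: LOG(FATAL) << "unknown type flag";
}
```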

### Define Gradients (optional)
Create a gradient function with various types of inputs.
```cpp
// depending only on out_grad
typedef void (*UnaryGradFunctionT0)(const OutputGrad& out_grad,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depending only on out_value
typedef void (*UnaryGradFunctionT1)(const OutputGrad& out_grad,
                                    const OutputValue& out_value,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
// depending only on in_data
typedef void (*UnaryGradFunctionT2)(const OutputGrad& out_grad,
                                    const Input0& in_data0,
                                    const EnvArguments& env,
                                    TBlob* in_grad,
                                    OpReqType req,
                                    RunContext ctx);
```
Gradient functions of binary operators have similar structures, except that `Input`, `TBlob`, and `OpReqType`
are doubled.
* `GradFunctionArgument`
The `Input0`, `Input`, `OutputValue` and `OutputGrad` all share the structure of `GradFunctionArgument`,
which is defined as:
```cpp
struct GradFunctionArgument {
  TBlob data;
};
```

In our smooth l1 loss example, note that the gradient is an `f'(x)` that utilizes the input for its calculation,
so `UnaryGradFunctionT2` is suitable. To enable the chain rule of gradients, we also need to multiply
the `out_grad` from the top into the result of `in_grad`.
```cpp
template<typename xpu>
void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
                            const Input0& in_data0,
                            const EnvArguments& env,
                            TBlob *in_grad,
                            OpReqType req,
                            RunContext ctx) {
  using namespace mshadow;
  using namespace mshadow::expr;
  mshadow::Stream<xpu> *s = ctx.get_stream<xpu>();
  real_t sigma2 = env.scalar * env.scalar;
  MSHADOW_TYPE_SWITCH(in_grad->type_flag_, DType, {
    mshadow::Tensor<xpu, 2, DType> src = in_data0.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> ograd = out_grad.data.get<xpu, 2, DType>(s);
    mshadow::Tensor<xpu, 2, DType> igrad = in_grad->get<xpu, 2, DType>(s);
    ASSIGN_DISPATCH(igrad, req,
                    ograd * F<mshadow_op::smooth_l1_gradient>(src, ScalarExp<DType>(sigma2)));
  });
}
```

### Register SimpleOp to MXNet
After creating the shape, function, and gradient, it is sufficient to register them as both an NDArray operator and a
symbolic operator. There is a registration macro defined in `operator_util.h` that simplifies this process.
```cpp
MXNET_REGISTER_SIMPLE_OP(Name, DEV)
.set_shape_function(Shape)
.set_function(DEV::kDevMask, Function<XPU>, SimpleOpInplaceOption)
.set_gradient(DEV::kDevMask, Gradient<XPU>, SimpleOpInplaceOption)
.describe("description");
```
`SimpleOpInplaceOption` is defined as:
```cpp
enum SimpleOpInplaceOption {
  kNoInplace,     // do not allow inplace in arguments
  kInplaceInOut,  // allow inplace in with out (unary)
  kInplaceOutIn,  // allow inplace out_grad with in_grad (unary)
  kInplaceLhsOut, // allow inplace left operand with out (binary)
  kInplaceOutLhs  // allow inplace out_grad with lhs_grad (binary)
};
```

In our example, the gradient function relies on the input data, so the forward function cannot be written in
place. The output gradient is of no use after the gradient computation, so `in_grad` can be written in place over it.
```cpp
MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
.set_enable_scalar(true)
.describe("Calculate Smooth L1 Loss(lhs, scalar)");
```
Remember from the shape functions section that the default behavior without `set_shape_function` forces the inputs
(if binary) to be of the same shape and yields the same shape for the output. `set_enable_scalar` will be
discussed in the additional information section below.

### All in a List
* Create a shape function for determining the output shape
* Create a function as the forward routine by choosing a suitable function type
* Create a gradient as the backward routine by choosing a suitable gradient type
* Register the operator with the registration macro (see the consolidated sketch below)
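
A consolidated sketch of the whole flow for a hypothetical unary operator `my_op` (all names here
are illustrative; the shape, function, and gradient bodies follow the patterns shown in the sections above):
```cpp
// Step 1: shape function (body as in the shape section above)
inline TShape MyOpShape_(const TShape& src, const EnvArguments& env);

// Step 2: forward function (body as in the function section above)
template<typename xpu>
void MyOpForward_(const TBlob& src, const EnvArguments& env,
                  TBlob* ret, OpReqType req, RunContext ctx);

// Step 3: gradient function (body as in the gradient section above)
template<typename xpu>
void MyOpBackwardUseIn_(const OutputGrad& out_grad, const Input0& in_data0,
                        const EnvArguments& env, TBlob* in_grad,
                        OpReqType req, RunContext ctx);

// Step 4: registration
MXNET_REGISTER_SIMPLE_OP(my_op, XPU)
.set_shape_function(MyOpShape_)
.set_function(XPU::kDevMask, MyOpForward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, MyOpBackwardUseIn_<XPU>, kInplaceOutIn)
.describe("An illustrative unary operator");
```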

## Additional Information on SimpleOp
### Usage of `EnvArguments`
Some operations may need a scalar as input, such as a gradient scale; a set of keyword arguments
controlling behavior; or a temporary space to speed up calculations.
`EnvArguments` provides additional arguments and resources to make calculations more scalable
and efficient.
```cpp
struct EnvArguments {
  real_t scalar;  // scalar argument, if enabled
  std::vector<std::pair<std::string, std::string> > kwargs;  // keyword arguments
  std::vector<Resource> resource;  // pointer to the resources requested
};
```

More registration parameters are required to enable these additional features. To prevent confusion, `scalar` and
`kwargs` cannot be present at the same time. To enable `scalar`, use
`set_enable_scalar(bool enable_scalar)` in registration. Then, in forward functions and gradients,
the `scalar` can be accessed from `env.scalar` of the function parameter `EnvArguments env`.

To enable `kwargs`, use `set_enable_kwargs(bool enable_kwargs)` in registration. Then, in forward
functions and gradients, additional arguments are contained in `env.kwargs`, which is defined as
`std::vector<std::pair<std::string, std::string> >`. The DMLC parameter structure can be used to
simplify parsing keyword arguments. Refer to the [guide on parameter structure](https://github.com/dmlc/dmlc-core/blob/master/doc/parameter.md)
for more details.
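
A minimal sketch of such parsing, assuming a hypothetical `MyParam` structure with an `alpha` field
(neither of which exists in MXNet), could look like this:
```cpp
#include <dmlc/parameter.h>

// Hypothetical parameter structure, for illustration only.
struct MyParam : public dmlc::Parameter<MyParam> {
  float alpha;
  DMLC_DECLARE_PARAMETER(MyParam) {
    DMLC_DECLARE_FIELD(alpha).set_default(1.0f)
    .describe("An illustrative scaling factor.");
  }
};
DMLC_REGISTER_PARAMETER(MyParam);  // usually placed in a .cc file

// Inside a forward function or gradient:
//   MyParam param;
//   param.Init(env.kwargs);       // parse the keyword arguments
//   real_t alpha = param.alpha;   // use the parsed value
```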

Additional resources like `mshadow::Random<xpu>` and temporary memory space can also be requested and
accessed from `EnvArguments.resource`. The registration routine is `set_resource_request(ResourceRequest req)`
or `set_resource_request(const std::vector<ResourceRequest>)`, where `mxnet::ResourceRequest` is defined as:
```cpp
struct ResourceRequest {
  enum Type {   // Resource type, indicating what the pointer type is
    kRandom,    // mshadow::Random<xpu> object
    kTempSpace  // A dynamic temp space that can be arbitrary size
  };
  Type type;    // type of resources
};
```
Registration will request the declared resources from `mxnet::ResourceManager` and place them
in `std::vector<Resource> resource` in `EnvArguments`. To access the resources, write:
```cpp
auto tmp_space_res = env.resource[0].get_space(some_shape, some_stream);
auto rand_res = env.resource[0].get_random(some_stream);
```
Refer to `src/operator/loss_binary_op-inl.h` for a concrete example.
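
As a sketch (the operator and function names are illustrative and not part of MXNet), requesting a
temporary workspace at registration time and retrieving it inside the forward function could look like this:
```cpp
// Hypothetical registration requesting a temporary workspace.
MXNET_REGISTER_SIMPLE_OP(my_op_with_workspace, XPU)
.set_function(XPU::kDevMask, MyOpForward_<XPU>, kNoInplace)
.set_resource_request(ResourceRequest(ResourceRequest::kTempSpace))
.describe("An illustrative operator that uses a temporary workspace");

// Inside MyOpForward_, after obtaining the stream s:
//   mshadow::Tensor<xpu, 1> workspace =
//       env.resource[0].get_space(mshadow::Shape1(src.shape_.Size()), s);
```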

In our smooth l1 loss example, a scalar input is needed to mark the turning point of the loss function. Therefore,
in the registration process, we use `set_enable_scalar(true)` and access `env.scalar` in the function and gradient
declarations.

### Crafting a Tensor Operation
Since the actual computation utilizes the `mshadow` library and sometimes the functions we need are not readily
available, it is possible to craft such tensor operations in operator implementations. If such functions are
elementwise defined, we can implement them as an `mxnet::op::mshadow_op`. `src/operator/mshadow_op.h` contains a lot
of `mshadow_op` examples. `mshadow_op` are expression mappers that deal with the scalar case of the desired functions.
Refer to the [mshadow expression API guide](https://github.com/dmlc/mshadow/tree/master/doc) for details.

It is also possible that the operation cannot be done in an elementwise way, like the softmax loss and its gradient.
In that case, a new tensor operation needs to be created by writing an `mshadow` function and an `mshadow::cuda`
function directly. Please refer to the `mshadow` library for details, or to `src/operator/roi_pooling.cc` for an example.

In our smooth l1 loss example, we create two mappers, namely the scalar cases of smooth l1 loss and gradient.
```cpp
namespace mshadow_op {
struct smooth_l1_loss {
  // a is x, b is sigma2
  MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
    if (a > 1.0f / b) {
      return a - 0.5f / b;
    } else if (a < -1.0f / b) {
      return -a - 0.5f / b;
    } else {
      return 0.5f * a * a * b;
    }
  }
};
}  // namespace mshadow_op
```
The gradient mapper is similar and can be found in `src/operator/smooth_l1_unary-inl.h`.
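For completeness, a sketch of that gradient mapper, consistent with the piecewise derivative of the
loss above (the actual definition lives in `src/operator/smooth_l1_unary-inl.h`):
```cpp
namespace mshadow_op {
// Sketch of the smooth l1 gradient mapper: the derivative of the
// piecewise loss above. a is x, b is sigma2.
struct smooth_l1_gradient {
  MSHADOW_XINLINE static real_t Map(real_t a, real_t b) {
    if (a > 1.0f / b) {
      return 1.0f;
    } else if (a < -1.0f / b) {
      return -1.0f;
    } else {
      return a * b;
    }
  }
};
}  // namespace mshadow_op
```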

### Beyond Two Operands
This new unified API is designed to fulfill the fundamentals of an operation. For operators with more than two inputs,
more than one output, or in need of more features, please refer to the original [Operator API](operator.md).
2 changes: 1 addition & 1 deletion src/operator/smooth_l1_unary-inl.h
@@ -104,7 +104,7 @@ void SmoothL1BackwardUseIn_(const OutputGrad& out_grad,
}

MXNET_REGISTER_SIMPLE_OP(smooth_l1, XPU)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kInplaceInOut)
.set_function(XPU::kDevMask, SmoothL1Forward_<XPU>, kNoInplace)
.set_gradient(XPU::kDevMask, SmoothL1BackwardUseIn_<XPU>, kInplaceOutIn)
.set_enable_scalar(true)
.describe("Calculate Smooth L1 Loss(lhs, scalar)");
