[test] Sparse mega pr #168

Merged 66 commits on Aug 16, 2017

Changes from 1 commit

Commits (66)
c5f6648
[WIP] Sparse Tensor (#5800)
eric-haibin-lin Jun 26, 2017
db65770
move storage type vector from nnvm to mxnet (#7054)
eric-haibin-lin Jul 15, 2017
e2607da
fix failed tests. add back 64bit support for dot
eric-haibin-lin Jul 17, 2017
978748e
Improve copy sparse tensors (#7003)
reminisce Jul 15, 2017
ce0fec8
bug fix for IdentityComputeRsp
eric-haibin-lin Jul 18, 2017
a2b3d3e
fix lint
eric-haibin-lin Jul 18, 2017
27c9ac0
add data partition for libsvm iter (#7027)
eric-haibin-lin Jul 21, 2017
3a394ea
fix ndarray namespace
eric-haibin-lin Jul 22, 2017
cf61a9e
remove sparse embedding (#7165)
eric-haibin-lin Jul 23, 2017
fe62976
remove untested gpu operators (#7172)
eric-haibin-lin Jul 24, 2017
4de0fdd
Fix ndarray aux data issue (#7098)
reminisce Jul 25, 2017
a472b61
Support K-dimensional row-sparse tensor (#7179)
eric-haibin-lin Jul 25, 2017
6a01b6e
Improve sparse ndarray error message (#7181)
eric-haibin-lin Jul 25, 2017
05ddf38
construct row_sparse ndarray for dist-async
eric-haibin-lin Jun 26, 2017
f57fc3c
Merge remote-tracking branch 'upstream/master' into dmlc-sparse-squash
eric-haibin-lin Jul 26, 2017
0ed14d1
fix DotCsrRspRspImpl error message (#7191)
stefanhenneking Jul 26, 2017
f0af872
GPU implementation of cast_storage (dense to csr) (#7081)
stefanhenneking Jul 27, 2017
6f0719f
Sparse square sum (#7206)
reminisce Jul 27, 2017
ec2c4bf
Modify and Add documentation for mx.nd.zeros (#7197)
anirudh2290 Jul 27, 2017
88eaac6
Merge remote-tracking branch 'upstream/master' into dmlc-sparse-squash
eric-haibin-lin Jul 27, 2017
3b94a3c
Expose kWriteInplace for imperative execution (fcompute_ex and fstate…
eric-haibin-lin Jul 28, 2017
55e4763
Operator add_n for row sparse ndarrays (#7244)
reminisce Aug 1, 2017
7e1647c
GPU implementation of cast_storage (dense to rsp) (#7223)
stefanhenneking Aug 1, 2017
5905ddc
merge with dmlc/master
eric-haibin-lin Aug 2, 2017
d8a9aba
resolve merge conflict in ndarray.load
eric-haibin-lin Aug 2, 2017
f686174
Improve StatefulOp/FCompute storage fallback (#134)
eric-haibin-lin Aug 2, 2017
d0579c4
update sparse ndarray api (#139)
eric-haibin-lin Aug 3, 2017
56b5a63
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 3, 2017
325f4db
Handle ograd_stype='row_sparse' for square_sum backward (#143)
reminisce Aug 3, 2017
5866b2b
Sparse retain improvement (#138)
reminisce Aug 5, 2017
9298bfa
ignoring variables in SimpleBind that is used on python's sparse bran…
sergeykolychev Aug 5, 2017
1f07771
add bias term to fm test (#145)
eric-haibin-lin Aug 5, 2017
d511938
merge with upstream/master. resolve conflict in c_api_ndarray.cc
eric-haibin-lin Aug 5, 2017
6956431
update ndarray.nd, remove `invoke` from excluded members (#137)
eric-haibin-lin Aug 6, 2017
6c9a350
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 6, 2017
66b7b8a
support storage fallback with mutable inputs (#147)
eric-haibin-lin Aug 6, 2017
cf8ddcf
Merge branch 'sparse' of https://github.com/eric-haibin-lin/mxnet int…
eric-haibin-lin Aug 6, 2017
0396c9a
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 7, 2017
2dc7dc9
Code changes based on reviews (#144)
eric-haibin-lin Aug 8, 2017
f318c9d
small edits according to reviews (#151)
eric-haibin-lin Aug 8, 2017
85cbc60
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 8, 2017
fc1aa6e
fix lint (#152)
eric-haibin-lin Aug 8, 2017
9ba96b9
resolve conflict in ndarray.py and capi
eric-haibin-lin Aug 8, 2017
6cbdf98
resolve conflicts in license header
eric-haibin-lin Aug 8, 2017
253ae57
add license to all new files in sparse branch (#154)
eric-haibin-lin Aug 9, 2017
b2ad302
Allocate temp data on the fly for some casting operations (#149)
cjolivier01 Aug 9, 2017
129148c
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 9, 2017
d6f987d
fix utf8 encoding in sparse ndarray
eric-haibin-lin Aug 9, 2017
955e97f
Merge branch 'sparse' of https://github.com/eric-haibin-lin/mxnet int…
eric-haibin-lin Aug 9, 2017
bc33101
Extending the GPU dot operator (#7226)
stefanhenneking Aug 9, 2017
8040953
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 9, 2017
2d93d72
Add get_synthetic_dataset function to util (#146)
anirudh2290 Aug 10, 2017
80a590d
temporary fix for batch norm storage fallback (#156)
eric-haibin-lin Aug 10, 2017
92f54d2
support random_uniform/normal/gamma with row_sparse output (#155)
eric-haibin-lin Aug 10, 2017
17bfa4e
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 10, 2017
ef3b442
Merge remote-tracking branch 'upstream/master' into sparse
eric-haibin-lin Aug 10, 2017
a44afed
Square sum backward support one more case (#161)
reminisce Aug 10, 2017
ceca9b6
Add documentation for sparse ops (#148)
eric-haibin-lin Aug 11, 2017
1c60a05
A few fixes (#163)
eric-haibin-lin Aug 11, 2017
04e9129
Merge branch 'sparse' of https://github.com/eric-haibin-lin/mxnet int…
eric-haibin-lin Aug 12, 2017
8ebc012
merge with upstream/master
eric-haibin-lin Aug 12, 2017
889a09e
Minor fixes sparse ops (#160)
stefanhenneking Aug 14, 2017
6b0cac1
sparse Adam optimizer (#164)
eric-haibin-lin Aug 14, 2017
eeff444
kvstore.row_sparse_pull for GPU and end-to-end benchmark: CPU vs. mul…
reminisce Aug 15, 2017
54f698b
fix bug in adam update (#167)
eric-haibin-lin Aug 15, 2017
6fa078e
change sparse example from regression to classification (#165)
eric-haibin-lin Aug 15, 2017
sparse Adam optimizer (#164)
* add sparse adam

* register gpu op

* add comments

* cr comments
eric-haibin-lin authored Aug 14, 2017
commit 6b0cac1f662eb7a7d2da5dcd7aa6ad01fe3c5d69
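A minimal sketch of how the sparse path added in this commit might be exercised from the Python frontend. It assumes the registered op is exposed as mx.nd.adam_update and that NDArray.tostype('row_sparse') is available at this point of the branch; none of the names, shapes, or values below come from the diff itself.

import mxnet as mx

shape = (4, 2)                                     # illustrative shape only
weight = mx.nd.ones(shape).tostype('row_sparse')   # all rows non-zero, as the row_sparse weight path requires
mean = mx.nd.zeros(shape).tostype('row_sparse')    # uninitialized state gets zero-filled by the op
var = mx.nd.zeros(shape).tostype('row_sparse')
grad_dense = mx.nd.zeros(shape)
grad_dense[2:3] = 0.5                              # only row 2 carries a gradient
grad = grad_dense.tostype('row_sparse')

# The sparse FComputeEx path expects kWriteInplace, so the weight doubles as the output.
mx.nd.adam_update(weight, grad, mean, var, lr=0.01, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, wd=0.0, rescale_grad=1.0, out=weight)
print(weight.asnumpy())                            # only row 2 should have moved

Rows that never appear in the gradient keep their weight, mean, and var untouched, which is the lazy-update behaviour the kernel in the diff below implements.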
141 changes: 141 additions & 0 deletions src/operator/optimizer_op-inl.h
@@ -700,6 +700,147 @@ inline void AdamUpdate(const nnvm::NodeAttrs& attrs,
  });
}

/*!
 * Note: this kernel performs sparse adam update. For each row-slice in row_sparse
 * gradient, it finds the corresponding elements in weight, mean and var and performs
 * the update.
 * The kernel assumes dense weight/mean/var, and row_sparse gradient
 */
template<int req>
struct AdamDnsRspDnsKernel {
  template<typename DType, typename IType>
  MSHADOW_XINLINE static void Map(int i, const nnvm::dim_t row_length, DType* out_data,
    DType* mean_data, DType* var_data, const DType* weight_data, const IType* grad_idx,
    const DType* grad_data, const DType clip_gradient, const DType beta1, const DType beta2,
    const DType lr, const DType wd, const DType epsilon, const DType rescale_grad) {
    using nnvm::dim_t;
    using namespace mshadow_op;
    const dim_t row_offset = grad_idx[i] * row_length;
    for (dim_t j = 0; j < row_length; j++) {
      // index in data/mean/var
      const dim_t data_i = row_offset + j;
      // index in grad
      const dim_t grad_i = i * row_length + j;
      const DType grad_rescaled = grad_data[grad_i] * rescale_grad + weight_data[data_i] * wd;
      if (clip_gradient >= 0.0f) {
        mean_data[data_i] = beta1 * mean_data[data_i] + (1.f - beta1) *
                            clip::Map(grad_rescaled, clip_gradient);
        var_data[data_i] = beta2 * var_data[data_i] + (1.f - beta2) * square::Map(
                           clip::Map(grad_rescaled, clip_gradient));
      } else {
        mean_data[data_i] = beta1 * mean_data[data_i] + (1.f - beta1) * grad_rescaled;
        var_data[data_i] = beta2 * var_data[data_i] +
                           (1.f - beta2) * grad_rescaled * grad_rescaled;
      }
      KERNEL_ASSIGN(out_data[data_i], req, weight_data[data_i] - lr * mean_data[data_i] /
                    (square_root::Map(var_data[data_i]) + epsilon));
    }
  }
};


template<typename xpu>
inline void AdamUpdateDnsRspDnsImpl(const AdamParam& param,
                                    const OpContext& ctx,
                                    const TBlob& weight,
                                    const NDArray& grad,
                                    const TBlob& mean,
                                    const TBlob& var,
                                    const OpReqType& req,
                                    TBlob *out) {
  using namespace mxnet_op;
  using namespace rowsparse;
  Stream<xpu>* s = ctx.get_stream<xpu>();
  if (!grad.storage_initialized() || req == kNullOp) return;
  CHECK_EQ(req, kWriteInplace) << "kWriteInplace is expected for sparse adam_update";
  CHECK_GT(weight.shape_.Size(), 0);
  CHECK_GT(mean.shape_.Size(), 0);
  CHECK_GT(var.shape_.Size(), 0);

  MSHADOW_REAL_TYPE_SWITCH(weight.type_flag_, DType, {
    MSHADOW_IDX_TYPE_SWITCH(grad.aux_type(kIdx), IType, {
      MXNET_ASSIGN_REQ_SWITCH(req, req_type, {
        const DType* weight_data = weight.dptr<DType>();
        const IType* grad_idx = grad.aux_data(kIdx).dptr<IType>();
        const DType* grad_val = grad.data().dptr<DType>();
        DType* mean_data = mean.dptr<DType>();
        DType* var_data = var.dptr<DType>();
        DType* out_data = out->dptr<DType>();
        nnvm::dim_t num_rows = grad.aux_shape(kIdx)[0];
        const auto row_length = weight.shape_.ProdShape(1, weight.ndim());
        Kernel<AdamDnsRspDnsKernel<req_type>, xpu>::Launch(s, num_rows, row_length,
          out_data, mean_data, var_data, weight_data, grad_idx, grad_val,
          static_cast<DType>(param.clip_gradient), static_cast<DType>(param.beta1),
          static_cast<DType>(param.beta2), static_cast<DType>(param.lr),
          static_cast<DType>(param.wd), static_cast<DType>(param.epsilon),
          static_cast<DType>(param.rescale_grad));
      });
    });
  });
}

template<typename xpu>
inline void AdamUpdateRspRspRspImpl(const AdamParam& param,
                                    const OpContext& ctx,
                                    const NDArray& weight,
                                    const NDArray& grad,
                                    const NDArray& mean,
                                    const NDArray& var,
                                    const OpReqType& req,
                                    NDArray *out) {
  using namespace mshadow;
  using namespace mshadow::expr;
  using namespace mxnet_op;
  using namespace rowsparse;
  CHECK_RSP_ALL_ROWS_NON_ZERO(weight, "AdamUpdate", "weights");
  Stream<xpu>* s = ctx.get_stream<xpu>();
  // fill mean and variance with zero values in order to reuse the sgd mom dns impl
  if (!mean.storage_initialized()) {
    NDArray mean_zeros = mean;
    FillDnsZerosRspImpl(s, &mean_zeros);
  }
  if (!var.storage_initialized()) {
    NDArray var_zeros = var;
    FillDnsZerosRspImpl(s, &var_zeros);
  }
  TBlob out_blob = out->data();
  // reuse dns rsp implementation when storage_shape == shape
  AdamUpdateDnsRspDnsImpl<xpu>(param, ctx, weight.data(), grad, mean.data(),
                               var.data(), req, &out_blob);
}


template<typename xpu>
inline void AdamUpdateEx(const nnvm::NodeAttrs& attrs,
                         const OpContext &ctx,
                         const std::vector<NDArray> &inputs,
                         const std::vector<OpReqType> &req,
                         const std::vector<NDArray> &outputs) {
  const AdamParam& param = nnvm::get<AdamParam>(attrs.parsed);
  mshadow::Stream<xpu>* s = ctx.get_stream<xpu>();
  const auto weight_stype = inputs[0].storage_type();
  const auto grad_stype = inputs[1].storage_type();
  const auto mean_stype = inputs[2].storage_type();
  const auto var_stype = inputs[3].storage_type();

  const auto out_stype = outputs[0].storage_type();
  CHECK_EQ(mean_stype, weight_stype) << "Inconsistent storage type detected between "
           << " mean.stype = " << mean_stype << " and weight.stype = " << weight_stype;
  CHECK_EQ(var_stype, weight_stype) << "Inconsistent storage type detected between "
           << " var.stype = " << var_stype << " and weight.stype = " << weight_stype;
  if (weight_stype == kRowSparseStorage && mean_stype == kRowSparseStorage &&
      var_stype == kRowSparseStorage && grad_stype == kRowSparseStorage &&
      out_stype == kRowSparseStorage) {
    NDArray out = outputs[0];
    AdamUpdateRspRspRspImpl<xpu>(param, ctx, inputs[0], inputs[1], inputs[2],
                                 inputs[3], req[0], &out);
  } else {
    LOG(FATAL) << "Unexpected storage types: weight.stype = " << weight_stype
               << ", var.stype = " << var_stype << ", mean.stype = " << mean_stype
               << ", grad.stype = " << grad_stype;
  }
}

// This RMSProp code follows the version in
// http://arxiv.org/pdf/1308.0850v5.pdf Eq(38) - Eq(45)
// by Alex Graves, 2013.
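Reading the kernel above, the per-row update applied to each row i present in the row_sparse gradient is the usual Adam step, with clipping only when clip_gradient >= 0 and with any bias correction assumed to be folded into lr by the caller (the Python reference implementation in the test changes below does exactly that); rows absent from the gradient are left untouched:

\begin{aligned}
g_i &= \text{rescale\_grad}\cdot \text{grad}_i + \text{wd}\cdot w_i \\
m_i &\leftarrow \beta_1\, m_i + (1-\beta_1)\,\operatorname{clip}(g_i) \\
v_i &\leftarrow \beta_2\, v_i + (1-\beta_2)\,\operatorname{clip}(g_i)^2 \\
w_i &\leftarrow w_i - \frac{lr \cdot m_i}{\sqrt{v_i} + \epsilon}
\end{aligned}

The commit list above also records a later "fix bug in adam update (#167)", so the exact arithmetic of this step was still being refined within the PR.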
1 change: 1 addition & 0 deletions src/operator/optimizer_op.cc
@@ -160,6 +160,7 @@ It updates the weights using::
    return std::vector<uint32_t>{2, 3};
  })
.set_attr<FCompute>("FCompute<cpu>", AdamUpdate<cpu>)
+.set_attr<FComputeEx>("FComputeEx<cpu>", AdamUpdateEx<cpu>)
.add_argument("weight", "NDArray-or-Symbol", "Weight")
.add_argument("grad", "NDArray-or-Symbol", "Gradient")
.add_argument("mean", "NDArray-or-Symbol", "Moving mean")
3 changes: 2 additions & 1 deletion src/operator/optimizer_op.cu
@@ -42,7 +42,8 @@ NNVM_REGISTER_OP(mp_sgd_mom_update)
.set_attr<FCompute>("FCompute<gpu>", MP_SGDMomUpdate<gpu>);

NNVM_REGISTER_OP(adam_update)
-.set_attr<FCompute>("FCompute<gpu>", AdamUpdate<gpu>);
+.set_attr<FCompute>("FCompute<gpu>", AdamUpdate<gpu>)
+.set_attr<FComputeEx>("FComputeEx<gpu>", AdamUpdateEx<gpu>);

NNVM_REGISTER_OP(rmsprop_update)
.set_attr<FCompute>("FCompute<gpu>", RMSPropUpdate<gpu>);
2 changes: 1 addition & 1 deletion src/operator/tensor/init_op.h
@@ -162,7 +162,7 @@ inline void FillDnsZerosRspImpl(mshadow::Stream<xpu> *s, NDArray *dst) {
      auto idx = dst->aux_data(kIdx).FlatTo1D<xpu, IType>(s);
      auto val = dst->data();
      Kernel<set_zero, xpu>::Launch(s, val.Size(), val.dptr<DType>());
-     ASSIGN_DISPATCH(idx, kWriteTo, range<IType>(0, num_rows, 1, 1))
+     ASSIGN_DISPATCH(idx, kWriteTo, range<IType>(0, num_rows, 1, 1));
    });
  });
}
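The one-character change above is in FillDnsZerosRspImpl, which the sparse Adam path calls to zero-initialize an uninitialized row_sparse mean or var before reusing the dense-shaped kernel. Conceptually the result is a row_sparse array with every row index present and every value zero, so storage_shape matches shape; the NumPy sketch below is a reading of the code, not an example from the PR:

import numpy as np

# Conceptual picture of what FillDnsZerosRspImpl leaves behind for a
# row_sparse array of shape (num_rows, row_length): zeroed values for every
# row plus the full index range, so the dense-style Adam kernel can address
# mean/var directly at grad_idx[i] * row_length + j.
num_rows, row_length = 4, 3
values = np.zeros((num_rows, row_length), dtype=np.float32)  # val buffer: all zeros
indices = np.arange(num_rows, dtype=np.int64)                # idx buffer: 0 .. num_rows-1
print(indices, values.shape)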
40 changes: 23 additions & 17 deletions tests/python/unittest/test_optimizer.py
@@ -312,12 +312,13 @@ def test_sparse_sgd():
class PyAdam(mx.optimizer.Optimizer):
    """python reference implementation of adam"""
    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8,
-                 decay_factor=(1 - 1e-8), **kwargs):
+                 decay_factor=(1 - 1e-8), sparse_update=False, **kwargs):
        super(PyAdam, self).__init__(learning_rate=learning_rate, **kwargs)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.decay_factor = decay_factor
+       self.sparse_update = sparse_update

    def create_state(self, index, weight):
        """Create additional optimizer state: mean, variance
@@ -355,21 +356,28 @@ def update(self, index, weight, grad, state):
        mean, variance = state

        wd = self._get_wd(index)
-        grad = grad * self.rescale_grad + wd * weight
-        if self.clip_gradient is not None:
-            mx.nd.clip(grad, -self.clip_gradient, self.clip_gradient, out=grad)

-        mean *= self.beta1
-        mean += grad * (1. - self.beta1)

-        variance *= self.beta2
-        variance += (1 - self.beta2) * mx.nd.square(grad, out=grad)

+        num_rows = weight.shape[0]
        coef1 = 1. - self.beta1**t
        coef2 = 1. - self.beta2**t
        lr *= math.sqrt(coef2)/coef1

-        weight -= lr*mean/(mx.nd.sqrt(variance) + self.epsilon)
+        for row in range(num_rows):
+            # check row slices of all zeros
+            all_zeros = mx.test_utils.almost_equal(grad[row].asnumpy(), np.zeros_like(grad[row].asnumpy()))
+            # skip zeros during sparse update
+            if all_zeros and self.sparse_update:
+                continue
+            grad[row] = grad[row] * self.rescale_grad + wd * weight[row]
+            # clip gradients
+            if self.clip_gradient is not None:
+                mx.nd.clip(grad[row], -self.clip_gradient, self.clip_gradient, out=grad[row])
+            # update mean
+            mean[row] *= self.beta1
+            mean[row] += grad[row] * (1. - self.beta1)
+            # update variance
+            variance[row] *= self.beta2
+            variance[row] += (1 - self.beta2) * mx.nd.square(grad[row], out=grad[row])
+            # update weight
+            weight[row] -= lr*mean[row]/(mx.nd.sqrt(variance[row]) + self.epsilon)


def test_adam():
@@ -386,10 +394,8 @@ def test_adam():
              {'rescale_grad': 0.8, 'wd': 0.05}]
    for kwarg in kwargs:
        compare_optimizer(opt1(**kwarg), opt2(**kwarg), shape, np.float32)
-        # test operator fallback on cpu
-        if (default_context() == mx.cpu()):
-            compare_optimizer(opt1(**kwarg), opt2(**kwarg), shape,
-                              np.float32, g_stype='row_sparse')
+        compare_optimizer(opt1(sparse_update=True, **kwarg), opt2(**kwarg), shape,
+                          np.float32, w_stype='row_sparse', g_stype='row_sparse')

# RMSProp
class PyRMSProp(mx.optimizer.Optimizer):
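compare_optimizer itself is not part of this diff; the sketch below only illustrates the kind of row_sparse weight and gradient the new w_stype='row_sparse' and g_stype='row_sparse' arguments in test_adam presumably ask it to build. The helper name is hypothetical, and NDArray.tostype('row_sparse') plus the .indices accessor are assumed to exist; a gradient row of all zeros is exactly the case PyAdam now skips when sparse_update=True.

import numpy as np
import mxnet as mx

def make_row_sparse_pair(shape, zero_rows, dtype=np.float32):
    """Hypothetical helper: a dense-valued row_sparse weight plus a gradient
    whose listed rows are entirely zero (and therefore not stored)."""
    weight = mx.nd.array(np.random.rand(*shape).astype(dtype)).tostype('row_sparse')
    grad_np = np.random.rand(*shape).astype(dtype)
    grad_np[zero_rows, :] = 0          # rows the sparse update should leave untouched
    grad = mx.nd.array(grad_np).tostype('row_sparse')
    return weight, grad

w, g = make_row_sparse_pair((6, 3), zero_rows=[1, 4])
print(g.indices.asnumpy())             # stored row ids, e.g. [0 2 3 5]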