This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

slice operator supporting arbitrary values of step #8558

Merged (2 commits) on Nov 8, 2017

Conversation

reminisce
Contributor

Description

  • Re-implemented slice operator using Kernel::Launch
  • Added support for arbitrary values of step along with begin and end, i.e. slice(data, begin, end, step), where step is an optional parameter.
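For illustration, here is a minimal NumPy sketch of the semantics `slice(data, begin, end, step)` provides (the `slice_ref` helper and the NumPy framing are mine, not part of the PR; per axis the behavior matches Python's `begin:end:step` slicing):

```python
import numpy as np

def slice_ref(data, begin, end, step):
    # Reference semantics: apply begin:end:step independently along each axis.
    return data[tuple(slice(b, e, s) for b, e, s in zip(begin, end, step))]

data = np.arange(16).reshape(4, 4)
# Rows with step 2; columns reversed from index 3 down to 0.
out = slice_ref(data, begin=(0, 3), end=(4, None), step=(2, -1))
# Same result as data[0:4:2, 3::-1]
```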

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • For user-facing API changes, API doc string has been updated.
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • The kernel implementation is based on the parallelization approach of the slice operator in mshadow, with additional support for arbitrary values of step.
  • This operator will be used in Continued Work on Advanced Indexing #8246 to support slicing an NDArray with step != 1. Currently, that functionality is realized via the gather_nd op, which carries the heavy overhead of constructing an index NDArray from the slices.
  • Mini-benchmark comparing slice_v1 (mshadow version) and slice (Kernel::Launch version):
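To illustrate the overhead mentioned above, a hedged NumPy sketch of the gather-style approach (illustrative only, not the actual #8246 code): with step != 1 the index arrays must be materialized before gathering, whereas a direct strided slice needs no index construction at all.

```python
import numpy as np

data = np.arange(16).reshape(4, 4)

# gather_nd-style: build explicit index arrays from the slice spec first.
rows = np.arange(0, 4, 2)               # step != 1 along axis 0
gathered = data[np.ix_(rows, np.arange(4))]

# Direct strided slice: no index array is ever created.
sliced = data[0:4:2, :]
```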

Hardware
p2.xlarge (4 omp threads)

Commit
ba9de66

GPU build
mx.nd.slice_v1: 10000 repeats costs 0.561281 seconds
mx.nd.slice: 10000 repeats costs 0.562561 seconds

CPU-only build
mx.nd.slice_v1: 10000 repeats costs 1.049587 seconds
mx.nd.slice: 10000 repeats costs 1.130866 seconds

Benchmark script

 import mxnet as mx
 import numpy as np
 import time
 from mxnet.test_utils import same
 
 #ctx = mx.gpu(0)
 ctx = mx.cpu(0)
 
 repeat = 10000
 shape = (16, 16, 16, 16)
 a = mx.nd.arange(np.prod(shape), ctx=ctx).reshape(shape=shape)
 begin = (None, 1, 2)
 end = (shape[0], shape[1], shape[2])
 
 # warm up
 for i in range(100):
     c = mx.nd.slice_v1(a, begin=begin, end=end)
     b = mx.nd.slice(a, begin=begin, end=end)
 
 mx.nd.waitall()
 start = time.time()
 for i in range(repeat):
     c = mx.nd.slice_v1(a, begin=begin, end=end)
 mx.nd.waitall()
 elapsed = time.time() - start
 print("mx.nd.slice_v1: %d repeats costs %f seconds" % (repeat, elapsed))
 
 start = time.time()
 for i in range(repeat):
     b = mx.nd.slice(a, begin=begin, end=end)
 mx.nd.waitall()
 elapsed = time.time() - start
 print("mx.nd.slice: %d repeats costs %f seconds" % (repeat, elapsed))
 
 assert same(c.asnumpy(), b.asnumpy())

@piiswrong @eric-haibin-lin @anirudh2290 @rahul003 @cjolivier01

} else if (s < 0) {
e = -1;
}
CHECK_LE(e, len) << "slicing with end[" << i << "]="
Member

In these CHECK_XXX macros, is a function call made for every <<, even if the check succeeds? If so, does this affect performance?

Contributor Author

It's not called if the check succeeds. See the code:
https://github.com/dmlc/dmlc-core/blob/595d02c0e87be8a0846700462b6f45f1b1031e39/include/dmlc/logging.h#L89
All the << calls happen only when _check_err holds a non-empty string.
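In Python terms, the pattern described above looks roughly like this (a sketch of the idea, not the actual dmlc-core code): the comparison itself is cheap, and the message operands are evaluated only on the failure path.

```python
def check_le(lhs, rhs, make_message):
    """Sketch of a lazy CHECK_LE: message construction is deferred."""
    if lhs <= rhs:          # fast path: make_message is never called
        return
    raise AssertionError(make_message())

# The (potentially expensive) formatting runs only when the check fails.
check_le(3, 10, lambda: "slicing with end=%d exceeds len=%d" % (3, 10))
```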

stride *= dshape[k];
}
KERNEL_ASSIGN(igrad[irow*data_last_dim_size+j*step_last_dim+begin_last_dim],
req, ograd[ograd_offset++]);
Member

nit: spaces around operators

@cjolivier01
Member

For CPU build:
mx.nd.slice_v1: 10000 repeats costs 1.049587 seconds
mx.nd.slice: 10000 repeats costs 1.130866 seconds

Any chance you can break this down into forward and backward times? Is it known why this is slower than mshadow version?

@reminisce
Contributor Author

reminisce commented Nov 6, 2017

@cjolivier01 The time is for forward only, no backward (the backward kernel is essentially the same as the forward one). It includes shape/dtype inference and the forward function calls. A few factors may contribute to the extra runtime in the new implementation:

  1. The infer-shape function is more complicated than the previous version, since it must support step, while the previous version assumed step=1.
  2. The kernel needs to handle step as well.
  3. The kernel needs to copy the shapes of data, out, and step, in addition to the shapes of begin and end copied in the mshadow version.

I can investigate which part accounts for the extra ~10% runtime.

NNVM_REGISTER_OP(slice)
.add_alias("_sparse_slice")
.add_alias("crop")
NNVM_REGISTER_OP(slice_v1)
Contributor

don't need slice_v1

Contributor Author

Removed.

@reminisce
Contributor Author

@cjolivier01 I tried replacing the infer-shape function and forward kernel with the same ones as in mshadow, but the gap is still about 10%. That suggests the difference lies between the mshadow and Kernel::Launch execution paths themselves. Any insights?

SHAPE_ASSIGN_CHECK(*out_attrs, 0, GetSliceShape(param, dshape));
return true;
}

inline bool SliceForwardInferStorageType(const nnvm::NodeAttrs& attrs,
Member

Does the slice implementation for CSR support arbitrary step sizes? Do we want to update the infer-storage function to fall back appropriately?

Contributor Author

Added fallback check now. Please review again.
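A sketch of the fallback rule in Python (illustrative only; the real check lives in the C++ infer-storage function): a CSR input keeps CSR storage only for trivial steps, and anything else dispatches to the dense implementation.

```python
def slice_keeps_csr(step):
    # Hypothetical predicate: CSR slicing is assumed to support only
    # step == 1 (or unspecified) on every axis; any other step falls
    # back to the default (dense) storage.
    return all(s is None or s == 1 for s in step)

# step=(None, 1) keeps CSR; step=(2,) must fall back to dense.
```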

Implement slice op backward

Change parallelization approach for slicing

Add unit test

Fix lint and typo

Fix doc

Fix doc

Remove slice_v1

Address cr

Remove slice_v1 in .cu

Change step data type

Add fallback for slicing csr with non-trivial step
@piiswrong piiswrong merged commit 70b68b1 into apache:master Nov 8, 2017
cjolivier01 pushed a commit to cjolivier01/mxnet that referenced this pull request Nov 9, 2017
* Implement slice op forward supporting arbitrary step value

* Add error handling for take op infer shape
eric-haibin-lin pushed a commit to eric-haibin-lin/mxnet that referenced this pull request Dec 3, 2017
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018