-
Notifications
You must be signed in to change notification settings - Fork 6.8k
slice operator supporting arbitrary values of step #8558
Conversation
} else if (s < 0) { | ||
e = -1; | ||
} | ||
CHECK_LE(e, len) << "slicing with end[" << i << "]=" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In thsse CHECK_XXX macros, is function call made for every <<, even if the check succeeds? If so, does this affect performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not called if check is successful. See the code:
https://github.com/dmlc/dmlc-core/blob/595d02c0e87be8a0846700462b6f45f1b1031e39/include/dmlc/logging.h#L89
All the << calls happen only when _check_err
has a non-empty string stored.
src/operator/tensor/matrix_op-inl.h
Outdated
stride *= dshape[k]; | ||
} | ||
KERNEL_ASSIGN(igrad[irow*data_last_dim_size+j*step_last_dim+begin_last_dim], | ||
req, ograd[ograd_offset++]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spaces around operators
For CPU build: Any chance you can break this down into forward and backward times? Is it known why this is slower than mshadow version? |
@cjolivier01 The time is for forward only, no backward (the backward kernel is essentially the same as the forward one). It includes shape/dtype inferences and forward function calls. There might be a few reasons that lead to more runtime in the new implementation:
I can investigate to see which part takes this 10% more time. |
src/operator/tensor/matrix_op.cc
Outdated
NNVM_REGISTER_OP(slice) | ||
.add_alias("_sparse_slice") | ||
.add_alias("crop") | ||
NNVM_REGISTER_OP(slice_v1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need slice_v1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
@cjolivier01 I tried replacing infer shape and forward kernel with the same ones as in mshadow, but there is still about 10% difference. That means there is some difference between the runs of mshadow and Kernel::Launch. Any insights? |
SHAPE_ASSIGN_CHECK(*out_attrs, 0, GetSliceShape(param, dshape)); | ||
return true; | ||
} | ||
|
||
inline bool SliceForwardInferStorageType(const nnvm::NodeAttrs& attrs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the slice implementation for CSR support arbitrary step size? Do we want to update the infer storage fucntion to fallback appropriately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added fallback check now. Please review again.
Implement slice op backward Change parallelization approach for slicing Add unit test Fix lint and typo Fix doc Fix doc Remove slice_v1 Address cr Remove slice_v1 in .cu Change step data type Add fallback for slicing csr with non-trivial step
* Implement slice op forward supporting abitrary step value Implement slice op backward Change parallelization approach for slicing Add unit test Fix lint and typo Fix doc Fix doc Remove slice_v1 Address cr Remove slice_v1 in .cu Change step data type Add fallback for slicing csr with non-trivial step * Add error handling for take op infer shape
* Implement slice op forward supporting abitrary step value Implement slice op backward Change parallelization approach for slicing Add unit test Fix lint and typo Fix doc Fix doc Remove slice_v1 Address cr Remove slice_v1 in .cu Change step data type Add fallback for slicing csr with non-trivial step * Add error handling for take op infer shape
* Implement slice op forward supporting abitrary step value Implement slice op backward Change parallelization approach for slicing Add unit test Fix lint and typo Fix doc Fix doc Remove slice_v1 Address cr Remove slice_v1 in .cu Change step data type Add fallback for slicing csr with non-trivial step * Add error handling for take op infer shape
Description
step
along withbegin
andend
, i.e.slice(data, begin, end, step)
, wherestep
is an optional parameter.Checklist
Essentials
make lint
)Comments
step
.NDArray
withstep!=1
. Currently, it usesgather_nd
op to realize that functionality, which has heavy overhead of making indexNDArray
from slices.slice_v1
(mshadow version) andslice
(Kernel::Launch version).Hardware
p2.xlarge (4 omp threads)
Commit
ba9de66
GPU build
mx.nd.slice_v1: 10000 repeats costs 0.561281 seconds
mx.nd.slice: 10000 repeats costs 0.562561 seconds
CPU-only build
mx.nd.slice_v1: 10000 repeats costs 1.049587 seconds
mx.nd.slice: 10000 repeats costs 1.130866 seconds
Benchmark script
@piiswrong @eric-haibin-lin @anirudh2290 @rahul003 @cjolivier01