Conversation
[TENSOR] Add FlatTo1D for all elementwise ops (#3238)
Is update on kvstore always a good thing?
I didn't see any disadvantage yet; updating on device should also be an option.
For a single machine, updating on device might be a bit better, so we might still want to keep the original device aggregation option.
Is the PCI-E topology being exploited in the reduction in the current PR?
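The question above is about hierarchical reduction: summing gradients first among GPUs that share a PCI-e switch, and only then across switch groups, reduces traffic over the slower cross-group links. A minimal, hypothetical sketch of such two-level aggregation (the grouping and the function name are illustrative, not MXNet's actual implementation):

```python
# Hypothetical PCI-e-aware two-level gradient aggregation.
# Stage 1 sums within each PCI-e group over fast local links;
# stage 2 combines the per-group partial sums across groups.

def hierarchical_sum(grads, groups):
    """grads: per-GPU gradient values (scalars here for clarity).
    groups: lists of GPU indices that share a PCI-e switch."""
    # Stage 1: reduce within each PCI-e group.
    partials = [sum(grads[i] for i in group) for group in groups]
    # Stage 2: reduce the partial sums across groups.
    return sum(partials)

# 4 GPUs arranged as two PCI-e groups of two:
total = hierarchical_sum([1.0, 2.0, 3.0, 4.0], [[0, 1], [2, 3]])
```

The result is the same as a flat sum; only the communication pattern differs, which is why topology awareness is purely a performance question.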
@mli Please see if you can fix the issue of bringing device aggregation back, and we'll merge this in.
Was the kvstore=local slowdown problem reported by @tornadomeet fixed?
Rolled back to using the previous strategy to set update_on_kvstore.
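The rollback concerns where the optimizer update runs. An illustrative decision rule in the spirit of this thread (a sketch, not MXNet's exact code): a single device skips kvstore aggregation entirely, a 'device' kvstore keeps aggregation and updates on the GPUs, and a 'local' kvstore aggregates on CPU and lets the kvstore apply updates.

```python
# Hypothetical sketch of the update_on_kvstore decision discussed in
# this PR; the function name and exact rule are illustrative.

def choose_update_on_kvstore(kvstore_type, num_devices):
    if num_devices == 1:
        # A single device gains nothing from kvstore aggregation.
        return False
    if 'device' in kvstore_type:
        # 'device': aggregate gradients and run updates on the GPUs.
        return False
    # 'local': aggregate on CPU and let the kvstore apply updates.
    return True
```

Keeping both paths is what lets users trade CPU-side aggregation ('local') against GPU-side aggregation ('device') per machine.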
Please merge it in if it is ready.
I will test kvstore=local today.
@mli @piiswrong I just tried the newest update again. With kvstore='local', this PR is slower than before the fix (so kvstore='local' may behave differently than it did before), but with kvstore='device', this PR is faster.
@tornadomeet How many K80s are you using? I tested the performance on both M40 and K80.
@tornadomeet Can you double check? This PR should give the same performance as the current master for kvstore=local; I rechecked on ResNet-50. The major difference compared to 73a0f6e is due to #3238, and the current master should be faster, at least on my machine.
@mli OK, I'll test the current master this afternoon.
@mli Hello. 4 GPUs, ResNet-50:
this PR:
the branch at or near 73a0f6e (I'm not sure of the exact commit; I installed it on 2016-08-13):
The speed gap comes from somewhere else, not from this PR, so this PR is OK.
Can you check it by …
The log of 73a0f6e:
It is still slow. The log of 2016.08.10 is:
A little better, but not as good as my installed version.
@mli @tqchen @piiswrong I think we can merge this PR now.
@tornadomeet You may check against commit 2196588 (20160726). On Windows, with 4 K40 GPUs and CUDA 7.0 + cuDNN 4.0, I am able to achieve 87 samples/s for ResNet-50.
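Throughput figures like the 87 samples/s quoted in this thread are just images processed divided by wall-clock time; a small helper (illustrative, not part of MXNet) makes such comparisons explicit and reproducible:

```python
def samples_per_sec(batch_size, num_batches, elapsed_sec):
    """Throughput as total samples processed per wall-clock second."""
    return batch_size * num_batches / elapsed_sec

# e.g. 100 batches of 128 images in 147.1 s is roughly 87 samples/s
# (batch size and timing here are made up for illustration):
rate = samples_per_sec(128, 100, 147.1)
```

Comparing runs at the same batch size on the same hardware, as done in this thread, is what makes the numbers meaningful.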
Thanks, I will check it after the holiday.
* Add channel_ to Shape2D calculation
* scalapkg, add example multitask (#3186)
* RNN cell demo with ptb LSTM language model (#3197): rnn-cell demo (push to server for testing); a running example with cuDNN RNN cell
* Bulk lint fix (#3211)
* [TENSOR] Add FlatTo1D for all elementwise ops (#3238)
* Fix little bug on context (#3202)
* add PennTreeBank Language Model using LSTM model in R (#2659)
* Add function 'print_summary' and some revisions (#3161): adds 'print_summary' for printing detailed information about a network, and a format argument was added to 'plot_network'. Usage: `net = get_symbol(1000); shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)}; mx.viz.print_summary(net, shape=shape)`. Without a shape, the reported argument counts are currently meaningless.
* Update visualization.py (a series of incremental updates)
* Added my CMakeLists.txt for caffe plugin, etc.
* Revert "fix travis scala test config" (#3246): reverts parts of commit 3e15f62 and re-enables testing the Julia bindings
* [Scala] Code generation for Symbol (#3217): auto-generate Symbol functions
* fix spelling errors (#3258): also align grammar and punctuation in short descriptions of features
* fix typo in run_test.sh (#3260)
* Copy slice along arbitrary axis (#3259): add copy-slice along an arbitrary axis for NDArray; copy_slice_to as an ndarray operator; Python interface to the _copy_slice_to operator; fix lint error
* Enable concatenation for dim-1 vectors (#3264)
* fix PReLU backward computing (#3277)
* Add `reverse` option in Reshape (#3280)
* add scala example, end2end neural-style (#3267)
* Improve multi-GPU performance (#3241): update kvstore and model.py; bandwidth tool; update readme; fix lint; fix batch size of dist_device_sync; fix perf problem of kvstore when only using a single device; roll back to the previous strategy for choosing update_on_kvstore; add an optional MXNET_ENABLE_GPU_P2P to control whether or not to use P2P
* update dmlc-core (#3293)
* Fix newer versions of gtest and cpptest (#3294)
* when use_global_stats is set, do not use cudnn (#3289); fix batch norm with use_global_stats
* Fix req + reserve_space in cudnn_rnn (#3274): fix req; fix reserve_space; allocate reserve_space using Storage
* add cudnn off option in Convolution (#3270)
* add support for building on POWER (#3302)
* add recent examples, collect some missing tutorials (#3340)
* CMake for caffe plugin
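The commit log above mentions an optional MXNET_ENABLE_GPU_P2P switch for controlling peer-to-peer GPU copies. A runtime typically reads such a flag as below; the parsing helper and the assumed default of "enabled" are illustrative, not MXNet's actual code:

```python
import os

def gpu_p2p_enabled(environ=os.environ):
    """Hypothetical reader for the MXNET_ENABLE_GPU_P2P flag.
    Treats unset or any value other than "0" as enabled (assumed default)."""
    return environ.get('MXNET_ENABLE_GPU_P2P', '1') != '0'

# To disable peer-to-peer copies, e.g. on machines where P2P is slow:
#   export MXNET_ENABLE_GPU_P2P=0
```

Exposing this as an environment variable lets users turn P2P off without rebuilding, which matters because P2P performance depends heavily on the PCI-e topology discussed earlier in the thread.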
* NNVM Refactor (#3194): init nnvm change; move TShape to NNVM; redirect Symbolic API to NNVM; add Op Prop Adapter; finish migrating shape inference; pass all symbolic tests; enable aux data; [EXEC] basic version of exec for forward only; [EXEC] enable most optimizations (grad and context still pending); fix legacy ops against the latest interface; update NNVM NodeRef; adapt to the newer interface; complete registry of backward ops; finish backward pass; [EXEC] pass all operator unit tests; [EXEC] enable model parallelism; fully pass all legacy tests; remove legacy symbolic code; update news; make Travis compile; fix Python 3; update viz module to the new JSON format
* [NNVM] Imperative Invoke (#3208): [Engine] deduplicate Variable util; improve imperative speed
* [scala] link libnnvm.a (#3214)
* [PYTHON] Optional Cython Module for Symbols (#3242): check in Cython enhancement; fix lint; [DOC] move common doc to base
* [EXEC] Support fcompute (#3249)
* [OP] Add alias support (#3261)
* Fix path in setup.py (#3276): revert the nnvm version
* Element-wise op refactor (#3245): [OPERATOR] refactor unary ops; refactor binary scalar ops; use alias
* update nnvm version (#3290)
* Fix breaking changes after pulling master (#3291)
* [CYTHON] Cython module for NDArray (#3292): more strict tests
* [NNVM] change attr to set_attr (#3303)
* Update run_test.sh
* add nnvm cmake for Windows (#3255)
* Binary broadcast ops (#3301): [OPERATOR] binary broadcast ops; fix max and min; update submodule before removing reduce axis; broadcast/reduce ops; fix lint and warnings
* x (#3308)
* [IO] Python based ImageIter and Augmenter (#3227)
* [OPT] NNVM Optimizer (#3314)
* fix cpython on Windows (#3309)
* Add mathematical functions (#3317): fix image io
* add hypot, degrees, radians, cosh, sinh, tanh, arcsinh, arccosh, arctanh (#3335)
* add recent examples, collect some missing tutorials (#3340)
* Improving docs & utilities for the distributed training example (#3341): add init dict
* disable SSE for ARM hardware, e.g. Raspberry Pi (#3346)
* Add channel_ to Shape2D calculation (#3181)
* (merge of #3186 through #3340, already listed in the commit log above)
* Fix metric & im2rec.py
* [Scala] Nnvm ops for NDArray & Symbol (#3361): nnvm op support; remove unused code; fix Scala native code style
* [R] Fix the R interface (#3334): remove man; fix BN legacy issue
* Locate compiled library on Windows (#3369)
* Fix metric & im2rec.py (#3375): image io fix
* Update legacy op FBackwardInGradIndex (#3376): fix test
* Fix for LRN Layer (#3366): fixed CPU forward bug; added out_data[lrn_enum::kOut] as backward req; removed duplicate out_data[lrn_enum::kTmpNorm]; removed inplace option; add backward index
* include some special functions (#3337): gamma, gammaln, log1p, expm1
* fix kv build (#3385)
* initial profiler branch based on dmlc/mxnet:nnvm: [profiler] add profiler & modify engine API; add USE_PROFILER compile flag & adapt to the changed engine API; add C API interface & modify graph_executor; add Python API; fix typos & lint errors; reduce overhead & add PROFILER_MESSAGE_FUNCNAME macro; remove profiling argument from PushSync/PushAsync; refactor profiler.h/.cc; improve readability; fix ndarray op name & add WaitForVar back; add example/profiler/profiler_ndarray.py; fix memory leak by using op->name
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (#3217) [scala] auto-generate Symbol functions * fix spelling errors (#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (#3260) * Copy slice along arbitrary axis (#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (#3264) * fix PReLU backward computing (#3277) * Add `reverse` option in Reshape (#3280) * add scala example, end2end neural-style (#3267) add scala example, end2end neural-style * Improve multi-GPU performance (#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (#3293) * Fix newer version of gtest and cpptest (#3294) * when set use_global_stats then do not use cudnn (#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (#3270) * add support for building on power (#3302) * add recent examples, collect some missing tutorials (#3340) * CMake for caffe plugin * Fix metric & im2rec.py * [Scala] Nnvm ops for NDArray & Symbol (#3361) * [scala] nnvm op support * [scala] remove unused codes * fix scala native code style * [R] Fix the R interface (#3334) * [R] Fix the R interface. 
remove man * Fix BN legacy issue * Locate compiled library on Windows (#3369) * Fix metric & im2rec.py (#3375) image io fix * Update legacy op FBackwardInGradIndex (#3376) * Update legacy op FBackwardInGradIndex * fix test * Fix for LRN Layer (#3366) * fixed cpu forward bug * added out_data[lrn_enum::kOut] as backward req. * removed lint * removed duplicate out_data[lrn_enum::kTmpNorm], * removed inplace option * add backward index * include some special functions (#3337) - gamma - gammaln - log1p - expm1 * fix kv build (#3385) * initial profiler branch based on dmlc/mxnet:nnvm * [profiler] add profiler & modify engine API * [profiler] add USE_PROFILER compile flag & modify code for changed engine api * [profiler] add c_api interface & modify graph_executor * [profiler] add python api * [profiler] typo & lint error * [profiler] reduce overhead & add PROFIELR_MESSAGE_FUNCNAME macro * [profiler] remove profiling argument from PushSync/PushAsync * [profiler] refactor profiler.h/.cc * [profiler] improve readability * [profiler] typo && add TODO comment * [profiler] fix ndarray op name & add WaitForVar back * [profiler] add example/profiler/profiler_ndarray.py * [profiler] fix memleak by using op->name * [profiler] fix lint * [profiler] fix lint
* NNVM Refactor (apache#3194) * Init nnvm change * temp checkin * Move TShape to NNVM * Redirect Symbolic API to NNVM * Add Op Prop Adapter * Finish migrate in shape infer * Pass all symbolic test * temp commit * enable aux data * [EXEC] Basic version of exec for forward only * [EXEC] Enable most optimizations, still wait grad and context * fix legacy op with latest one * Update NNVM NodeRef * Adapt to newer interface * ALl registry of backop is complete * temp commit * Hack finish backward pass * [EXEC] One day pass * [EXEC] Pass all operator unittest * [EXEC] enable model parallel * Fully pass all legacy tests * Remove legacy symbolic code * update news * Make travis compile * Fix python3 * Update viz module to new json format * [NNVM] Imperative Invoke (apache#3208) * [Engine] Deduplicate Variable Util * [NNVM] NNVM Imperative Invoke * [NNVM] Imperative improve speed * fix * fix * [scala] link libnnvm.a (apache#3214) * [PYTHON] Optional Cython Module for Symbols (apache#3242) * [CYTHON] Checkin cython enhancement * fix lint * [DOC] Move common doc to base * [EXEC] Support fcompute (apache#3249) * [EXEC] Support fcompute * Fix lint * fix lint * [OP] Add alias support (apache#3261) * Fix path in setup.py (apache#3276) * Fix path in setup.py * revert the nnvm version * [WIP] Element wise op refactor (apache#3245) * [OPERATOR] Refactor Unary Ops * [OPERATOR] Refactor Binary Scalar Ops * Use alias * update nnvm version (apache#3290) * Fix breaking changes after pull master (apache#3291) * [CYTHON] Cython module for NDArray (apache#3292) * [NDARRAY] Cython module for ndarray * More strict tests * [NNVM] change of attr to set_attr (apache#3303) * Update run_test.sh * add nnvm cmake with windows (apache#3255) * [WIP] binary broadcast wip (apache#3301) * [WIP] binary broadcast wip [OPERATOR] Binary Broadcast ops fix lint lint fix max and min update submodule before removing reduce axis broad cast reduce ops * update * fix * fix warning * fix * x (apache#3308) * [IO] 
Python based ImageIter and Augumenter (apache#3227) * [IO] Python based ImageIter and Augumenter * fix * fix * fix * [OPT] NNVM Optimizer (apache#3314) * fix cpython in windows (apache#3309) * Add Mathematical functions (apache#3317) * fix image io * add hypot degrees radians cosh sinh tanh arcsinh arccosh arctanh (apache#3335) * add recent examples, collect some missing tutorials (apache#3340) * Improving docs & utilities for distributed training example. (apache#3341) * add init dict * disable SSE for arm hardware e.g. Raspberry Pi (apache#3346) * Add channel_ to Shape2D calculation (apache#3181) * Add channel_ to Shape2D calculation * scalapkg, add example multitask (apache#3186) * RNN cell demo with ptb LSTM language model (apache#3197) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * Bulk lint fix (apache#3211) * [TENSOR] Add FlatTo1D for all elementwise ops (apache#3238) * Fix little bug on context (apache#3202) * add PennTreeBank Language Model using lstm model in R (apache#2659) * Add function 'print_summary' and some revise (apache#3161) * Add function 'print_summary' and some revise Add function 'print_summary' for print detail information of network, and format argument was add in 'plot_network'. You can use 'print_summary' like: """ net = get_symbol(1000) shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)} mx.viz.print_summary(net, shape=shape) """ If without shape, the number of arguments would be nonsense currently. * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Added my CmakeLists.txt for caffe plugin, etc. * Revert "fix travis scala test config" (apache#3246) This reverts parts of commit 3e15f62. 
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (apache#3217) [scala] auto-generate Symbol functions * fix spelling errors (apache#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (apache#3260) * Copy slice along arbitrary axis (apache#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (apache#3264) * fix PReLU backward computing (apache#3277) * Add `reverse` option in Reshape (apache#3280) * add scala example, end2end neural-style (apache#3267) add scala example, end2end neural-style * Improve multi-GPU performance (apache#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (apache#3293) * Fix newer version of gtest and cpptest (apache#3294) * when set use_global_stats then do not use cudnn (apache#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (apache#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (apache#3270) * add support for building on power (apache#3302) * add recent examples, collect some missing tutorials (apache#3340) * CMake for caffe plugin * Fix metric & im2rec.py * [Scala] Nnvm ops for NDArray & Symbol (apache#3361) * [scala] nnvm op support * [scala] remove unused codes * fix scala native code style * [R] Fix the R interface (apache#3334) * [R] Fix the R interface. 
remove man * Fix BN legacy issue * Locate compiled library on Windows (apache#3369) * Fix metric & im2rec.py (apache#3375) image io fix * Update legacy op FBackwardInGradIndex (apache#3376) * Update legacy op FBackwardInGradIndex * fix test * Fix for LRN Layer (apache#3366) * fixed cpu forward bug * added out_data[lrn_enum::kOut] as backward req. * removed lint * removed duplicate out_data[lrn_enum::kTmpNorm], * removed inplace option * add backward index * include some special functions (apache#3337) - gamma - gammaln - log1p - expm1 * fix kv build (apache#3385) * initial profiler branch based on dmlc/mxnet:nnvm * [profiler] add profiler & modify engine API * [profiler] add USE_PROFILER compile flag & modify code for changed engine api * [profiler] add c_api interface & modify graph_executor * [profiler] add python api * [profiler] typo & lint error * [profiler] reduce overhead & add PROFIELR_MESSAGE_FUNCNAME macro * [profiler] remove profiling argument from PushSync/PushAsync * [profiler] refactor profiler.h/.cc * [profiler] improve readability * [profiler] typo && add TODO comment * [profiler] fix ndarray op name & add WaitForVar back * [profiler] add example/profiler/profiler_ndarray.py * [profiler] fix memleak by using op->name * [profiler] fix lint * [profiler] fix lint
* NNVM Refactor (#3194) * Init nnvm change * temp checkin * Move TShape to NNVM * Redirect Symbolic API to NNVM * Add Op Prop Adapter * Finish migrate in shape infer * Pass all symbolic test * temp commit * enable aux data * [EXEC] Basic version of exec for forward only * [EXEC] Enable most optimizations, still wait grad and context * fix legacy op with latest one * Update NNVM NodeRef * Adapt to newer interface * ALl registry of backop is complete * temp commit * Hack finish backward pass * [EXEC] One day pass * [EXEC] Pass all operator unittest * [EXEC] enable model parallel * Fully pass all legacy tests * Remove legacy symbolic code * update news * Make travis compile * Fix python3 * Update viz module to new json format * [NNVM] Imperative Invoke (#3208) * [Engine] Deduplicate Variable Util * [NNVM] NNVM Imperative Invoke * [NNVM] Imperative improve speed * fix * fix * [scala] link libnnvm.a (#3214) * [PYTHON] Optional Cython Module for Symbols (#3242) * [CYTHON] Checkin cython enhancement * fix lint * [DOC] Move common doc to base * [EXEC] Support fcompute (#3249) * [EXEC] Support fcompute * Fix lint * fix lint * [OP] Add alias support (#3261) * Fix path in setup.py (#3276) * Fix path in setup.py * revert the nnvm version * [WIP] Element wise op refactor (#3245) * [OPERATOR] Refactor Unary Ops * [OPERATOR] Refactor Binary Scalar Ops * Use alias * update nnvm version (#3290) * Fix breaking changes after pull master (#3291) * [CYTHON] Cython module for NDArray (#3292) * [NDARRAY] Cython module for ndarray * More strict tests * [NNVM] change of attr to set_attr (#3303) * Update run_test.sh * add nnvm cmake with windows (#3255) * [WIP] binary broadcast wip (#3301) * [WIP] binary broadcast wip [OPERATOR] Binary Broadcast ops fix lint lint fix max and min update submodule before removing reduce axis broad cast reduce ops * update * fix * fix warning * fix * x (#3308) * [IO] Python based ImageIter and Augumenter (#3227) * [IO] Python based ImageIter and Augumenter * 
fix * fix * fix * [OPT] NNVM Optimizer (#3314) * fix cpython in windows (#3309) * Add Mathematical functions (#3317) * fix image io * add hypot degrees radians cosh sinh tanh arcsinh arccosh arctanh (#3335) * add recent examples, collect some missing tutorials (#3340) * Improving docs & utilities for distributed training example. (#3341) * add init dict * disable SSE for arm hardware e.g. Raspberry Pi (#3346) * Add channel_ to Shape2D calculation (#3181) * Add channel_ to Shape2D calculation * scalapkg, add example multitask (#3186) * RNN cell demo with ptb LSTM language model (#3197) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * Bulk lint fix (#3211) * [TENSOR] Add FlatTo1D for all elementwise ops (#3238) * Fix little bug on context (#3202) * add PennTreeBank Language Model using lstm model in R (#2659) * Add function 'print_summary' and some revise (#3161) * Add function 'print_summary' and some revise Add function 'print_summary' for print detail information of network, and format argument was add in 'plot_network'. You can use 'print_summary' like: """ net = get_symbol(1000) shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)} mx.viz.print_summary(net, shape=shape) """ If without shape, the number of arguments would be nonsense currently. * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Added my CmakeLists.txt for caffe plugin, etc. * Revert "fix travis scala test config" (#3246) This reverts parts of commit 3e15f62. 
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (#3217) [scala] auto-generate Symbol functions * fix spelling errors (#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (#3260) * Copy slice along arbitrary axis (#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (#3264) * fix PReLU backward computing (#3277) * Add `reverse` option in Reshape (#3280) * add scala example, end2end neural-style (#3267) add scala example, end2end neural-style * Improve multi-GPU performance (#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (#3293) * Fix newer version of gtest and cpptest (#3294) * when set use_global_stats then do not use cudnn (#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (#3270) * add support for building on power (#3302) * add recent examples, collect some missing tutorials (#3340) * CMake for caffe plugin * Fix metric & im2rec.py * [Scala] Nnvm ops for NDArray & Symbol (#3361) * [scala] nnvm op support * [scala] remove unused codes * fix scala native code style * [R] Fix the R interface (#3334) * [R] Fix the R interface. 
remove man * Fix BN legacy issue * Locate compiled library on Windows (#3369) * Fix metric & im2rec.py (#3375) image io fix * Update legacy op FBackwardInGradIndex (#3376) * Update legacy op FBackwardInGradIndex * fix test * Fix for LRN Layer (#3366) * fixed cpu forward bug * added out_data[lrn_enum::kOut] as backward req. * removed lint * removed duplicate out_data[lrn_enum::kTmpNorm], * removed inplace option * add backward index * include some special functions (#3337) - gamma - gammaln - log1p - expm1 * fix kv build (#3385) * initial profiler branch based on dmlc/mxnet:nnvm * [profiler] add profiler & modify engine API * [profiler] add USE_PROFILER compile flag & modify code for changed engine api * [profiler] add c_api interface & modify graph_executor * [profiler] add python api * [profiler] typo & lint error * [profiler] reduce overhead & add PROFIELR_MESSAGE_FUNCNAME macro * [profiler] remove profiling argument from PushSync/PushAsync * [profiler] refactor profiler.h/.cc * [profiler] improve readability * [profiler] typo && add TODO comment * [profiler] fix ndarray op name & add WaitForVar back * [profiler] add example/profiler/profiler_ndarray.py * [profiler] fix memleak by using op->name * [profiler] fix lint * [profiler] fix lint
* NNVM Refactor (#3194) * Init nnvm change * temp checkin * Move TShape to NNVM * Redirect Symbolic API to NNVM * Add Op Prop Adapter * Finish migrate in shape infer * Pass all symbolic test * temp commit * enable aux data * [EXEC] Basic version of exec for forward only * [EXEC] Enable most optimizations, still wait grad and context * fix legacy op with latest one * Update NNVM NodeRef * Adapt to newer interface * ALl registry of backop is complete * temp commit * Hack finish backward pass * [EXEC] One day pass * [EXEC] Pass all operator unittest * [EXEC] enable model parallel * Fully pass all legacy tests * Remove legacy symbolic code * update news * Make travis compile * Fix python3 * Update viz module to new json format * [NNVM] Imperative Invoke (#3208) * [Engine] Deduplicate Variable Util * [NNVM] NNVM Imperative Invoke * [NNVM] Imperative improve speed * fix * fix * [scala] link libnnvm.a (#3214) * [PYTHON] Optional Cython Module for Symbols (#3242) * [CYTHON] Checkin cython enhancement * fix lint * [DOC] Move common doc to base * [EXEC] Support fcompute (#3249) * [EXEC] Support fcompute * Fix lint * fix lint * [OP] Add alias support (#3261) * Fix path in setup.py (#3276) * Fix path in setup.py * revert the nnvm version * [WIP] Element wise op refactor (#3245) * [OPERATOR] Refactor Unary Ops * [OPERATOR] Refactor Binary Scalar Ops * Use alias * update nnvm version (#3290) * Fix breaking changes after pull master (#3291) * [CYTHON] Cython module for NDArray (#3292) * [NDARRAY] Cython module for ndarray * More strict tests * [NNVM] change of attr to set_attr (#3303) * Update run_test.sh * add nnvm cmake with windows (#3255) * [WIP] binary broadcast wip (#3301) * [WIP] binary broadcast wip [OPERATOR] Binary Broadcast ops fix lint lint fix max and min update submodule before removing reduce axis broad cast reduce ops * update * fix * fix warning * fix * x (#3308) * [IO] Python based ImageIter and Augumenter (#3227) * [IO] Python based ImageIter and Augumenter * 
fix * fix * fix * [OPT] NNVM Optimizer (#3314) * fix cpython in windows (#3309) * Add Mathematical functions (#3317) * fix image io * add hypot degrees radians cosh sinh tanh arcsinh arccosh arctanh (#3335) * add recent examples, collect some missing tutorials (#3340) * Improving docs & utilities for distributed training example. (#3341) * add init dict * disable SSE for arm hardware e.g. Raspberry Pi (#3346) * Add channel_ to Shape2D calculation (#3181) * Add channel_ to Shape2D calculation * scalapkg, add example multitask (#3186) * RNN cell demo with ptb LSTM language model (#3197) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * Bulk lint fix (#3211) * [TENSOR] Add FlatTo1D for all elementwise ops (#3238) * Fix little bug on context (#3202) * add PennTreeBank Language Model using lstm model in R (#2659) * Add function 'print_summary' and some revise (#3161) * Add function 'print_summary' and some revise Add function 'print_summary' for print detail information of network, and format argument was add in 'plot_network'. You can use 'print_summary' like: """ net = get_symbol(1000) shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)} mx.viz.print_summary(net, shape=shape) """ If without shape, the number of arguments would be nonsense currently. * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Added my CmakeLists.txt for caffe plugin, etc. * Revert "fix travis scala test config" (#3246) This reverts parts of commit 3e15f62. 
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (#3217) [scala] auto-generate Symbol functions * fix spelling errors (#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (#3260) * Copy slice along arbitrary axis (#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (#3264) * fix PReLU backward computing (#3277) * Add `reverse` option in Reshape (#3280) * add scala example, end2end neural-style (#3267) add scala example, end2end neural-style * Improve multi-GPU performance (#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (#3293) * Fix newer version of gtest and cpptest (#3294) * when set use_global_stats then do not use cudnn (#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (#3270) * add support for building on power (#3302) * add recent examples, collect some missing tutorials (#3340) * CMake for caffe plugin * Fix metric & im2rec.py * [Scala] Nnvm ops for NDArray & Symbol (#3361) * [scala] nnvm op support * [scala] remove unused codes * fix scala native code style * [R] Fix the R interface (#3334) * [R] Fix the R interface. 
remove man * Fix BN legacy issue * Locate compiled library on Windows (#3369) * Fix metric & im2rec.py (#3375) image io fix * Update legacy op FBackwardInGradIndex (#3376) * Update legacy op FBackwardInGradIndex * fix test * Fix for LRN Layer (#3366) * fixed cpu forward bug * added out_data[lrn_enum::kOut] as backward req. * removed lint * removed duplicate out_data[lrn_enum::kTmpNorm], * removed inplace option * add backward index * include some special functions (#3337) - gamma - gammaln - log1p - expm1 * fix kv build (#3385) * initial profiler branch based on dmlc/mxnet:nnvm * [profiler] add profiler & modify engine API * [profiler] add USE_PROFILER compile flag & modify code for changed engine api * [profiler] add c_api interface & modify graph_executor * [profiler] add python api * [profiler] typo & lint error * [profiler] reduce overhead & add PROFIELR_MESSAGE_FUNCNAME macro * [profiler] remove profiling argument from PushSync/PushAsync * [profiler] refactor profiler.h/.cc * [profiler] improve readability * [profiler] typo && add TODO comment * [profiler] fix ndarray op name & add WaitForVar back * [profiler] add example/profiler/profiler_ndarray.py * [profiler] fix memleak by using op->name * [profiler] fix lint * [profiler] fix lint
* NNVM Refactor (#3194) * Init nnvm change * temp checkin * Move TShape to NNVM * Redirect Symbolic API to NNVM * Add Op Prop Adapter * Finish migrate in shape infer * Pass all symbolic test * temp commit * enable aux data * [EXEC] Basic version of exec for forward only * [EXEC] Enable most optimizations, still wait grad and context * fix legacy op with latest one * Update NNVM NodeRef * Adapt to newer interface * ALl registry of backop is complete * temp commit * Hack finish backward pass * [EXEC] One day pass * [EXEC] Pass all operator unittest * [EXEC] enable model parallel * Fully pass all legacy tests * Remove legacy symbolic code * update news * Make travis compile * Fix python3 * Update viz module to new json format * [NNVM] Imperative Invoke (#3208) * [Engine] Deduplicate Variable Util * [NNVM] NNVM Imperative Invoke * [NNVM] Imperative improve speed * fix * fix * [scala] link libnnvm.a (#3214) * [PYTHON] Optional Cython Module for Symbols (#3242) * [CYTHON] Checkin cython enhancement * fix lint * [DOC] Move common doc to base * [EXEC] Support fcompute (#3249) * [EXEC] Support fcompute * Fix lint * fix lint * [OP] Add alias support (#3261) * Fix path in setup.py (#3276) * Fix path in setup.py * revert the nnvm version * [WIP] Element wise op refactor (#3245) * [OPERATOR] Refactor Unary Ops * [OPERATOR] Refactor Binary Scalar Ops * Use alias * update nnvm version (#3290) * Fix breaking changes after pull master (#3291) * [CYTHON] Cython module for NDArray (#3292) * [NDARRAY] Cython module for ndarray * More strict tests * [NNVM] change of attr to set_attr (#3303) * Update run_test.sh * add nnvm cmake with windows (#3255) * [WIP] binary broadcast wip (#3301) * [WIP] binary broadcast wip [OPERATOR] Binary Broadcast ops fix lint lint fix max and min update submodule before removing reduce axis broad cast reduce ops * update * fix * fix warning * fix * x (#3308) * [IO] Python based ImageIter and Augumenter (#3227) * [IO] Python based ImageIter and Augumenter * 
fix * fix * fix * [OPT] NNVM Optimizer (#3314) * fix cpython in windows (#3309) * Add Mathematical functions (#3317) * fix image io * add hypot degrees radians cosh sinh tanh arcsinh arccosh arctanh (#3335) * add recent examples, collect some missing tutorials (#3340) * Improving docs & utilities for distributed training example. (#3341) * add init dict * disable SSE for arm hardware e.g. Raspberry Pi (#3346) * Add channel_ to Shape2D calculation (#3181) * Add channel_ to Shape2D calculation * scalapkg, add example multitask (#3186) * RNN cell demo with ptb LSTM language model (#3197) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * Bulk lint fix (#3211) * [TENSOR] Add FlatTo1D for all elementwise ops (#3238) * Fix little bug on context (#3202) * add PennTreeBank Language Model using lstm model in R (#2659) * Add function 'print_summary' and some revise (#3161) * Add function 'print_summary' and some revise Add function 'print_summary' for print detail information of network, and format argument was add in 'plot_network'. You can use 'print_summary' like: """ net = get_symbol(1000) shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)} mx.viz.print_summary(net, shape=shape) """ If without shape, the number of arguments would be nonsense currently. * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Added my CmakeLists.txt for caffe plugin, etc. * Revert "fix travis scala test config" (#3246) This reverts parts of commit 3e15f62. 
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (#3217) [scala] auto-generate Symbol functions * fix spelling errors (#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (#3260) * Copy slice along arbitrary axis (#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (#3264) * fix PReLU backward computing (#3277) * Add `reverse` option in Reshape (#3280) * add scala example, end2end neural-style (#3267) add scala example, end2end neural-style * Improve multi-GPU performance (#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (#3293) * Fix newer version of gtest and cpptest (#3294) * when set use_global_stats then do not use cudnn (#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (#3270) * add support for building on power (#3302) * add recent examples, collect some missing tutorials (#3340) * CMake for caffe plugin * Fix metric & im2rec.py * [Scala] Nnvm ops for NDArray & Symbol (#3361) * [scala] nnvm op support * [scala] remove unused codes * fix scala native code style * [R] Fix the R interface (#3334) * [R] Fix the R interface. 
remove man * Fix BN legacy issue * Locate compiled library on Windows (#3369) * Fix metric & im2rec.py (#3375) image io fix * Update legacy op FBackwardInGradIndex (#3376) * Update legacy op FBackwardInGradIndex * fix test * Fix for LRN Layer (#3366) * fixed cpu forward bug * added out_data[lrn_enum::kOut] as backward req. * removed lint * removed duplicate out_data[lrn_enum::kTmpNorm], * removed inplace option * add backward index * include some special functions (#3337) - gamma - gammaln - log1p - expm1 * fix kv build (#3385) * initial profiler branch based on dmlc/mxnet:nnvm * [profiler] add profiler & modify engine API * [profiler] add USE_PROFILER compile flag & modify code for changed engine api * [profiler] add c_api interface & modify graph_executor * [profiler] add python api * [profiler] typo & lint error * [profiler] reduce overhead & add PROFIELR_MESSAGE_FUNCNAME macro * [profiler] remove profiling argument from PushSync/PushAsync * [profiler] refactor profiler.h/.cc * [profiler] improve readability * [profiler] typo && add TODO comment * [profiler] fix ndarray op name & add WaitForVar back * [profiler] add example/profiler/profiler_ndarray.py * [profiler] fix memleak by using op->name * [profiler] fix lint * [profiler] fix lint
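The multi-GPU commit above (#3241) mentions "add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p", which relates to the kvstore=local vs. kvstore=device timing comparisons earlier in this thread. A minimal sketch of how that switch is exercised, assuming an MXNet build is available (the kvstore lines are illustrative and left commented so the snippet stands alone):

```python
import os

# MXNET_ENABLE_GPU_P2P is read by MXNet at runtime; "0" disables direct
# GPU peer-to-peer copies during device-side gradient aggregation,
# "1" (the default) enables them. Set it before any MXNet work starts.
os.environ["MXNET_ENABLE_GPU_P2P"] = "0"

# import mxnet as mx                      # requires an MXNet build
# kv = mx.kvstore.create("device")        # aggregate gradients on GPU
# kv = mx.kvstore.create("local")         # aggregate on CPU instead

print(os.environ["MXNET_ENABLE_GPU_P2P"])
```

Whether P2P helps depends on the PCI-E topology, which is why the thread compares k80 and m40 machines separately.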
#4641) * Add channel_ to Shape2D calculation * scalapkg, add example multitask (#3186) * RNN cell demo with ptb LSTM language model (#3197) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * Bulk lint fix (#3211) * [TENSOR] Add FlatTo1D for all elementwise ops (#3238) * Fix little bug on context (#3202) * add PennTreeBank Language Model using lstm model in R (#2659) * Add function 'print_summary' and some revise (#3161) * Add function 'print_summary' and some revise Add function 'print_summary' for print detail information of network, and format argument was add in 'plot_network'. You can use 'print_summary' like: """ net = get_symbol(1000) shape = {'softmax_label': (64, 12), 'data': (64, 3, 224, 224)} mx.viz.print_summary(net, shape=shape) """ If without shape, the number of arguments would be nonsense currently. * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Update visualization.py * Added my CmakeLists.txt for caffe plugin, etc. * Revert "fix travis scala test config" (#3246) This reverts parts of commit 3e15f62. 
Reenables testing the Julia bindings * [Scala] Code generation for Symbol (#3217) [scala] auto-generate Symbol functions * fix spelling errors (#3258) Also align grammar and punctuation in short descriptions of features * fix typo in run_test.sh (#3260) * Copy slice along arbitrary axis (#3259) * rnn-cell demo (push to server for testing) * a running example with cuDNN RNN cell * add copyslice along arbitrary axis for NDArray * copy_slice_to as an ndarray operator * Python interface to the _copy_slice_to operator * fix lint error * Enable concatenation for dim-1 vectors (#3264) * fix PReLU backward computing (#3277) * Add `reverse` option in Reshape (#3280) * add scala example, end2end neural-style (#3267) add scala example, end2end neural-style * Improve multi-GPU performance (#3241) * update kvstore * update model.py * bandwith tool * update readme * tiny * fix lint * fix batch size of dist_device_sync * fix * fix perf problem of kvstore when only using a single device * roll back to previous strategy how to choose update_on_kvsotre * add an optionl MXNET_ENABLE_GPU_P2P to control whether or not use p2p * update dmlccore (#3293) * Fix newer version of gtest and cpptest (#3294) * when set use_global_stats then do not use cudnn (#3289) * when set use_global_stats then do not use cudnn * fix batch norm with use_global_stats * Fix req+reserve_space in cudnn_rnn (#3274) Fix req Fix reserve_space Allocate reserve_space using Storage * add cudnn off option in Convolution (#3270) * add support for building on power (#3302) * add recent examples, collect some missing tutorials (#3340) * CMake for caffe plugin * CMake python deployment changes * CMake python deployment changes * CMake python deployment changes * CMake python deployment changes
This is a major refactor of src/kvstore; we should obtain confirmation from multiple users before merging.
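For context, the kvstore naming convention described under "changes for the interface" below can be sketched in a few lines. This is a hypothetical illustration of the naming rules only, not the actual src/kvstore dispatch code:

```python
def kvstore_capabilities(name):
    """Hypothetical sketch of the naming rules this PR describes:
    'device' in the name selects GPU peer-to-peer aggregation for
    push/pull, and 'dist' in the name selects a distributed kvstore."""
    return {
        "gpu_p2p": "device" in name,
        "distributed": "dist" in name,
    }

# The store types mentioned in this PR and what the rules imply for them:
for name in ("local", "device", "dist_sync", "dist_sync_device", "dist_async_device"):
    print(name, "->", kvstore_capabilities(name))
```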
**Performance improvement**

This PR will try to use GPU peer-to-peer communication, if available, for `kvstore=device`. Together with PR #3238, it potentially improves performance when training with 4 or more GPUs, or when training multiple jobs at the same time. For example, using 8 M40s, it improves 152-layer ResNet from 300 img/sec to 353 img/sec. Even larger improvements come from distributed training.

We also provide tools in `tools/bandwidth` to measure the GPU bandwidth for various neural networks and hardware.

**Changes to the interface**
1. Added `dist_sync_device` and `dist_async_device` for using GPU p2p communication in distributed training.
2. If `"device"` is in the name, GPU p2p will be used for push and pull; otherwise all data goes to CPU memory directly. If `"dist"` is in the name, a distributed kvstore is used.
3. Updates can now run on the GPUs for `kvstore=device`; previously we copied the data to the CPU first. The new way may accelerate things, but it requires the optimizer to be able to run on GPUs.
4. The function `_create_kvstore` in `model.py` now always returns `update_on_kvstore=True` to simplify things. We can further remove all logic related to `update_on_kvstore`.

**Further things**
We can use unified memory to solve #2919 (we only need to …). However, I observed decreased performance when using unified memory, so we should probably enable it only for very large arrays, e.g. the fully-connected weight in VGG.
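The size-gating idea above could look like the following minimal sketch. The function name and the element-count threshold are made up for illustration; this is not code from the PR:

```python
def use_unified_memory(shape, threshold_elems=1 << 24):
    """Hypothetical heuristic: opt into CUDA unified memory only for
    very large arrays, since it slowed things down for small ones."""
    n = 1
    for dim in shape:
        n *= dim
    return n >= threshold_elems

# VGG's first fully-connected weight is enormous (25088 * 4096 elements),
# while a typical convolution kernel is tiny:
print(use_unified_memory((4096, 25088)))   # True
print(use_unified_memory((64, 3, 3, 3)))   # False
```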