- Added limited CPU support for two sparse formats for Symbol and NDArray: CSRNDArray and RowSparseNDArray
- Added a sparse dot product operator and many element-wise sparse operators
- Added a data iterator for sparse data input: LibSVMIter
- Added three optimizers for sparse gradient updates: Ftrl, SGD and Adam
- Added push and row_sparse_pull with RowSparseNDArray in distributed kvstore
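
To make the two sparse layouts named above concrete, here is a minimal pure-Python sketch of what compressed sparse row (CSR) and row-sparse storage keep. It does not use the MXNet API; the helper names `dense_to_csr` and `dense_to_row_sparse` are hypothetical and exist only for illustration.

```python
def dense_to_csr(rows):
    """Return (data, indices, indptr) for a dense 2-D list of numbers."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row by row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


def dense_to_row_sparse(rows):
    """Return (data, indices): only rows that contain a non-zero are kept."""
    indices = [i for i, row in enumerate(rows) if any(v != 0 for v in row)]
    data = [list(rows[i]) for i in indices]
    return data, indices


dense = [
    [0, 0, 3],
    [0, 0, 0],
    [4, 0, 5],
]
csr = dense_to_csr(dense)
row_sparse = dense_to_row_sparse(dense)
```

CSR suits matrices whose non-zeros are scattered across columns, while the row-sparse layout suits cases such as embedding gradients where whole rows are either touched or untouched.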
- New loss functions added: SigmoidBinaryCrossEntropyLoss, CTCLoss, HuberLoss, HingeLoss, SquaredHingeLoss, LogisticLoss, TripletLoss
- gluon.Trainer now allows reading and setting learning rate with trainer.learning_rate property.
- Added mx.autograd.grad and experimental second order gradient support (though most operators don't support second order gradient yet)
- Added ConvLSTM etc to gluon.contrib
- Autograd now supports cross-device graphs. Use x.copyto(mx.gpu(i)) and x.copyto(mx.cpu()) to do computation on multiple devices.
- Limited support for fancy indexing. x[idx_arr0, idx_arr1, ..., idx_arrn] is now supported. Full support is coming in the next release. Check out master to get a preview.
- Random number generators in mx.nd.random.* and mx.sym.random.* now support both CPU and GPU
- NDArray and Symbol now support "fluent" methods. You can now use x.exp() etc instead of mx.nd.exp(x) or mx.sym.exp(x)
- Added mx.rtc.CudaModule for writing and running CUDA kernels from python
- Added multi_precision option to optimizer for easier float16 training
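
As a rough illustration of why a multi-precision option helps float16 training, the sketch below emulates half and single precision with the standard `struct` module and compares updating a float16 weight directly against updating a float32 master copy that is cast to float16 afterwards. This is a conceptual sketch under assumed values (learning rate, gradient, step count), not MXNet's implementation; `to_fp16` and `to_fp32` are hypothetical helpers.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


lr, grad, steps = 1e-4, -1.0, 100  # assumed values for illustration

# Pure float16: each update is smaller than half the spacing between
# float16 values near 1.0, so it is rounded away on every step.
w_fp16 = to_fp16(1.0)
for _ in range(steps):
    w_fp16 = to_fp16(w_fp16 - to_fp16(lr * grad))

# Multi-precision: accumulate in a float32 master copy, cast once at the end.
master = to_fp32(1.0)
for _ in range(steps):
    master = to_fp32(master - lr * grad)
w_mixed = to_fp16(master)
```

In this sketch the float16-only weight never moves from 1.0, while the weight derived from the float32 master copy reflects the accumulated updates.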
- Enabled JIT compilation. Autograd and Gluon hybridize now use less memory and run faster. Performance is almost the same as with old symbolic style code.
- Full support for NVIDIA Volta GPU architecture and CUDA 9. Training is up to 3.5x faster than Pascal when using float16.
- Operators like mx.sym.linalg_* and mx.sym.random_* are now moved to mx.sym.linalg.* and mx.sym.random.*. The old names are still available but deprecated.
- sample_* and random_* are now merged as random.*, which supports both scalar and NDArray distribution parameters.
- Fixed a bug that causes argsort operator to fail on large tensors.
- Fixed numerical stability issues when summing large tensors.
- For more information see full release notes
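
The entry on summing large tensors does not say how the fix works. As background only, the sketch below shows one standard way to reduce rounding error in long sums, Kahan compensated summation, next to a naive loop. It is not a description of the actual change; `naive_sum` and `kahan_sum` are hypothetical names.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation; rounding error grows with length."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each add into the next."""
    total = 0.0
    compensation = 0.0                    # running estimate of lost low-order bits
    for v in values:
        y = v - compensation              # apply the correction from the last step
        t = total + y                     # low-order bits of y may be lost here
        compensation = (t - total) - y    # recover what was lost
        total = t
    return total


values = [0.1] * 100_000
reference = math.fsum(values)             # correctly rounded sum from the stdlib
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```

Comparing `naive_error` with `kahan_error` shows how much of the drift the compensation term removes for this input.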
- Apple Core ML model converter
- Support for Keras v1.2.2
- For more information see full release notes
- Added CachedOp. You can now cache operators that are called frequently with the same set of arguments to reduce overhead.
- Added sample_multinomial for sampling from multinomial distributions.
- Added trunc operator for rounding towards zero.
- Added linalg_gemm, linalg_potrf, ... operators for lapack support.
- Added verbose option to Initializer for printing out initialization details.
- Added DeformableConvolution to contrib from the Deformable Convolutional Networks paper.
- Added float64 support for dot and batch_dot operator.
- allow_extra is added to Module.set_params to ignore extra parameters.
- Added mod operator for modulo.
- Added multi_precision option to SGD optimizer to improve training with float16. Resnet50 now achieves the same accuracy when trained with float16 and gives 50% speedup on Titan XP.
- ImageRecordIter now stores data in pinned memory to improve GPU memcopy speed.
- Cython interface is fixed. make cython and python setup.py install --with-cython should install the cython interface and reduce overhead in applications that use imperative/bucketing.
- Fixed various bugs in Faster-RCNN example: apache#6486
- Fixed various bugs in SSD example.
- Fixed out argument not working for zeros, ones, full, etc.
- expand_dims now supports backward shape inference.
- Fixed a bug in rnn.BucketingSentenceIter that causes incorrect layout handling on multi-GPU.
- Fixed context mismatch when loading optimizer states.
- Fixed a bug in ReLU activation when using MKL.
- Fixed a few race conditions that cause crashes on shutdown.
- Refactored TShape/TBlob to use int64 dimensions and DLTensor as internal storage. Getting ready for migration to DLPack. As a result TBlob::dev_mask_ and TBlob::stride_ are removed.
- Overhauled documentation for commonly used Python APIs, Installation instructions, Tutorials, HowTos and MXNet Architecture.
- Updated mxnet.io for improved readability.
- Pad operator now supports reflection padding.
- Fixed a memory corruption error in threadedengine.
- Added CTC loss layer to contrib package. See mx.contrib.sym.ctc_loss.
- Added new sampling operators for several distributions (normal, uniform, gamma, exponential, negative binomial).
- Added documentation for experimental RNN APIs.
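
For readers unfamiliar with the reflection padding mentioned in the Pad operator entry above, here is a minimal one-dimensional sketch in plain Python. It assumes the common convention in which the edge value itself is not repeated; whether the Pad operator uses exactly this convention is an assumption, and `reflect_pad_1d` is a hypothetical helper rather than part of MXNet.

```python
def reflect_pad_1d(values, pad):
    """Pad a list on both sides by mirroring around its first and last element."""
    if not 0 <= pad < len(values):
        raise ValueError("pad must be non-negative and smaller than len(values)")
    left = values[1:pad + 1][::-1]      # mirror of the elements after the first
    right = values[-pad - 1:-1][::-1]   # mirror of the elements before the last
    return left + list(values) + right


padded = reflect_pad_1d([1, 2, 3, 4], 2)
```

Unlike constant padding, this keeps values near the border continuous with the data, which is why it is often preferred for image inputs.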
- Move symbolic API to NNVM @tqchen
- Most front-end C APIs are backward compatible
- Removed the symbolic API from MXNet, which now relies on NNVM
- New features:
- MXNet profiler for profiling operator-level executions
- mxnet.image package for fast image loading and processing
- Change of JSON format
- param and attr field are merged to attr
- New code is backward-compatible and can load the old JSON format
- OpProperty registration is now deprecated
- New operators are encouraged to register their property to NNVM op registry attribute
- Known removed features and limitations to be fixed
- Bulk segment execution not yet added.
This is the last release before the NNVM refactor.
- CaffeOp and CaffeIter for interfacing with Caffe by @HrWangChengdu @cjolivier01
- WrapCTC plugin for sequence learning by @xlvector
- Improved Multi-GPU performance by @mli
- CuDNN RNN support by @sbodenstein
- OpenCV plugin for parallel image IO by @piiswrong
- More operators as simple op
- Simple OP @tqchen
- element wise op with axis and broadcast @mli @sxjscience
- Cudnn auto tuning for faster convolution by @piiswrong
- More applications
- Faster RCNN by @precedenceguo
- 0.6 is skipped because there are a lot of improvements since initial release
- More math operators
- elementwise ops and binary ops
- Attribute support in computation graph
- Users can now use attributes to give hints such as specific learning rates, allocation plans, etc.
- MXNet is more memory efficient
- Support user defined memory optimization with attributes
- Support mobile applications by @antinucleon
- Refreshed update of new documents
- Model parallel training of LSTM by @tqchen
- Simple operator refactor by @tqchen
- add operator_util.h to enable quick registration of both ndarray and symbolic ops
- Distributed training by @mli
- Support Torch Module by @piiswrong
- MXNet now can use any of the modules from Torch.
- Support custom native operator by @piiswrong
- Support data types including fp16, fp32, fp64, int32, and uint8 by @piiswrong
- Support monitor for easy printing and debugging by @piiswrong
- Support new module API by @pluskid
- Module API is a middle level API that can be used in imperative manner like Torch-Module
- Support bucketing API for variable length input by @pluskid
- Support CuDNN v5 by @antinucleon
- More applications
- Speech recognition by @yzhang87
- Neural art by @antinucleon
- Detection, RCNN by @precedenceguo
- Segmentation, FCN by @tornadomeet
- Face identification by @tornadomeet
- More on the example
- All basic modules ready