This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[WIP][MXNET-107] Fused LSTM implementation for CPU (#10104)
* register RNN fused-API with nnvm, finish single-layer && unidirectional LSTM forward function
* fix coding style and lint complaints
* add single-layer && unidirectional LSTM backward function
* make interface universal for other RNN modes
* share intermediate results between forward and backward via a trick
* add comments for important parameters
* modify testcase
* fix coding style and error message
* fix openmp collapse error
* fix const
* remove rnn.cu and skip related testcases temporarily for building on GPU
* support multi-layer and bidirectional for lstm inference
* remove some testcases in test_gluon_rnn.py to build on GPU
* remove testcase between fp32 and fp64 temporarily
* retrigger ci
* fix some logs
* use a better way to share memory
* fix cudnn registration
* fix invariant calculations and enable some gpu testcases
* add thread local cache for cudnn rnn op
* add thread local cache for rnn op
* fix bugs
* remove some testcases to check segmentation fault
* remove cudnn registration to check segmentation fault
* support multi-layer for LSTM training
* modify lstm testcase
* add bidirectional support for lstm
* fix gluon and coding style
* fix bugs
* remove nnvm registration
* enable gpu testcases
* add detailed descriptions
* add dropout check
* fix workspace size
* dropout is not supported, add unit test for it
* fix review comments