Recently, I have been learning MXNet for Natural Language Processing (NLP). I followed this official example code on the MXNet GitHub. However, the official code is too minimal to run a complete workflow, so I modified it.
RNN text classification in MXNet is here.
- Inference code was added, so you can use your trained model to make predictions
- The code targets MXNet 0.12.1, so some functions used in the original example may be deprecated
- Binary classification was generalized to multi-class classification
- The code for pretrained embeddings was removed, and the data format was changed
- The label shape was changed to (batch_size,)
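With the label shape changed to (batch_size,), each batch carries one integer class index per example rather than a one-hot row; to my understanding this is the form MXNet's `SoftmaxOutput` expects for multi-class training. The minimal pure-Python sketch below (no MXNet; all helper names are hypothetical) shows such a label vector and how softmax cross-entropy indexes into it:

```python
import math

def one_hot_to_indices(one_hot_rows):
    """Convert one-hot label rows into a flat list of class indices, i.e. shape (batch_size,)."""
    return [row.index(1) for row in one_hot_rows]

def softmax(logits):
    """Numerically stable softmax over one example's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean softmax cross-entropy where `labels` holds one class index per example."""
    if len(batch_logits) != len(labels):
        raise ValueError("need exactly one label per example")
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

one_hot = [[1, 0, 0], [0, 0, 1]]
labels = one_hot_to_indices(one_hot)
print(labels)  # [0, 2]
logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]]
print(cross_entropy(logits, labels))
```

The same index vector works for any number of classes, which is what makes the multi-class generalization straightforward.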
Training and validation data: two .txt files; each line has the format <label> sentence.
- <pos> This is the best movie about troubled teens since 1998's whatever.
- <neg> This 10th film in the series looks and feels tired.
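A minimal sketch of reading the training and validation files, assuming every non-empty line starts with a single `<label>` token followed by whitespace and the sentence. `parse_line` and `read_labeled_file` are hypothetical helpers, not functions from the repository:

```python
import re

# Hypothetical parser for the "<label> sentence" line format described above.
LINE_RE = re.compile(r"^<([^>]+)>\s+(.*)$")

def parse_line(line):
    """Split '<pos> some text' into ('pos', 'some text'); raise on malformed lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("expected '<label> sentence', got: %r" % line)
    return match.group(1), match.group(2)

def read_labeled_file(path, encoding="utf-8"):
    """Read a training/validation file into a list of (label, sentence) pairs."""
    pairs = []
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():  # skip blank lines
                pairs.append(parse_line(line))
    return pairs

print(parse_line("<neg> This 10th film in the series looks and feels tired."))
# ('neg', 'This 10th film in the series looks and feels tired.')
```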
Label file: one label per line; the number of lines equals the total number of classes.
- pos
- neg
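A sketch of turning the label file into class indices. It assumes (not confirmed by the original code) that a label's index is its line position in the file, so `pos` becomes 0 and `neg` becomes 1; `load_label_map` is a hypothetical name:

```python
def load_label_map(lines):
    """Map each label name to a class index in file order; one label per line."""
    labels = [line.strip() for line in lines if line.strip()]
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate label in label file")
    return {name: index for index, name in enumerate(labels)}

label_map = load_label_map(["pos\n", "neg\n"])
num_classes = len(label_map)  # must equal the model's number of output classes
encoded = [label_map[name] for name in ["neg", "pos", "neg"]]
print(label_map, num_classes, encoded)
# {'pos': 0, 'neg': 1} 2 [1, 0, 1]
```

Whatever mapping the scripts actually use, training and inference must read the same label file so that the indices agree.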
Inference file: one sentence per line, without a <label>.
Inference-evaluation file: each line has the format <label> sentence, the same as the validation file.
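The two inference inputs differ only in whether each line carries a `<label>` prefix. A hypothetical sketch of reading either kind and, in the evaluation case, scoring predictions; accuracy is used here as an assumed metric, and the helper names are illustrative:

```python
def read_inference_lines(lines, with_labels):
    """Return (labels, sentences); labels is None for plain inference files."""
    labels, sentences = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if with_labels:
            # Same '<label> sentence' format as the validation file.
            tag, _, sentence = line.partition(" ")
            if not (tag.startswith("<") and tag.endswith(">")):
                raise ValueError("missing <label> in: %r" % line)
            labels.append(tag[1:-1])
            sentences.append(sentence.strip())
        else:
            sentences.append(line)
    return (labels if with_labels else None), sentences

def accuracy(predicted, gold):
    """Fraction of predicted labels that match the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need a non-empty, equal number of predictions and labels")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold, sents = read_inference_lines(["<pos> great film", "<neg> dull plot"], with_labels=True)
print(gold, sents)  # ['pos', 'neg'] ['great film', 'dull plot']
print(accuracy(["pos", "pos"], gold))  # 0.5
```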
The data should preferably be tokenized beforehand (word-segmented, for Chinese text).
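Because the input is expected to be pre-tokenized (or pre-segmented for Chinese), splitting on whitespace is enough to recover tokens. The sketch below assumes a vocabulary with reserved padding and unknown-word ids and pads every sentence to a fixed length, as a CNN text classifier usually requires; `build_vocab`, `encode`, and the `<pad>`/`<unk>` tokens are illustrative, not the repository's code:

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def tokenize(sentence):
    # Whitespace split: assumes the text was already tokenized or segmented.
    return sentence.split()

def build_vocab(sentences, min_count=1):
    """Assign ids by descending frequency (ties broken alphabetically); 0/1 are reserved."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {PAD: 0, UNK: 1}
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with the PAD id."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(sentence)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

vocab = build_vocab(["a b a", "b c"])
print(encode("a c d", vocab, 5))  # [2, 4, 1, 0, 0]
```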
Training:
python cnn_model.py --train /path/to/train.data --validate /path/to/validate.data --config /path/to/config
Inference:
python inference.py --test /path/to/inference.data --config /path/to/config --checkpoint 1
Inference with evaluation:
python inference.py --test /path/to/inference-evaluation.data --config /path/to/config --checkpoint 1 --evaluation
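A hypothetical reconstruction of the command-line interface of `inference.py`, using only the flags shown above. It assumes `--checkpoint` is the epoch number of a model saved with MXNet's `mx.model.save_checkpoint`, which writes `<prefix>-symbol.json` and `<prefix>-<epoch padded to 4 digits>.params`; the prefix `checkpoint/cnn` is made up:

```python
import argparse

def build_inference_parser():
    # Hypothetical reconstruction of the inference flags; types are assumptions.
    parser = argparse.ArgumentParser(description="Run inference with a trained model")
    parser.add_argument("--test", required=True, help="file to predict on")
    parser.add_argument("--config", required=True, help="model config file")
    parser.add_argument("--checkpoint", type=int, required=True,
                        help="epoch number of the saved checkpoint to load")
    parser.add_argument("--evaluation", action="store_true",
                        help="test file has <label> prefixes; report a score")
    return parser

def checkpoint_files(prefix, epoch):
    # File names in the style written by mx.model.save_checkpoint(prefix, epoch, ...).
    return "%s-symbol.json" % prefix, "%s-%04d.params" % (prefix, epoch)

args = build_inference_parser().parse_args(
    ["--test", "/path/to/inference-evaluation.data", "--config", "/path/to/config",
     "--checkpoint", "1", "--evaluation"])
print(args.checkpoint, args.evaluation)  # 1 True
print(checkpoint_files("checkpoint/cnn", args.checkpoint))
# ('checkpoint/cnn-symbol.json', 'checkpoint/cnn-0001.params')
```

Under that assumption, the same prefix and epoch would be passed to `mx.model.load_checkpoint(prefix, epoch)` to restore the symbol and parameters.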