
This project implements CNN text classification in MXNet. Compared to the official MXNet example code, it adds inference code and extends binary classification to multi-class classification.


blueMug/cnn_text_classification


Implementing CNN Text Classification in MXNet

Recently, I have been learning MXNet for Natural Language Processing (NLP). I started from the official example code in the MXNet GitHub repository. However, I found the official code too minimal to run the whole pipeline from training to prediction, so I modified it.

RNN text classification in MXNet is here.

Main differences from the official version

  • Inference code was added, so you can use your trained model to make predictions
  • The code targets MXNet 0.12.1, so some functions used in the original example may be deprecated
  • Binary classification was extended to multi-class classification
  • The code for pretrained embeddings was removed, and the data format was changed
  • The label shape was changed to (batch_size,)
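To illustrate the last point, here is a minimal plain-Python sketch (not code from this repository) of what a label batch of shape (batch_size,) means: each sample carries one integer class index, rather than a one-hot row of shape (batch_size, num_classes). This index layout is the usual form for a softmax cross-entropy output and scales naturally to any number of classes. The function names are hypothetical.

```python
from typing import List


def to_index_labels(one_hot_batch: List[List[int]]) -> List[int]:
    """Convert one-hot rows, shape (batch_size, num_classes), to class indices, shape (batch_size,)."""
    return [row.index(1) for row in one_hot_batch]


def to_one_hot(index_batch: List[int], num_classes: int) -> List[List[int]]:
    """Inverse conversion, shown for comparison."""
    return [[1 if c == i else 0 for c in range(num_classes)] for i in index_batch]


# A batch of 3 samples in a 3-class problem.
one_hot = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
labels = to_index_labels(one_hot)
print(labels)  # [2, 0, 1]
```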

Data

Training and validation data

Two .txt files. Each line has the format <label> sentence, for example:

  • <pos> This is the best movie about troubled teens since 1998's whatever.
  • <neg> This 10th film in the series looks and feels tired.
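A minimal sketch of how a line in this format can be split into its label and sentence. It assumes the label is the bracketed token at the start of the line followed by whitespace; the helper names are hypothetical and the repository's own loader may differ.

```python
import re
from typing import List, Tuple

# "<label> sentence": the label inside angle brackets, then the rest of the line.
LINE_PATTERN = re.compile(r"^<([^>]+)>\s*(.*)$")


def parse_labeled_line(line: str) -> Tuple[str, str]:
    """Return (label, sentence) for one '<label> sentence' line."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"line does not start with <label>: {line!r}")
    return match.group(1), match.group(2)


def load_labeled_file(path: str) -> List[Tuple[str, str]]:
    """Read a training or validation file, skipping blank lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                examples.append(parse_labeled_line(line))
    return examples


print(parse_labeled_line("<neg> This 10th film in the series looks and feels tired."))
# ('neg', 'This 10th film in the series looks and feels tired.')
```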

Config data

One label per line; the number of lines equals the total number of classes, for example:

  • pos
  • neg
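One natural way to turn the config file into class indices is to number the labels in file order. The sketch below assumes that convention (the repository may assign indices differently); `build_label_map` and `load_label_map` are hypothetical names.

```python
from typing import Dict, List


def build_label_map(labels: List[str]) -> Dict[str, int]:
    """Assign each label a class index in the order it appears."""
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate label in config")
    return {label: index for index, label in enumerate(labels)}


def load_label_map(path: str) -> Dict[str, int]:
    """Read one label per line from the config file, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        labels = [line.strip() for line in f if line.strip()]
    return build_label_map(labels)


label_map = build_label_map(["pos", "neg"])
num_classes = len(label_map)  # equals the number of labels in the config
index_to_label = {i: label for label, i in label_map.items()}  # for decoding predictions
print(label_map)  # {'pos': 0, 'neg': 1}
```

The reverse mapping is what inference needs to turn a predicted class index back into a label string.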

Inference data

One sentence per line, without a <label>.

Inference data with evaluation

Each line has the format <label> sentence, the same as the validation file.
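With labeled inference data, predictions can be scored against the gold labels. The README does not say which metric the evaluation mode reports, so the following is only a sketch assuming plain accuracy plus per-class recall; the function names are hypothetical.

```python
from typing import Dict, List


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def per_class_recall(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """For each gold label, the fraction of its sentences predicted correctly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for g, p in zip(gold, predicted):
        totals[g] = totals.get(g, 0) + 1
        if g == p:
            hits[g] = hits.get(g, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


gold = ["pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "pos", "pos"]
print(accuracy(gold, pred))          # 0.75
print(per_class_recall(gold, pred))  # {'pos': 1.0, 'neg': 0.5}
```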

The data should preferably be tokenized beforehand (word-segmented, for Chinese), with tokens separated by spaces.
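Because the input is already tokenized, a sentence can be split on whitespace and mapped to word indices. A CNN text classifier usually consumes fixed-length index sequences, so the sketch below also pads or truncates to a chosen `max_len`. The padding scheme, the reserved `<pad>`/`<unk>` tokens and the function names are assumptions for illustration, not this repository's preprocessing.

```python
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"  # hypothetical reserved tokens at indices 0 and 1


def build_vocab(sentences: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each whitespace-separated token seen at least min_count times to an index."""
    counts: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence.split():
            counts[token] = counts.get(token, 0) + 1
    vocab = {PAD: 0, UNK: 1}
    for token in sorted(counts):  # sorted for a deterministic index order
        if counts[token] >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a tokenized sentence to exactly max_len indices (truncate or pad)."""
    ids = [vocab.get(token, vocab[UNK]) for token in sentence.split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["a b", "b c"])
print(encode("b d a", vocab, 5))  # [3, 1, 2, 0, 0]
```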

Quick start

Train a model:

python cnn_model.py --train /path/to/train.data --validate /path/to/validate.data --config /path/to/config

Run inference:

python inference.py --test /path/to/inference.data --config /path/to/config --checkpoint 1

Run inference with evaluation (the input file contains labels):

python inference.py --test /path/to/inference-evaluation.data --config /path/to/config --checkpoint 1 --evaluation
