An example of text classification using CNN

To use this example, you must download the MR polarity dataset and put it in the path specified by mr-dataset-path. The dataset can be obtained here: The two files rt-polarity.neg and rt-polarity.pos must be placed in that directory, for example data/mr-data/rt-polarity.neg.
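As a rough illustration of the expected layout, the sketch below reads the two files from a dataset directory and optionally keeps only the first few lines of each, similar in spirit to the max-examples option. The function name load-polarity-lines and its arguments are hypothetical and are not taken from the example's source.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (an assumption, not the example's own code).
;; Reads rt-polarity.neg and rt-polarity.pos from `dir`.
;; When `max-examples` is non-nil, only that many lines are kept per file.
(defn load-polarity-lines
  [dir max-examples]
  (letfn [(read-file [fname]
            (with-open [rdr (io/reader (io/file dir fname))]
              (let [lines (line-seq rdr)]
                ;; doall realizes the lines before the reader is closed
                (doall (if max-examples
                         (take max-examples lines)
                         lines)))))]
    {:neg (read-file "rt-polarity.neg")
     :pos (read-file "rt-polarity.pos")}))
```

For instance, (load-polarity-lines "data/mr-data" 1000) would return a map with up to 1000 negative and 1000 positive sentences, assuming the files are in that directory.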

You must also download the GloVe word embeddings. The suggested set is the smaller 50-dimensional one, glove.6B.50d.txt, which is contained in the download file here
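Each line of a GloVe text file holds a word followed by its space-separated vector components, so glove.6B.50d.txt has 50 numbers per word. A minimal sketch of parsing that format follows; the names parse-glove-line and load-glove are hypothetical, and the example's own loader may differ.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; Hypothetical helpers (assumptions, not the example's own code).

;; Turns one line such as "the 0.5 -1.0 ..." into [word vector-of-floats].
(defn parse-glove-line
  [line]
  (let [[word & values] (string/split (string/trim line) #" ")]
    [word (mapv #(Float/parseFloat ^String %) values)]))

;; Reads the whole file eagerly into a map of word -> vector.
(defn load-glove
  [path]
  (with-open [rdr (io/reader path)]
    (into {} (map parse-glove-line) (line-seq rdr))))
```

With the 50-dimensional file, every vector in the resulting map should have 50 entries, which matches the :embedding-size 50 setting used when training.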


You can run the example from the REPL with (train-convnet {:embedding-size 50 :batch-size 100 :test-size 100 :num-epoch 10 :max-examples 1000})

or from the command line with JVM_OPTS="-Xmx1g" lein run (runs on the cpu)

You can control the devices you run on by doing:

  • lein run :cpu 2 - This will run on 2 cpu devices
  • lein run :gpu 1 - This will run on 1 gpu device
  • lein run :gpu 2 - This will run on 2 gpu devices

The max-examples option loads only 1000 examples from each of the two dataset files to keep time and memory use down. To run on all the examples, change the main to be (train-convnet {:embedding-size 50 :batch-size 100 :test-size 1000 :num-epoch 10})

and then run

  • lein uberjar
  • java -Xms1024m -Xmx2048m -jar target/cnn-text-classification-0.1.0-SNAPSHOT-standalone.jar