ConText v4.00 provides a C++ implementation of neural networks for text categorization described in:
- Deep pyramid convolutional neural networks for text categorization. Rie Johnson and Tong Zhang. ACL 2017.
- Effective use of word order for text categorization with convolutional neural networks. Rie Johnson and Tong Zhang. NAACL HLT 2015.
- Semi-supervised convolutional neural networks for text categorization via region embedding. Rie Johnson and Tong Zhang. NIPS 2015.
- Supervised and semi-supervised text categorization using LSTM for region embeddings. Rie Johnson and Tong Zhang. ICML 2016.
ConText v4.00 is available at http://riejohnson.com/cnn_download.html.
System Requirements: This software runs only on a CUDA-capable GPU such as Tesla K20. That is, your system must have a GPU and an appropriate version of CUDA installed. The provided makefile and example shell scripts are for Unix-like systems. Testing was done on Linux. In principle, the C++ code should also compile and run on other systems (e.g., Windows), but this is not guaranteed. See README for more details.
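Before building, it can help to confirm that the CUDA toolchain is visible from the shell. The following is a minimal sketch and not part of ConText: `check_tool` is a hypothetical helper, and the sketch assumes a standard CUDA installation that puts the `nvcc` compiler and the `nvidia-smi` utility on the PATH. README remains the authority on which CUDA versions are appropriate.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of ConText).
# Reports whether a command is on PATH; returns non-zero if it is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Assumption: a standard CUDA install exposes these two commands.
check_tool nvcc || true        # CUDA compiler
check_tool nvidia-smi || true  # NVIDIA driver utility; lists visible GPUs
```

A "missing" line for either command suggests that CUDA or the GPU driver is not installed, or is not on the PATH of the current shell.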
Download & Documentation: See http://riejohnson.com/cnn_download.html#download.
Getting Started
- Download the code and extract the files, and read `README` (not `README.md`).
- Go to the top directory and build executables by entering `make`, after customizing `makefile` as needed. (If you downloaded from GitHub, `make` also decompresses sample text files that exceed the GitHub file size limit and does `chmod +x` on shell scripts.)
- To confirm installation, go to `examples/` and enter `./sample.sh`. (See `README` for installation troubleshooting.)
- Read Section 1 (Overview) of the User Guide to get a general idea of the software.
- Try some shell scripts at `examples/`. There is a table of the scripts in Section 1.6 of the User Guide.
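
The build and confirmation steps above can be sketched as a single shell function. `build_and_check` is a hypothetical wrapper, not something shipped with ConText. It only chains the commands named in the steps (`make`, then `./sample.sh` in `examples/`) and assumes the makefile has already been customized as needed.

```shell
# Hypothetical wrapper (not part of ConText) around the steps above.
# Usage: build_and_check /path/to/extracted/top/directory
build_and_check() {
  top_dir="${1:-.}"
  # Refuse to run unless this looks like the top directory.
  if [ ! -f "$top_dir/makefile" ]; then
    echo "no makefile found in $top_dir" >&2
    return 1
  fi
  # Subshell so the caller's working directory is unchanged.
  (
    cd "$top_dir" || exit 1
    make || exit 1            # build the executables
    cd examples || exit 1
    ./sample.sh               # confirm the installation
  )
}
```

Calling `build_and_check .` from the top directory runs both steps in order and stops at the first failure, which makes it easier to tell a build problem from a problem in the sample run.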
Data Source: The data files were derived from the Large Movie Review Dataset (IMDB) [MDPHN11] and Amazon reviews [ML13].
Licence: This program is free software issued under the GNU General Public License V3.
References
[MDPHN11] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. ACL 2011.
[ML13] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys 2013.
Note: This GitHub repository provides a snapshot of research code, which is constantly changing elsewhere for research purposes. For this reason, it is very likely that pull requests will be declined.