Code for A Hybrid Approach for Aspect-Based Sentiment Analysis Using a Lexicalized Domain Ontology and Attentional Neural Models
All software is written in Python 3 (https://www.python.org/) and makes use of the TensorFlow framework (https://www.tensorflow.org/).
- Download ontology: https://github.com/KSchouten/Heracles/tree/master/src/main/resources/externalData
- Download SemEval-2015 datasets: http://alt.qcri.org/semeval2015/task12/index.php?id=data-and-tools
- Download SemEval-2016 dataset: http://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools
- Download GloVe embeddings: http://nlp.stanford.edu/data/glove.42B.300d.zip
- Download Stanford CoreNLP parser: https://nlp.stanford.edu/software/stanford-parser-full-2018-02-27.zip
- Download Stanford CoreNLP Language models: https://nlp.stanford.edu/software/stanford-english-corenlp-2018-02-27-models.jar
- Install chocolatey (a package manager for Windows): https://chocolatey.org/install
- Open a command prompt.
- Install python3 by running the following command:
code(choco install python)
(see http://docs.python-guide.org/en/latest/starting/install3/win/)
- Make sure that pip is installed, and use pip to install the setuptools and virtualenv packages (see http://docs.python-guide.org/en/latest/dev/virtualenvs/#virtualenvironments-ref).
- Create a virtual environment in a desired location by running the following command:
code(virtualenv ENV_NAME)
- Navigate to the virtual environment's root directory.
- Unzip the HAABSA_software.zip file in the virtual environment directory.
- Activate the virtual environment by running the following command:
code(Scripts\activate.bat)
- Install the required packages from the requirements.txt file by running the following command:
code(pip install -r requirements.txt)
- Install the required spaCy English language model by running the following command:
code(python -m spacy download en)
- Configure one of the three main files (main.py, main_cross.py, main_hyper.py) as required.
- Run the program from the command line by running the following command:
code(python PROGRAM_TO_RUN.py)
(where PROGRAM_TO_RUN is main/main_cross/main_hyper)
The environment contains the following main files that can be run: main.py, main_cross.py, main_hyper.py
- main.py: program to run single in-sample and out-of-sample validation runs. Each method can be activated by setting its corresponding boolean to True, e.g., to run the CABASC method set runCABASC = True.
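The flag mechanism can be sketched as follows. Apart from runCABASC, which the text above names, the flag and module names here are illustrative assumptions rather than the exact identifiers in main.py:

```python
# Hypothetical sketch of the boolean-flag pattern described above: each model
# has a run* flag, and only the flagged models are executed.
runLCRROT = False          # assumed flag name, for illustration
runLCRROTALT = True        # assumed flag name, for illustration
runCABASC = True           # flag named in the README

def run_selected(flags_and_models):
    """Run every model whose flag is set; return the names that ran."""
    ran = []
    for flag, name in flags_and_models:
        if flag:
            ran.append(name)  # in main.py this would invoke the model's training/eval
    return ran

executed = run_selected([
    (runLCRROT, "lcrModel"),
    (runLCRROTALT, "lcrModelAlt"),
    (runCABASC, "cabascModel"),
])
print(executed)  # → ['lcrModelAlt', 'cabascModel']
```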
- main_cross.py: similar to main.py, but runs a 10-fold cross-validation procedure for each method.
- main_hyper.py: program that performs hyperparameter optimization over a given space of hyperparameters for each method. To change the method, change the objective and space parameters in the run_a_trial() function.
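To illustrate the objective/space idea without depending on the hyperopt package (which main_hyper.py uses, per the results directory below), here is a stdlib-only random-search sketch; the parameter names and ranges are illustrative assumptions:

```python
import random

# Illustrative search space; the real spaces are defined in main_hyper.py's
# run_a_trial() using hyperopt. Parameter names here are assumptions.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -2),
    "batch_size": lambda: random.choice([20, 25, 30]),
}

def objective(params):
    # Stand-in for training a model and returning its validation loss.
    return (params["learning_rate"] - 0.005) ** 2 + params["batch_size"] / 1000.0

def random_search(space, objective, n_trials=50, seed=0):
    """Sample the space n_trials times and keep the lowest-loss parameters."""
    random.seed(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample() for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best, loss = random_search(space, objective)
print(best["batch_size"])  # one of 20, 25, 30
```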
- config.py: contains parameter configurations that can be changed, such as dataset_year, batch_size, and iterations.
- dataReader2016.py, loadData.py: files used to read in the raw data and transform it into the formats required by the algorithms.
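As an example of the kind of preprocessing involved: the GloVe file downloaded above is plain text with one token per line followed by its embedding values, so a minimal loader might look like this (a sketch only, not the actual loadData.py logic):

```python
def load_glove(path, dim=300):
    """Parse a GloVe text file: one token per line followed by `dim` floats,
    space-separated. Returns a dict mapping token -> list of floats."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip malformed or truncated lines
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```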
- lcrModel.py: TensorFlow implementation of the LCR-Rot algorithm.
- lcrModelAlt.py: TensorFlow implementation of the LCR-Rot-hop algorithm.
- lcrModelInverse.py: TensorFlow implementation of the LCR-Rot-inv algorithm.
- cabascModel.py: TensorFlow implementation of the CABASC algorithm.
- OntologyReasoner.py: Python implementation of the ontology reasoner.
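The hybrid idea is that the ontology reasoner predicts sentiment where the ontology is conclusive and defers otherwise; here is a toy sketch of that decision pattern, using an illustrative hand-made lexicon (the real reasoner operates on the lexicalized domain ontology downloaded above):

```python
# Purely illustrative stand-in for the domain ontology: maps sentiment-bearing
# words to a sentiment class. Not the actual ontology contents.
ONTOLOGY = {
    "delicious": "positive",
    "friendly": "positive",
    "bland": "negative",
    "rude": "negative",
}

def predict(tokens):
    """Return 'positive'/'negative' when the ontology hits are unanimous,
    else 'inconclusive' (the case handed to the neural backup model)."""
    hits = {ONTOLOGY[t] for t in tokens if t in ONTOLOGY}
    if len(hits) == 1:
        return hits.pop()
    return "inconclusive"

print(predict(["the", "food", "was", "delicious"]))  # → positive
print(predict(["delicious", "but", "rude"]))         # → inconclusive
```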
- svmModel.py: Python implementation of a bag-of-words (BoW) model using an SVM.
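The featurization half of this baseline can be sketched in plain Python; the actual svmModel.py implementation (and the SVM training it feeds) may differ:

```python
from collections import Counter

def bow_vectors(sentences):
    """Turn tokenized sentences into bag-of-words count vectors over a
    shared, sorted vocabulary. Such vectors would be the input to the SVM."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    vectors = []
    for sent in sentences:
        counts = Counter(sent)
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vocab, vectors

vocab, vecs = bow_vectors([["good", "food"], ["bad", "food", "food"]])
print(vocab)  # → ['bad', 'food', 'good']
print(vecs)   # → [[0, 1, 1], [1, 2, 0]]
```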
- att_layer.py, nn_layer.py, utils.py: programs that declare additional functions used by the machine learning algorithms.
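As a minimal illustration of what an attention layer computes, the softmax-weighted sum at its core can be written in plain Python (a sketch only; att_layer.py implements these operations in TensorFlow):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(values, scores):
    """Weighted sum of value vectors using softmax-normalized scores --
    the core operation an attention layer performs."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

ctx = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(ctx)  # equal scores reduce to a simple average: [0.5, 0.5]
```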
The following directories are necessary for the virtual environment setup: __pycache__, \Include, \Lib, \Scripts, \tcl, \venv
- cross_results_2015: Results for a k-fold cross-validation process for the SemEval-2015 dataset
- cross_results_2016: Results for a k-fold cross-validation process for the SemEval-2016 dataset
- data:
  - externalData: Location for the external data required by the methods
  - programGeneratedData: Location for the preprocessed data generated by the programs
- hyper_results: Contains the stored results of hyperparameter optimization for each method
- results: temporary storage location for the hyperopt package
This code uses ideas and code from the following related papers:
- Zheng, S. and Xia, R. (2018). Left-center-right separated neural network for aspect-based sentiment analysis with rotatory attention. arXiv preprint arXiv:1802.00892.
- Schouten, K. and Frasincar, F. (2018). Ontology-driven sentiment analysis of product and service aspects. In Proceedings of the 15th Extended Semantic Web Conference (ESWC 2018). Springer. To appear
- Liu, Q., Zhang, H., Zeng, Y., Huang, Z., and Wu, Z. (2018). Content attention model for aspect based sentiment analysis. In Proceedings of the 27th International World Wide Web Conference (WWW 2018). ACM Press.