Sentiment Analysis

Overview

Sentiment Analysis is a natural language processing (NLP) task in which the goal is to assess the polarity/sentiment of a chunk of text. By definition, is widely used in the context of Customer relationship management (CRM), for automatic evaluation of reviews and survey responses, and social media.

Popular substasks in Sentiment Analysis are:

Message Polarity Classification: Given a message, classify whether the overall contextual polarity of the message is of positive, negative, or neutral sentiment.
Topic-Based or Entity-Based Message Polarity Classification: Given a message and a topic or entity, classify the message towards that topic or entity.

A popular Workshop with a specific task for Sentiment analysis is the SemEval (International Workshop on Semantic Evaluation). Latest year (2017) overview of such task (Task 4) can be reached at: http://www.aclweb.org/anthology/S17-2088.

This project currently is targeting only the Message Polarity Classification subtask.

The repository contains:

code for processing datasets, as well as a RESTful web service for on-demand sentiment analysis (dockerization is also available).
links to some pre-trained word embeddings;
corpora (from SemEval-2017 Task 4 subtask A).

Pre-trained word embeddings

You can download one of the following word embeddings (from "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis"):

datastories.twitter.200d.txt: 200 dimensional embeddings
datastories.twitter.300d.txt: 300 dimensional embeddings

Place the file(s) in the Embeddings folder.

Model

The implemented model is a Deep Bidirectional LSTM model with Attention, based on the work of Baziotis et al., 2017: DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis.

Installation

Requirements

Besides PyTorch, the required libraries are listed in requirements.txt.

Setting up a virtual environment

Linux	Windows
pip install virtualenv	pip install virtualenv
virtualenv --python /usr/bin/python3.6 venv	virtualenv venv
source venv/bin/activate	venv\Scripts\activate.bat
pip install -r requirements.txt	pip install -r requirements.txt
pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp36-cp36m-linux_x86_64.whl (1)*	pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp36-cp36m-win_amd64.whl (1)*
pip install torchvision	pip install torchvision
pip install torchtext==0.2.3	pip install torchtext==0.2.3

(1)* replace "cpu" in link if you plan to use GPU: "cu80" for CUDA 8, "cu90" for CUDA 9.0, "cu92" for CUDA 9.2, ...

If you require a newer version, please visit http://pytorch.org/ and follow their instructions to install the relevant pytorch binary.

Running

Train

To train a new model, one can just use the pre-configured config file "train_config.json". After making the appropriate modifications, like for example, changing the target embeddings file, execute the following command in the terminal:

python sentiment_analysis.py --train_config_path="train_config.json"

Train config file arguments

name : model name
labels : list of the target labels, matching the train, dev and test datasets
embeddings_path : path for embeddings
preprocessing_style : "english"(english wikipedia) or "twitter"(english tweets)
save_in_REST_config : true, to update target REST config file with trained models, or false, otherwise
target_REST_config_path : path of target REST config file
batch_size : number of samples that going to be propagated through the network, on each iteration. Assuming that the dataset has T training samples, and the batch_size is set to B, the algorithm takes the first B samples (from 1st to Bth) from the training dataset and trains the network. Next it takes the following B samples and trains the network again, until all the samples have been propagated through the network;
hidden_dim : hidden dimension of the LSTM layers (number of features in the hidden state h);
num_layers : number of recurrent layers in the LSTM. E.g., setting num_layers would mean stacking two LSTMs together to form a "stacked LSTM", with the second LSTM taking in outputs of the first LSTM and computing the final results.

Run as a web service

To load the trained models and launch a web service for on-demand sentiment analysis, execute the following command in the terminal:

python sentiment_analysis.py --rest_config_path="REST_config.json"

Automated procedure

Edit the train config file to select the parameters of the models to be trained.
Run the script in the train mode (see how above). The target REST web service config file will be updated with the trained models, if save_in_REST_config is set to true.
Building a docker image afterwards (using provided Dockerfile) will create a running docker image with a REST web service, automatically configured with the trained models (model files and config file).

docker build -t sentimentanalysis:latest .
docker run --name sentimentanalysis -d -p 7000:7000 sentimentanalysis

It is not necessary to create a docker to test and run the web service. It is just an extra, that allows developers to package up an application with all of the parts it needs. Docker is a "containerization" technology that offers virtual isolation to deploy and run applications that access a shared operating system (OS) kernel without the need for virtual machines (VMs).

REST web service routes and input examples

Here's an example of a POST request for a single text chunk classification:

curl -X POST \
  'http://localhost:7000/sentiment_analysis/api/v1.0/inference?instance=EN300Twitter' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -H 'Postman-Token: 3322b9bf-fe4d-4856-8871-834394aa1124' \
  -d '{"text":"Why does Tom Cruise take 10,000 times to figure things out in the movie Edge Of Tomorrow, but gets it right 1st time in Mission Impossible?"
} '

And here's an example of a POST request for a batch of text chunks to classify:

curl -X POST \
  'http://localhost:7000/sentiment_analysis/api/v1.0/inference?instance=EN300Twitter' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -H 'Postman-Token: 3da2204c-f9ae-42f0-9ae6-8d4622129ca3' \
  -d '{"texts":
  ["Who'\''s in Milan in February? You won'\''t want to miss this! #Milano2016 https://t.co/J41jOrpTEa",
  "@boyalmxghty @WinterSoldierL is a universal feminism concerning everyone, as for taylor swift she may get into that category idk",
  "I'\''ve decided tomorrow night I'\''m going to re watch all of season 5 of teen wolf",
  "First the @InfernoCWHL, now the @NHLFlames march in the Pride parade - this is awesome.",
  "That Mexico vs USA commercial with trump gets your blood boiling. Race war October 10th. Imagine that parking lot. Gaddamnnnnnn VIOLENCE!!!",
  "\"Happy Captain America Day! David Wright batting 4th tonight. Mets, yo.\"",
  "Watching The Vow.... For the 4th time.",
  "LBJ out with cramps. Steps on Chalmers. Cu'\''s ball. Offensive foul. 100-89 Heat ball. 7:57 left in the 4th.",
  "@FFFabFFFab well destructo is on there and he'\''s not playing. but lineup and hours are released tomorrow.",
  "\"La Liga: Barca and Betis march on, Malaga held: Real Betis moved up to fourth in the table ... http://t.co/mdYFE4km http://t.co/iDWtFSZF\""
  ]
} '

Team

Priberam is a Portuguese SME founded in 1989, as a spin-off from Instituto Superior Técnico in Lisbon, which offers cutting-edge semantic search and natural language processing technologies. Priberam licenses its technologies to clients such as Amazon and Microsoft and the biggest media publishers in Portugal, Brazil and Spain. Priberam participates in several national and European R&D projects and keeps strong links with the best groups in the academia and research institutes.

Priberam Labs, the company’s research department, is focused on innovative technologies such as automatic media monitoring, recommendation, and social media analysis, with a strong component on machine learning research for Big Data.

To learn more about who specifically contributed to this codebase, see our contributors page.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
Dataset/SemEval2017/Subtask_A/preprocessed		Dataset/SemEval2017/Subtask_A/preprocessed
Embeddings		Embeddings
Models		Models
ekphrasis		ekphrasis
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
REST_config.json		REST_config.json
core.py		core.py
nn_modules.py		nn_modules.py
priberam-650x240.png		priberam-650x240.png
requirements.txt		requirements.txt
sentiment_analysis.py		sentiment_analysis.py
sentiment_analysis.pyproj		sentiment_analysis.pyproj
sentiment_analysis.sln		sentiment_analysis.sln
train_config.json		train_config.json
trainer.py		trainer.py
web_server.py		web_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Sentiment Analysis

Overview

The repository contains:

Pre-trained word embeddings

Model

Installation

Requirements

Setting up a virtual environment

Running

Train

Train config file arguments

Run as a web service

Automated procedure

REST web service routes and input examples

Team

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

Priberam/SentimentAnalysis

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis

Overview

The repository contains:

Pre-trained word embeddings

Model

Installation

Requirements

Setting up a virtual environment

Running

Train

Train config file arguments

Run as a web service

Automated procedure

REST web service routes and input examples

Team

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages