
Commit 911c1e9

jzpang and jingzhi.pang@petuum.com authored
Twitter example (#469)
* move elastic processors
* remove 3rd party processors
* change nltk imports
* remove indexers and change imports
* Revert "remove indexers and change imports" This reverts commit ab07230.
* add nltk back
* change the nltk import back to forte
* add forte-wrapper dependency
* fix pylint
* add elastic index processor back
* add elastic index processor back
* add elastic index processor back
* add gpt2 back
* add gpt2 back
* fix pylint
* removed allennlp
* add a invalid config test
* add a twitter sentiment analysis example
* add readme
* clean up code
* fix merge conflict
* add end newline
* update config
* update statistics
* update readme

Co-authored-by: jingzhi.pang@petuum.com <jingzhi.pang@petuum.com>
1 parent 400cd82 commit 911c1e9


5 files changed (+139 -0 lines changed)

@@ -0,0 +1,33 @@
# Twitter Sentiment Analysis

This example shows how to use `Forte` to perform sentiment analysis on tweets
retrieved for a user's search query, based on [Tweepy](https://docs.tweepy.org/en/latest/index.html),
the [Twitter API](https://developer.twitter.com/en/products/twitter-api) and
[Vader (Valence Aware Dictionary and Sentiment Reasoner)](https://github.com/cjhutto/vaderSentiment).


> **Note**: To run this example, you need a Twitter account, must apply for Developer Access,
> and then create an application. The application generates the API credentials that you need
> to access Twitter from Python. Put these credentials in `api_credential.yml` before running
> the pipeline. You can refer to
> https://developer.twitter.com/en/docs/twitter-api/getting-started/getting-access-to-the-twitter-api
> for more information.


## How to run the pipeline

First, create a virtual environment, then run the following in the command line:

`cd twitter_sentiment_analysis`

`pip install -r requirements.txt`


Run the pipeline with:

`python pipeline.py`

Then type your search query in the terminal to retrieve tweets and their sentiment scores.

For customized queries, you can also refer to Twitter's official documentation:
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
@@ -0,0 +1,4 @@
consumer_key: ""
consumer_secret: ""
access_token: ""
access_token_secret: ""
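The pipeline only works once all four fields in this template are filled in, so a quick pre-flight check can surface an empty field before Twitter rejects the request. This is a minimal sketch, not part of the commit: `REQUIRED_KEYS`, `missing_credentials` and `load_credentials` are hypothetical helpers, and it assumes the file is read from the working directory under the name given by the `credential_file` setting (`api_credential.yml`).

```python
import os
from typing import List, Mapping

# Keys in the api_credential.yml template; all four must be non-empty.
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def missing_credentials(creds: Mapping[str, object]) -> List[str]:
    """Return the required keys that are absent, null, or blank."""
    return [key for key in REQUIRED_KEYS
            if not str(creds.get(key) or "").strip()]


def load_credentials(path: str) -> Mapping[str, object]:
    """Parse the YAML credential file (requires PyYAML)."""
    import yaml  # imported here so missing_credentials works without PyYAML
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}


if __name__ == "__main__":
    # Hypothetical pre-flight check; the file name is an assumption.
    path = "api_credential.yml"
    creds = load_credentials(path) if os.path.exists(path) else {}
    missing = missing_credentials(creds)
    if missing:
        print(f"Fill in {', '.join(missing)} in {path} before running.")
    else:
        print("All Twitter API credentials are set.")
```

Keeping the check separate from the YAML parsing means it can be reused on any dict, for example one built from environment variables.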
@@ -0,0 +1,15 @@
boxer:
  pack_name: "query"

twitter_search:
  num_tweets_returned: 5
  lang: "en"
  date_since: "2020-01-01"
  result_type: 'recent'
  query_pack_name: "query"
  response_pack_name_prefix: "passage"
  credential_file: "api_credential.yml"

vader_sentiment:
  entry_type: 'ft.onto.base_ontology.Document'
  attribute_name: 'sentiment'
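The `query_pack_name` and `response_pack_name_prefix` values above decide which packs receive sentiment scores: the pipeline script builds the regex `rf"{prefix}_\d"` from the prefix and hands it to `RegexNameMatchSelector`. The sketch below mimics that selection with Python's `re` module. It assumes the tweet packs are named `passage_0`, `passage_1`, and so on (inferred from the pattern) and that the selector uses prefix matching like `re.match`; both are assumptions, not documented Forte behaviour.

```python
import re
from typing import List

# Values copied from the config above.
QUERY_PACK_NAME = "query"
RESPONSE_PREFIX = "passage"
NUM_TWEETS = 5

# Same pattern string that the pipeline passes to RegexNameMatchSelector.
PATTERN = re.compile(rf"{RESPONSE_PREFIX}_\d")


def selected_packs(names: List[str]) -> List[str]:
    """Pack names the pattern accepts, approximating the selector with
    re.match (prefix matching); an assumption, not Forte's own code."""
    return [name for name in names if PATTERN.match(name)]


if __name__ == "__main__":
    # Hypothetical pack names: the query pack plus one pack per tweet,
    # assuming a "<prefix>_<index>" naming scheme.
    names = [QUERY_PACK_NAME]
    names += [f"{RESPONSE_PREFIX}_{i}" for i in range(NUM_TWEETS)]
    print(selected_packs(names))
    # prints ['passage_0', 'passage_1', 'passage_2', 'passage_3', 'passage_4']
```

The query pack is never selected, which is what keeps VADER from scoring the search string itself. If the selector matched the full name instead, packs from `passage_10` onward would be skipped once `num_tweets_returned` exceeds 10; a pattern ending in `_\d+` covers both cases.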
@@ -0,0 +1,85 @@
# Copyright 2019 The Forte Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import yaml
from forte.common.configuration import Config
from forte.data.caster import MultiPackBoxer
from forte.data.readers import TerminalReader
from forte.data.multi_pack import MultiPack

from forte.pipeline import Pipeline
from forte_wrapper.vader import VaderSentimentProcessor
from forte_wrapper.twitter import TweetSearchProcessor
from forte.data.selector import RegexNameMatchSelector


if __name__ == "__main__":
    # Load the config file that sits next to this script.
    config_file = os.path.join(os.path.dirname(__file__), 'config.yml')
    with open(config_file, "r") as f:
        config = yaml.safe_load(f)
    config = Config(config, default_hparams=None)

    # Build the pipeline and add the reader, which reads queries from the
    # terminal.
    nlp: Pipeline = Pipeline()
    nlp.set_reader(reader=TerminalReader())

    # The rest of the pipeline works on multi-packs, so use a boxer to wrap
    # each query DataPack into a MultiPack.
    nlp.add(MultiPackBoxer(), config=config.boxer)

    # Search tweets.
    nlp.add(TweetSearchProcessor(), config=config.twitter_search)

    # Conduct sentiment analysis on the tweet packs only.
    pattern = rf"{config.twitter_search.response_pack_name_prefix}_\d"
    selector_hit = RegexNameMatchSelector(select_name=pattern)
    nlp.add(component=VaderSentimentProcessor(),
            selector=selector_hit, config=config.vader_sentiment)

    nlp.initialize()

    # Process each query read from the terminal.
    m_pack: MultiPack
    for m_pack in nlp.process_dataset():
        print('The number of datapacks (including query) is',
              len(m_pack.packs))

        tweets, pos_sentiment, neg_sentiment, neutral_sentiment = 0, 0, 0, 0

        for name, pack in m_pack.iter_packs():
            # Do not process the query datapack
            if name == config.twitter_search.query_pack_name:
                continue

            tweets += 1
            for doc in pack.get(config.vader_sentiment.entry_type):
                print('Tweet: ', doc.text)
                print('Sentiment Compound Score: ',
                      doc.sentiment['compound'])

                compound_score = doc.sentiment['compound']
                if compound_score >= 0.05:
                    pos_sentiment += 1
                elif compound_score <= -0.05:
                    neg_sentiment += 1
                else:
                    neutral_sentiment += 1

        print('The number of tweets retrieved: ', tweets)
        if tweets == 0:
            # Nothing to summarize; avoid dividing by zero.
            continue
        print('The proportion of positive sentiment: ', pos_sentiment / tweets)
        print('The proportion of negative sentiment: ', neg_sentiment / tweets)
        print('The proportion of neutral sentiment: ',
              neutral_sentiment / tweets)

    print('Done')
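The per-query summary buckets each compound score with the ±0.05 cut-offs that the VADER project recommends, then divides each bucket by the tweet count. The sketch below isolates that logic as plain functions so it can be checked without Twitter access; `label` and `proportions` are hypothetical names, and it returns zeros when there are no scores, so an empty search result does not raise `ZeroDivisionError`.

```python
from collections import Counter
from typing import Dict, Iterable

# Cut-offs used in the script (the conventional VADER thresholds).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label(compound: float) -> str:
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def proportions(scores: Iterable[float]) -> Dict[str, float]:
    """Share of positive/negative/neutral scores; zeros for no input."""
    counts = Counter(label(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        # Guard against an empty result set (no tweets returned).
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    return {k: counts[k] / total for k in ("positive", "negative", "neutral")}


if __name__ == "__main__":
    print(proportions([0.62, -0.4, 0.0, 0.05]))
    # prints {'positive': 0.5, 'negative': 0.25, 'neutral': 0.25}
```

Note that both boundaries are inclusive: a score of exactly 0.05 counts as positive and exactly -0.05 as negative, matching the `>=` and `<=` comparisons in the script.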
@@ -0,0 +1,2 @@
torch>=1.5.0
git+https://git@github.com/asyml/forte-wrappers#egg=forte-wrappers[nltk,vader,twitter]
