Commit 7be057b

Author: jinyaohh
Update README.md
1 parent e05b2d5 commit 7be057b

1 file changed

Project/README.md

Lines changed: 20 additions & 1 deletion
@@ -1,6 +1,25 @@
# Project Outline

## Create a Twitter Stream and send tweets to Spark
We set up the Spark context in local mode with 3 worker threads (local[3]), simulating 3 different machines. We then build a Spark streaming context on top of the Spark context and set the batch interval to 5 seconds, so incoming tweets are collected into one RDD every 5 seconds.

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setMaster('local[3]')  # local mode with 3 worker threads
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 5)             # batch interval of 5 seconds
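
To make the batch interval concrete, here is a minimal plain-Python sketch (not Spark code) of how records arriving over time fall into 5-second batches. The function name `batch_by_interval` and the sample events are made up for illustration, and Spark's real scheduling is more involved than this.

```python
def batch_by_interval(events, interval_seconds=5):
    """Group (arrival_time_in_seconds, text) pairs into fixed-width batches.

    Intervals with no events still produce an (empty) batch, which mirrors
    the idea of one batch per interval.
    """
    if not events:
        return []
    last_index = max(int(t // interval_seconds) for t, _ in events)
    batches = [[] for _ in range(last_index + 1)]
    for arrival_time, text in events:
        batches[int(arrival_time // interval_seconds)].append(text)
    return batches


# Hypothetical arrival times (seconds since the stream started).
events = [(0.4, "a"), (3.9, "b"), (5.1, "c"), (16.0, "d")]
batches = batch_by_interval(events)
print(batches)
```

With the sample data, the first two events share a batch, the third lands in the next one, and the interval from 10 to 15 seconds yields an empty batch.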

We build an app called TweetRead.py that pulls the tweet stream from Twitter using the Tweepy library and uses a socket to send the stream to Spark Streaming.

    import socket

    host = "localhost"     # Serve on the local machine
    port = 5555            # Port that Spark Streaming will connect to
    s = socket.socket()    # Create a TCP socket object
    s.bind((host, port))   # Bind the socket to the host and port
    s.listen(5)            # Listen, queueing up to 5 pending connections
    c, addr = s.accept()   # Block until a client (Spark) connects
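
As a rough, self-contained illustration of the socket hand-off, the sketch below serves a couple of made-up tweets to a local client over TCP. The helper names (`serve_lines`, `read_all_lines`, `demo`) are hypothetical and not part of TweetRead.py, and port 0 is used so the operating system picks a free port instead of 5555. Each message is newline-terminated because `socketTextStream` reads newline-delimited text.

```python
import socket
import threading


def serve_lines(server_sock, lines):
    """Accept one client, send each line newline-terminated, then close."""
    conn, _addr = server_sock.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))


def read_all_lines(host, port):
    """Connect as a client and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port)) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()


def demo():
    tweets = ["first tweet", "second tweet"]  # made-up sample data
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    sender = threading.Thread(target=serve_lines, args=(server, tweets))
    sender.start()
    received = read_all_lines("127.0.0.1", port)
    sender.join()
    server.close()
    return received


if __name__ == "__main__":
    print(demo())
```

In the project setup, Spark Streaming plays the role of the client here, connecting to the port that TweetRead.py listens on.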

We use the StreamingContext API socketTextStream to receive the tweet stream through the port and turn it into a DStream, which is a continuous sequence of RDDs.

    raw_tweets = ssc.socketTextStream('localhost', 5555)
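
Because the stream is read as newline-delimited text while TCP delivers bytes in arbitrary chunks, the receiving side has to reassemble complete lines before they can become records. The following is a minimal sketch of that idea in plain Python. It is not Spark's actual receiver, and `split_lines` and the sample chunks are assumptions made for illustration.

```python
def split_lines(chunks):
    """Yield complete UTF-8 lines from an iterable of byte chunks.

    A partial line is buffered until its terminating newline arrives.
    Trailing data without a newline is treated as incomplete and dropped.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")


# Chunk boundaries deliberately do not line up with line boundaries.
chunks = [b"hello wor", b"ld\nsecond ", b"tweet\npartial"]
records = list(split_lines(chunks))
print(records)
```

This is one reason a sender should end every tweet with a newline: without it, the receiver cannot tell where one record stops and the next begins.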

## Clean Tweets

@@ -22,4 +41,4 @@
Group Project for Big Data Programming, Fall 2017

Project master repository:
[Master Branch](<https://github.com/Xianlai/BigData_Spark>)
