# Project Outline
## Create a Twitter Stream and send tweets to Spark
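We set up the SparkContext in local mode with 3 worker threads (`local[3]`), which simulates three machines on a single computer. The socket receiver permanently occupies one of these threads, so local mode needs at least 2 (`local[1]` would leave no thread free to process the data). On top of the SparkContext we build a StreamingContext with a batch interval of 5 seconds, so the incoming tweets are collected into one RDD every 5 seconds.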
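
```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = SparkConf().setMaster('local[3]').setAppName('TwitterStream')  # an app name is required; any string works
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # batch interval: 5 seconds
```
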
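To make the batching idea concrete, here is a toy model in plain Python, assuming each tweet is tagged with its arrival time in seconds since the stream started. It is only an illustration of how a 5-second batch interval partitions the stream; it is not how Spark implements it internally, and `group_into_batches` is a hypothetical helper.

```python
BATCH_INTERVAL = 5  # seconds; mirrors StreamingContext(sc, 5)


def group_into_batches(events, interval=BATCH_INTERVAL):
    """Assign (arrival_time_seconds, text) pairs to fixed-length batches.

    Batch k covers arrival times in [k * interval, (k + 1) * interval).
    Every interval up to the last event gets a batch, even an empty one,
    just as a DStream produces one (possibly empty) RDD per interval.
    Assumes arrival times are non-negative.
    """
    if not events:
        return []
    last_index = max(int(t // interval) for t, _ in events)
    batches = [[] for _ in range(last_index + 1)]
    for arrival_time, text in events:
        batches[int(arrival_time // interval)].append(text)
    # Pair each batch with the start time of its interval
    return [(k * interval, texts) for k, texts in enumerate(batches)]


tweets = [(0.5, "hello"), (3.2, "spark"), (11.0, "streaming")]
for start, texts in group_into_batches(tweets):
    print(start, texts)
```

Here the batch starting at 5 seconds is empty because no tweet arrived between 5 and 10 seconds; in Spark, code that processes each RDD should likewise expect empty batches.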
We built an app called TweetRead.py that pulls the live tweet stream from Twitter using the Tweepy library and forwards it to Spark Streaming over a socket. It acts as a TCP server on local port 5555 and waits for Spark to connect:

```python
import socket

host = "localhost"    # Listen on the local machine only
port = 5555           # Port that Spark's socketTextStream will connect to
s = socket.socket()   # Create a TCP socket object
s.bind((host, port))  # Bind to the port
s.listen(5)           # Start listening (up to 5 pending connections)
c, addr = s.accept()  # Block until Spark connects; c is used to send tweets
```
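
`socketTextStream` treats every newline as a record boundary, so each tweet should be written to the socket as a single UTF-8 line ending in `\n`; a tweet containing line breaks would otherwise reach Spark as several records. The sketch below shows one way to do that framing, assuming a hypothetical helper `send_tweet` that TweetRead.py would call with the connection `c` returned by `s.accept()`. It is not part of Tweepy's API, and the actual TweetRead.py may frame tweets differently.

```python
import socket


def send_tweet(conn, text):
    """Send one tweet as one UTF-8, newline-terminated line.

    Embedded line breaks are replaced with spaces so that
    one tweet always becomes exactly one record in Spark.
    """
    line = text.replace("\r", " ").replace("\n", " ")
    conn.sendall((line + "\n").encode("utf-8"))


# Local demo: a connected socket pair stands in for TweetRead.py and Spark
sender, receiver = socket.socketpair()
send_tweet(sender, "first line\nsecond line")
send_tweet(sender, "café ☕")
sender.close()

received = b""
while chunk := receiver.recv(1024):  # read until the sender closes
    received += chunk
receiver.close()

lines = received.decode("utf-8").splitlines()
print(lines)
```

The two-line tweet arrives as a single record, and non-ASCII text survives because both sides agree on UTF-8.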

We use the StreamingContext method `socketTextStream` to receive the tweet stream through this port and turn it into a DStream (discretized stream), which is a continuous sequence of RDDs, one per batch interval.
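
On the Spark side this is a single call, `ssc.socketTextStream("localhost", 5555)`, which connects to TweetRead.py as a TCP client and interprets the received bytes as UTF-8 lines delimited by `\n`. To check what TweetRead.py sends before involving Spark, a plain-Python client along these lines can stand in for the receiver. This is a minimal sketch that only approximates Spark's receiver, and `read_lines` is a hypothetical helper, not a PySpark function.

```python
import socket


def read_lines(conn, bufsize=4096):
    """Yield UTF-8, newline-delimited lines from a connected socket.

    A line split across two recv() calls is reassembled before it is
    yielded, roughly as socketTextStream does with its input.
    """
    buffer = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            break  # peer closed the connection
        buffer += chunk
        *complete, buffer = buffer.split(b"\n")
        for raw in complete:
            yield raw.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")  # trailing line without a final newline


# Demo with a socket pair; against TweetRead.py you would instead use
# socket.create_connection(("localhost", 5555))
server_side, client_side = socket.socketpair()
server_side.sendall(b"tweet one\ntwe")
server_side.sendall(b"et two\n")
server_side.close()
result = list(read_lines(client_side, bufsize=4))
client_side.close()
print(result)
```

Splitting on the raw `\n` byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte character.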