Real-time stream processing project using Apache Kafka and Spark Streaming on Google Cloud Dataproc. Includes Python producers/consumers, Spark DStream word count, and full deployment with screenshots.
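To make the "DStream word count" idea concrete, here is a minimal sketch in plain Python of the logic such a job typically performs: count words within each micro-batch, then fold those counts into running totals. It does not use the Spark or Kafka APIs and is not taken from the repository; the function names `count_words` and `running_word_count` and the sample batches are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_words(batch: Iterable[str]) -> Counter:
    """Count whitespace-separated words in one micro-batch of text lines."""
    counts: Counter = Counter()
    for line in batch:
        # Lowercase so "Spark" and "spark" are counted as the same word.
        counts.update(line.lower().split())
    return counts


def running_word_count(batches: Iterable[List[str]]) -> List[Dict[str, int]]:
    """Return a snapshot of the cumulative word totals after each micro-batch."""
    totals: Counter = Counter()
    snapshots: List[Dict[str, int]] = []
    for batch in batches:
        # Merge this batch's counts into the state carried across batches.
        totals.update(count_words(batch))
        snapshots.append(dict(totals))
    return snapshots


if __name__ == "__main__":
    # Hypothetical input: each inner list stands for the lines in one batch.
    sample_batches = [["kafka spark kafka"], ["Spark streaming"], []]
    for index, snapshot in enumerate(running_word_count(sample_batches)):
        print(index, snapshot)
```

In a real streaming job the batches would arrive from a Kafka topic at a fixed interval and the engine would manage the running state; the sketch only shows the counting and merging steps.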
Process large amounts of data and implement complex data analyses using Spark. The dataset, made available by Google, describes a cluster of 12,500 machines and the activity on that cluster over 29 days.