This project uses Apache Spark (via PySpark) to analyze Twitter posts (COVID, Grammys, and financial tweets). The application is Dockerized and can be run with Docker Compose.
- Docker

Clone the repository to your local machine:

    git clone git@github.com:drwoj/tweets-pyspark.git

- Build the Docker images:

      docker-compose build

- Run the Docker containers:

      docker-compose up -d

- Submit the Spark application:

      docker-compose exec spark-master spark-submit --master spark://spark-master:7077 src/main.py

To stop the application and remove the containers defined in the docker-compose.yml file, run:

    docker-compose down

While the containers are running, you can monitor the application through the Spark web UI. The port (9090) specified in docker-compose.yml is exposed on your host machine, so you can access the Spark Master by navigating to localhost:9090 in your web browser.
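
If you prefer to check from a script rather than a browser, the following is a minimal sketch that probes the Spark Master web UI over HTTP. It is not part of the repository: the helper names `spark_ui_url` and `is_ui_reachable` are hypothetical, and the sketch assumes the UI is served over plain HTTP on the host port 9090 mentioned above.

```python
import http.client
import urllib.error
import urllib.request


def spark_ui_url(host: str = "localhost", port: int = 9090) -> str:
    """Build the web UI address (hypothetical helper; 9090 is the port from docker-compose.yml)."""
    return f"http://{host}:{port}"


def is_ui_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, DNS failures and HTTP error codes.
        return False


if __name__ == "__main__":
    url = spark_ui_url()
    state = "reachable" if is_ui_reachable(url) else "not reachable"
    print(f"Spark Master UI at {url} is {state}")
```

Run it on the host after `docker-compose up -d`. If it reports the UI as not reachable, the containers may still be starting, or the port mapping in docker-compose.yml may differ from the one assumed here.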
