In this project I build an ETL pipeline that extracts song and log data from S3, processes it with Spark, and loads the results back into a new S3 bucket as a set of fact and dimension tables. This will allow the analytics team to continue finding insights into what songs their users are listening to.
- Song data: contains information about the songs
- Log data: contains activity logs recording the users and the songs they listen to
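To make the flow concrete, here is a minimal PySpark sketch of the extract, transform, and load steps. The bucket names, S3 paths, and column names are illustrative assumptions, not the project's actual values; adjust them to the real datasets.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-etl").getOrCreate()

# Extract: read the raw JSON datasets from S3 (paths are placeholders).
songs = spark.read.json("s3a://input-bucket/song_data/*/*/*/*.json")
logs = spark.read.json("s3a://input-bucket/log_data/*/*/*.json")  # feeds the fact table

# Transform: build one example dimension table from the song data,
# assuming these column names exist in the source schema.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Load: write the table back to a new S3 bucket as partitioned parquet.
(
    songs_table.write.mode("overwrite")
               .partitionBy("year", "artist_id")
               .parquet("s3a://output-bucket/songs/")
)
```

The same pattern (select the needed columns, de-duplicate, write parquet) repeats for each fact and dimension table.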
- Update the AWS credentials in `dl.cfg` (one way `etl.py` might read them is sketched after this list)
- Run the ETL script: `python etl.py`
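For reference, below is one way `etl.py` could load those credentials with Python's `configparser` and hand them to Spark's S3 connector. The section and key names here are assumptions about the `dl.cfg` layout, not its confirmed contents.

```python
import os
import configparser

# Assumed dl.cfg layout (illustrative):
#   [AWS]
#   AWS_ACCESS_KEY_ID=your-access-key
#   AWS_SECRET_ACCESS_KEY=your-secret-key
config = configparser.ConfigParser()
config.read("dl.cfg")

# Export the credentials as environment variables so the hadoop-aws
# connector used by spark.read/write on s3a:// paths can pick them up.
os.environ["AWS_ACCESS_KEY_ID"] = config["AWS"]["AWS_ACCESS_KEY_ID"]
os.environ["AWS_SECRET_ACCESS_KEY"] = config["AWS"]["AWS_SECRET_ACCESS_KEY"]
```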
- `etl.py` --> Python script that executes the data pipeline
- `dl.cfg` --> Configuration file; update the AWS credentials here
