This project provides an ETL (extract, transform, load) pipeline that moves Reddit data into an Amazon Redshift data warehouse. The pipeline uses Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
An automated end-to-end data pipeline using Apache Airflow, Spark, and MinIO for processing NYC Taxi datasets. Features containerized infrastructure (Docker), distributed transformations, and data quality assurance with Great Expectations.
📊 Analyze sales data and forecast future revenue using Python. Gain insights into performance metrics and optimize your business strategies effectively.