Create a reporting ETL pipeline using Airflow and Python that refreshes its output at set intervals.
- Python
- Web scraping
- newspaper package
- PyCharm IDE
- ETL Pipeline using Airflow
- Version control: Git with GitHub
- Docker and docker-compose
- Get the latest news related to a phrase/keyword from the first page of Google search results.
- Get all the links, the content of each page, and the image links.
- Create a summary of each news article.
- Put the summaries into an HTML file.
- Run this procedure at given intervals using Airflow.
- Creating a news article summarizer
- Introduction to Docker and Docker Compose
- Introduction to Airflow
- Creating the reporting pipeline using Python
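
The summarize-and-write-to-HTML steps listed earlier can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `Article` dataclass, `summarize`, and `render_report` are hypothetical names, and the "summary" is a naive first-sentences stand-in rather than the summarizer the project builds.

```python
import html
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    """One scraped news item (hypothetical structure, assumed for this sketch)."""
    title: str
    url: str
    text: str
    image_url: str = ""


def summarize(text: str, max_sentences: int = 2) -> str:
    """Naive stand-in summarizer: keep only the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def render_report(keyword: str, articles: List[Article]) -> str:
    """Build a standalone HTML page with one list item per article."""
    items = []
    for article in articles:
        # Only emit an <img> tag when an image link was actually scraped.
        image = ""
        if article.image_url:
            image = '<img src="{}" alt="">'.format(
                html.escape(article.image_url, quote=True)
            )
        items.append(
            '<li><a href="{}">{}</a><p>{}</p>{}</li>'.format(
                html.escape(article.url, quote=True),
                html.escape(article.title),
                html.escape(summarize(article.text)),
                image,
            )
        )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>News report: {kw}</title></head>\n"
        "<body><h1>News report: {kw}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    ).format(kw=html.escape(keyword), items="\n".join(items))


if __name__ == "__main__":
    sample = [
        Article(
            title="Example headline",
            url="https://example.com/story?id=1&lang=en",
            text="First sentence. Second sentence. Third sentence.",
        )
    ]
    # Write the report to disk; the file name is an arbitrary choice.
    with open("report.html", "w", encoding="utf-8") as handle:
        handle.write(render_report("example keyword", sample))
```

In the full pipeline, the article text and summary would presumably come from the newspaper package instead of `summarize`, and an Airflow task would call something like `render_report` on each scheduled run.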