- New York Times Homepage API
- NBC News front page
- ABC News front page
ETL Architecture Overview:
- Scripts in the sources folder run on a schedule, extracting data hourly at the start of each hour.
- Transformations are then applied to the extracted tables.
- generate_queries.py turns the dataframe into SQL queries.
- database_connection.py loads the data to PostgreSQL by running the queries.
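
The last two steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `generate_insert_queries` and `load`, the `headlines` table, its columns, and the sample rows are all hypothetical. To keep the sketch runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL and a list of dicts as a stand-in for the dataframe rows.

```python
import sqlite3

# Hypothetical records standing in for the rows of a transformed dataframe
# (with pandas, df.to_dict("records") yields this shape).
rows = [
    {"source": "NBC News", "headline": "Example headline A", "extracted_at": "2024-01-01T10:00:00"},
    {"source": "ABC News", "headline": "Example headline B", "extracted_at": "2024-01-01T10:00:00"},
]


def generate_insert_queries(table, records):
    """Build one parameterized INSERT statement per record."""
    queries = []
    for record in records:
        columns = ", ".join(record)
        # sqlite3 uses "?" placeholders; a PostgreSQL driver such as psycopg2 uses "%s".
        placeholders = ", ".join("?" for _ in record)
        sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        queries.append((sql, tuple(record.values())))
    return queries


def load(connection, queries):
    """Run every query inside a single transaction."""
    with connection:  # commits on success, rolls back on error
        for sql, params in queries:
            connection.execute(sql, params)


# Stand-in database: in-memory SQLite instead of PostgreSQL (assumption).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE headlines (source TEXT, headline TEXT, extracted_at TEXT)"
)

queries = generate_insert_queries("headlines", rows)
load(connection, queries)

row_count = connection.execute("SELECT COUNT(*) FROM headlines").fetchone()[0]
print(row_count)
```

Passing the values as parameters rather than formatting them into the SQL string lets the driver handle quoting, which matters for scraped headlines that may contain apostrophes or other special characters.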
Data Model