This project extends the Data Engineering Zoomcamp in a more generic way. It is meant to help in understanding how data engineering workflows work in general.
- Initially, the URLs for the `raw_data` are scraped using BeautifulSoup (`bs4`) and stored in a JSON file.
- Based on the TOML configuration, the data is downloaded into the `nyc_raw_data` directory.
- Once the data is present, it is ingested into a local Postgres database using `LocalDaskExecutor` from Prefect.