Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more.
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
SQLAlchemy is an open-source SQL toolkit and object-relational mapper for the Python programming language.
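To show how Pandas and SQLAlchemy can fit together in an ETL step, the sketch below aggregates a small DataFrame and writes it to a SQLite file through a SQLAlchemy engine. The table name `sales_by_city`, the column names, and the sample rows are made up for illustration. The database file name `db1` mirrors the one opened with `sqlite3` in the setup steps, but its real schema is an assumption here.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine


def run_etl(db_path):
    """Aggregate sample rows and store them in a SQLite database at db_path."""
    # Extract: hypothetical in-memory sample standing in for a real source.
    raw = pd.DataFrame(
        {"city": ["Lisbon", "Porto", "Lisbon"], "sales": [10, 5, 7]}
    )

    # Transform: total sales per city.
    totals = raw.groupby("city", as_index=False)["sales"].sum()

    # Load: write the result through a SQLAlchemy engine, replacing any old table.
    engine = create_engine(f"sqlite:///{db_path}")
    totals.to_sql("sales_by_city", engine, if_exists="replace", index=False)

    # Read the table back to confirm what was stored.
    stored = pd.read_sql_table("sales_by_city", engine)
    engine.dispose()
    return stored


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "db1")
    print(run_etl(path))
```

Wrapping a function like this in the `run()` method of a Luigi task is one way to combine the three libraries.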
The following folders and files are contained in the project repository:
. capstone-project
│
│ README.md # Project description and documentation
│ .gitignore # Files and extensions ignored in commits
│ requirements.txt # Python requirements and libraries for project
│ docker-compose.yml # Airflow and PostgreSQL containers
│ Markfile # Install locally or use Composer
│ start.sh # start services
│ stop.sh # stop services
└───examples # Luigi examples home
git clone https://github.com/dacosta-github/luigi-etl
cd luigi-etl
Run this command in a new terminal window or tab:
docker-compose up
Check the containers:
docker ps # run in new terminal
Run the following commands in a new terminal window or tab:
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
sqlite3 db1
https://kpatronas.medium.com/python-create-an-etl-with-luigi-pandas-and-sqlalchemy-d3cdc9292bc7
https://towardsdatascience.com/create-your-first-etl-in-luigi-23202d105174