This project is part of the Software Development and Software Engineering course at ITU. The original project description can be found here.
In this project we were tasked with restructuring a Python monolith using the concepts we have learned throughout the course. The project contains a Dagger workflow and GitHub Actions workflows.
├── README.md <- Project description and how to run the code
│
├── .github/workflows <- GitHub Action workflows
│ │
│ ├── tag_version.yml <- Workflow for creating version tags
│ │
│ └── log_and_test_action.yml <- Workflow that automatically trains and tests the model
│
├── pipeline_deps
│ │
│ └── requirements.txt <- Dependencies for the pipeline
│
├── CODEOWNERS <- Defines codeowners for the repository
│
├── go.mod <- Go file that defines the module and required dependencies
│
├── go.sum <- Go file that ensures consistency and integrity of dependencies
│
├── pipeline.go <- Dagger workflow written in Go
│
├── pyproject.toml <- Project metadata and configuration
│
├── .pre-commit-config.yaml <- Checks quality of code before commits
│
├── Makefile.venv <- Library for managing venv via makefile
│
├── Makefile <- Project related scripts
│
├── references <- Documentation and extra resources
│
├── requirements.txt <- Python dependencies needed for the project
│
├── tests
│ │
│ └── verify_artifacts.py <- Tests to check if all artifacts are copied correctly
│
└── github_dagger_workflow_project <- Source code for the project
│
├── __init__.py <- Marks the directory as a Python package
│
├── 01_data_transformations.py <- Script for data preprocessing and transformation
│
├── 02_model_training.py <- Script for training the models
│
├── 03_model_selection.py <- Script for selecting the best performing model
│
├── 04_prod_model.py <- Script for comparing new best model and production model
│
├── 05_model_deployment.py <- Script for deploying model
│
├── config.py <- Constants and paths used in the pipeline's scripts
│
├── pipeline_utils.py <- Encapsulated code from the .py monolith.
│
├── artifacts
│ │
│ └── raw_data.csv.dvc <- Metadata tracked by DVC for data file
│
└── utils.py <- Helper functions extracted from the .py monolith
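To give an idea of how the numbered stage scripts fit together, below is a minimal, purely illustrative sketch of a stage in the style of 01_data_transformations.py. It assumes a pandas-based workflow; the constant and function names (`RAW_DATA_PATH`, `ARTIFACTS_DIR`, `transform`) are assumptions, not the actual contents of config.py or pipeline_utils.py.

```python
# Illustrative sketch only: the real 01_data_transformations.py may differ.
import pandas as pd

from github_dagger_workflow_project import config, pipeline_utils


def main() -> None:
    # Load the DVC-tracked raw data via a path defined centrally in config.py.
    raw = pd.read_csv(config.RAW_DATA_PATH)  # hypothetical constant

    # Delegate the actual preprocessing to the shared helpers extracted
    # from the original monolith (pipeline_utils.py).
    transformed = pipeline_utils.transform(raw)  # hypothetical function

    # Write the result into the artifacts directory so the next stage
    # (02_model_training.py) can pick it up. Assumes ARTIFACTS_DIR is a Path.
    transformed.to_csv(config.ARTIFACTS_DIR / "transformed_data.csv", index=False)


if __name__ == "__main__":
    main()
```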
The workflow can be triggered either on pull requests to main or manually.
It can be triggered manually here by pressing Run workflow on the main branch; refresh the page afterwards and the triggered run will appear. After all the jobs have finished, the model artifact can be found on the summary page of the first job's run. We also store other artifacts for convenience.
The tests run automatically afterwards, letting the user check whether the model is of sufficient quality.
Artifacts are stored for 90 days.
For local running you need:
- docker (Server) >= 4.36
- dagger >= 0.14
For local development you also need:
- go (1.23.3 is currently used)
- git >= 2.39
- python >= 3.11
- make >= 3.81 (lower versions should work too)
Then run:
make setup
.venv\Scripts\activate # for Windows
source .venv/bin/activate # for Linux/macOS

Additionally, `make setup` installs pre-commit, which takes care of formatting and linting for Go and Python before commits.
To run the pipeline manually, you can run the scripts sequentially in github_dagger_workflow_project.
Beware: all artifacts will be appended to your repo dir!
There is also a command that runs the dagger pipeline locally; in the end, only the final artifacts will be appended to your repo dir.
`make container_run` is perhaps the most useful option. It will not append any of the container-produced files to the host machine, but it will run a test script which ensures that all important artifacts are indeed logged.
`make test` runs the tests locally. Beware: it will not run the inference test on the model!
The same workflow that generates the artifacts automatically runs the inference testing. The artifact tests and the inference test are also carried out on every PR to main (and its subsequent commits).
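For illustration, a check in the spirit of tests/verify_artifacts.py could look like the sketch below; the artifact directory and file names are assumptions, not the project's actual list. The inference test mentioned above additionally exercises the model itself.

```python
# Illustrative sketch of an artifact check; file names below are assumptions.
from pathlib import Path

ARTIFACTS_DIR = Path("github_dagger_workflow_project/artifacts")  # assumed location
EXPECTED_ARTIFACTS = [
    "transformed_data.csv",  # hypothetical output of the transformation stage
    "best_model.pkl",        # hypothetical output of the model selection stage
]


def test_all_artifacts_exist() -> None:
    # Fail with a message listing every artifact that was not produced.
    missing = [name for name in EXPECTED_ARTIFACTS if not (ARTIFACTS_DIR / name).exists()]
    assert not missing, f"Missing artifacts: {missing}"
```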
- We use `pre-commit` to lint and format, as stated above: `ruff`, `ruff format`, `gofmt` and `go vet`. We check for PEP8 warnings and errors.
- `main` branch protection (via GitHub repo settings):
  - a PR is required before merging
  - at least one approval is needed; we automatically assign reviewers with the `CODEOWNERS` file
  - required status checks must pass for both of our jobs, i.e. `Train and Upload Model` and `Unit Test Model Artifacts`. The test checks explicitly whether all artifacts have been generated and whether the model passes the inference test. Jobs are automatically triggered on merge.
- We maintained clear goals via `Issues` and often quite verbose reviews.
- We used semantic commits about 90% of the time.
On every push to main, a new tag is released, stamped with the time it was published. See the current tags: Tags
This is not part of the documentation: you can read about a few (hard) decisions we made in Reflections
