A real estate company has data on properties (houses), such as the number of bedrooms and the size of the front porch. It wants a model that predicts a house's price from its features, which makes this a regression task. I, the machine learning engineer, sourced a dataset from Kaggle with over 100 columns describing house features, cleaned the data, and trained an XGBoost model on it.
Run this command to set up the Python environment and pre-commit:
make setup
MLflow is used for experiment tracking and model registration.
To start the MLflow server, run:
bash orchestration/server.sh
Keep this terminal open, because the training pipeline needs the server to be running.
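Because training fails if the tracking server is not up yet, a small readiness check can poll it first. This is a minimal standard-library sketch, not part of the project: it assumes `orchestration/server.sh` runs `mlflow server` on MLflow's default address `http://127.0.0.1:5000` (adjust `MLFLOW_URL` if the script sets another host or port) and that the server answers `GET /health`, which MLflow's tracking server serves. The function names are hypothetical.

```python
import http.client
import time
import urllib.request

# Assumption: orchestration/server.sh starts `mlflow server` on its default
# address. Change this if the script passes a different --host or --port.
MLFLOW_URL = "http://127.0.0.1:5000"


def is_server_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError subclass OSError, so this covers refused
        # connections, timeouts and HTTP error statuses alike.
        return False


def wait_for_server(base_url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the server until it responds or the attempts run out."""
    for attempt in range(attempts):
        if is_server_up(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the server time to finish starting
    return False
```

Calling `wait_for_server(MLFLOW_URL)` in a second terminal returns `True` once the server responds, after which the training pipeline can be started.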
Prefect orchestrates the workflow: it fetches the data, preprocesses it, runs hyperparameter tuning with hyperopt, and trains the XGBoost model.
To start the training pipeline, run:
bash orchestration/train.sh
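The tuning step follows a sample, score, keep-the-best loop. The project uses hyperopt for this; the sketch below swaps in plain random search from the standard library purely to show the shape of that loop. The search space, its values, and the `objective` callback are hypothetical: in the real pipeline the objective would train an XGBoost model with the sampled parameters and return a validation loss.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search space: names follow xgboost's parameter conventions,
# but the values are illustrative, not the project's actual hyperopt space.
SEARCH_SPACE: Dict[str, List] = {
    "max_depth": [3, 4, 5, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "min_child_weight": [1, 3, 5],
}

Params = Dict[str, object]


def sample_params(rng: random.Random) -> Params:
    """Draw one candidate by picking a value for every hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(
    objective: Callable[[Params], float], n_trials: int, seed: int = 42
) -> Tuple[Optional[Params], float]:
    """Return the sampled parameters with the lowest objective value."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    best_params: Optional[Params] = None
    best_loss = float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        # Real pipeline (assumption): train a model, score it on validation data.
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

For example, `random_search(objective, n_trials=50)` returns the best parameter dict and its loss. hyperopt's TPE algorithm differs in that it uses the results of earlier trials to choose the next candidate instead of sampling uniformly at random.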
Create a ci-cd branch from the develop branch and check it out:
git checkout -b ci-cd develop
Change the production infrastructure variables to values of your choice.
Commit and push the changes to the remote ci-cd branch, then open a pull request and merge it into the develop branch.
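To get the URL of the deployed REST API, initialize Terraform against the production state and read the output:
terraform init -backend-config="key=mlops-zoomcamp-prod.tfstate" -reconfigure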
terraform output rest_api_url
Set the URL in the deployment test file to this REST API URL.
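Instead of pasting the URL by hand, the deployment test could read it from Terraform and post a sample house to the API. This is a minimal sketch under stated assumptions: it relies on `terraform output -json` (which prints each output as an object with a `value` field), assumes the endpoint accepts a flat JSON object of feature values and returns JSON, and uses hypothetical function names and feature names (`BedroomAbvGr`, `OpenPorchSF`, modelled on columns in Kaggle's House Prices dataset). Match them to the actual test file and model input.

```python
import json
import subprocess
import urllib.request


def parse_terraform_output(raw_json: str, name: str) -> str:
    """Pick one value out of the JSON printed by `terraform output -json`."""
    outputs = json.loads(raw_json)
    # Each output looks like {"sensitive": ..., "type": ..., "value": ...}.
    return outputs[name]["value"]


def read_rest_api_url(workdir: str = ".") -> str:
    """Run `terraform output -json` in the Terraform directory and return the URL."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_terraform_output(result.stdout, "rest_api_url")


def build_prediction_request(url: str, features: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one house."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(url: str, features: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (assumed response format)."""
    request = build_prediction_request(url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Run it from the directory where `terraform init` was executed: `url = read_rest_api_url()` followed by `predict_price(url, {"BedroomAbvGr": 3, "OpenPorchSF": 40})`. Keeping request building separate from sending lets the payload be checked without network access.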