
mlops_zoomcamp_homework

Project for the MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

problem description

Your customer is a company that sells minifigures and experiences a high number of returns. The packaging of a returned minifigure may or may not be undamaged, so all returned minifigures are first collected in one large box. Your task is to classify these minifigures so that the customer can repack each one in the correct new packaging.

dataset

The dataset contains more than 300 images of 28 different minifigures. The images were taken in different poses and environments, and the label of each image is the name of the minifigure. Please find the dataset on Kaggle: https://www.kaggle.com/datasets/ihelon/lego-minifigures-classification

Here is a sample of its content, including the labels:

[show_batch: sample image of labeled minifigures]
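
A quick way to get such an overview is to join the dataset's index and metadata files. The snippet below is only a sketch; the file and column names (index.csv, metadata.csv, path, class_id, minifigure_name) are assumptions based on the Kaggle dataset layout:

  # sketch: join image index and metadata to see labeled samples;
  # file and column names are assumptions from the kaggle dataset layout
  from pathlib import Path

  import pandas as pd

  DATA_DIR = Path("data")  # assumed download location

  index_df = pd.read_csv(DATA_DIR / "index.csv")    # image path -> class_id
  meta_df = pd.read_csv(DATA_DIR / "metadata.csv")  # class_id -> minifigure_name

  labeled = index_df.merge(meta_df, on="class_id")
  print(labeled[["path", "minifigure_name"]].head())
  print(labeled["minifigure_name"].nunique(), "minifigures,", len(labeled), "images")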

solution overview

architecture

installation

requirements

  1. Please get an AWS S3 bucket to store the mlflow artifacts

  2. Run the installation of commit hooks, python packages and environment variables with

    1. prepare the setup installation
      sudo apt install make
      sudo apt install make-guile
    2. make setup
    3. adapt the values in .env to your setup (a sample .env is sketched after this list)
      nano .env
      1. enter your AWS credentials
      2. enter your AWS bucket name
  3. Get the data from kaggle

    1. download the dataset
    2. save the .csv files as utf8 (e.g. via VSCode: bottom right, save with encoding utf8)
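
For orientation, a sample .env could look like the following; the variable names are assumptions, use the keys your setup actually reads:

  # sample .env; variable names are illustrative
  AWS_ACCESS_KEY_ID=<your_access_key>
  AWS_SECRET_ACCESS_KEY=<your_secret_key>
  AWS_DEFAULT_REGION=<your_region>
  S3_BUCKET_NAME=<your_mlflow_artifact_bucket>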

recommended

  1. faster model training: an AWS instance with approx. 8 CPU cores (e.g. running Ubuntu)
  2. an AWS PostgreSQL database as backend store for the mlflow server (see the server sketch below)
    • please set your config in the .env file
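
With that remote setup, the mlflow tracking server is typically started along these lines (a sketch; user, password, database endpoint and bucket name are placeholders):

  mlflow server \
      --backend-store-uri postgresql://<user>:<password>@<db_endpoint>:5432/<db_name> \
      --default-artifact-root s3://<your_mlflow_artifact_bucket> \
      --host 0.0.0.0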

get started

  1. get a feeling for the dataset: src/data_feeling.ipynb

    1. link the jupyter notebook kernel to this environment with
      python -m ipykernel install --user --name=mlops_zoomcamp_homework
  2. start mlflow tracking server and train

    • locally:

      1. start the local docker environment with
        make train
    • (preferred) remotely:

      1. follow the steps in the mlflow tracking server section
      2. set TRACKING_SERVER_HOST in train_model.py to your remote AWS tracking instance (note: two different instances are used here)
      3. edit ~/.aws/config with your aws account settings
      4. run with:
        python src/train_model.py --tracking_server=<YOUR_SERVER>
  3. select the best run and tag its model with the "Production" stage (see the registry sketch after this list)

    • run the notebook src/get_model_from_registry.ipynb
    • OR: use the GUI, e.g. at localhost:80 (or your remote address)
  4. deploy streaming and batch mode with docker containers (see the configuration sketch after this list)

    1. start the docker-compose file in the repo root directory with
      docker-compose up -d --build
      which brings up
      • the mlflow registry
      • a mongo DB
      • the evidently service
    2. stop the containerized prediction service:
      docker stop prediction_service
    3. start the prediction service locally:
      python prediction_service_stream/app.py
    4. use localhost in the following variables: MONGODB_ADDRESS="mongodb://localhost:27017" and EVIDENTLY_SERVICE_ADDRESS = os.getenv("EVIDENTLY_SERVICE", "http://localhost:8085")
    5. go to the prediction_service folder and run
      python prediction_service_stream/streaming_send_data.py
    6. this results in a terminal output like images/streaming_output.png
  5. prefect deployment of batch mode (see the flow sketch after this list)

    1. follow the setup steps in the mlops zoomcamp notes on orchestration
    2. start one run of the flow on a remote/local system:
      python src/batch_prefect_flow.py --data_path data/test.csv --output_file outputs/batch_prediction.parquet
    3. configure the deployment with:
      prefect deployment create src/batch_prefect_deployment.py
    4. create a work queue with the deployment
      prefect work-queue create training-queue
      and copy the UUID of the queue
    5. start an agent to pick up the queue
      prefect agent start <UUID_QUEUE>
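
For step 3, promoting the best model can also be done programmatically. A minimal sketch using the MLflow client, assuming the model was registered under the hypothetical name lego-minifigures-classifier:

  # sketch: promote the newest registered model version to "Production";
  # the model name below is an assumption, use the name your run registered
  from mlflow.tracking import MlflowClient

  client = MlflowClient()  # picks up MLFLOW_TRACKING_URI from the environment

  model_name = "lego-minifigures-classifier"  # hypothetical name
  latest = client.get_latest_versions(model_name, stages=["None"])[0]
  client.transition_model_version_stage(
      name=model_name,
      version=latest.version,
      stage="Production",
  )
  print(f"version {latest.version} of '{model_name}' is now in Production")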
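
For step 4, a sketch of the configuration block the prediction service uses when it runs outside docker-compose; the localhost defaults are as described above, while the mongo database and collection names are assumptions:

  # sketch: prediction service configuration with localhost defaults;
  # database/collection names are assumptions
  import os

  from pymongo import MongoClient

  MONGODB_ADDRESS = os.getenv("MONGODB_ADDRESS", "mongodb://localhost:27017")
  EVIDENTLY_SERVICE_ADDRESS = os.getenv("EVIDENTLY_SERVICE", "http://localhost:8085")

  mongo_client = MongoClient(MONGODB_ADDRESS)  # predictions are stored here for monitoring
  collection = mongo_client["prediction_service"]["data"]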
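
For step 5, the batch flow follows the usual Prefect 2 pattern of tasks wrapped in a flow. This is a sketch only; the task split and names are assumptions, the real implementation lives in src/batch_prefect_flow.py:

  # sketch of a Prefect 2 batch scoring flow; task names are assumptions
  import pandas as pd
  from prefect import flow, task


  @task
  def load_data(data_path: str) -> pd.DataFrame:
      return pd.read_csv(data_path)


  @task
  def predict(df: pd.DataFrame) -> pd.DataFrame:
      # placeholder: the real task loads the "Production" model from the
      # mlflow registry and scores the rows of df
      df = df.copy()
      df["prediction"] = None
      return df


  @flow
  def batch_prediction(data_path: str, output_file: str):
      df = load_data(data_path)
      predictions = predict(df)
      predictions.to_parquet(output_file)


  if __name__ == "__main__":
      batch_prediction("data/test.csv", "outputs/batch_prediction.parquet")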

content of project

  • problem description
  • can be deployed in the cloud
  • experiment tracking and model registry
  • workflow orchestration with prefect
  • model deployment in batch and streaming mode
  • basic model monitoring
  • best practices
    • testing
      • unittest
      • integration_test
    • linter and code formatter used
    • makefile
    • pre-commit hooks
    • CI pipeline

other Makefile options

unittests

execute with

make unittests
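
For illustration, a test in the style that make unittests would pick up might look like this; the helper normalize_image and the test module layout are hypothetical:

  # hypothetical unit test; the helper normalize_image is made up for illustration
  import numpy as np


  def normalize_image(img: np.ndarray) -> np.ndarray:
      return img.astype("float32") / 255.0


  def test_normalize_image_scales_to_unit_range():
      img = np.array([[0, 128, 255]], dtype="uint8")
      out = normalize_image(img)
      assert out.min() >= 0.0
      assert out.max() <= 1.0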

integration_tests

executes the following steps:

  • code quality check
  • unittests
  • docker image build

execute with

make integration_test

further installation option

installation on AWS instance

  1. installation on an aws cloud instance, including s3 storage (using python 3.9)
    1.  sudo apt-get update
    2.  pip install --upgrade pip
    3.  pip3 install pipenv
    4.  sudo apt install awscli
    5. enter your aws credentials
      aws configure
    6.  sudo apt install docker-compose

credits

  1. fastai model training: https://www.kaggle.com/code/arbazkhan971/lego-minifigures-classification-for-beginner
  2. dataset: https://www.kaggle.com/datasets/ihelon/lego-minifigures-classification
  3. MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

FAQ

  1. issues starting prefect:
    • "AttributeError: module 'typing' has no attribute '_ClassVar'" -> pip uninstall dataclasses
    • "alembic.util.exc.CommandError: Can't locate revision identified by ..." -> sudo rm ~/.prefect/orion.db

TODOs

  1. make the monitoring more beautiful
    • evidently: reference data
  2. CD (later)
    • terraform
    • CD stage for the repo in GitHub
  3. run streaming in a docker container
  4. prefect deployment runs are failing, why?
  5. prefect: add a check whether a newly trained model is better than the old one, using a performance test on the test data; if yes, mark it as "Production"
    • the prefect flow then only takes the newest model marked as "Production" and deploys it
  6. monitoring: if the accuracy drops below a threshold, retrain the model on new/more data