This project develops an MLOps pipeline that uses Evidently to monitor key metrics of a machine learning model, including prediction drift and the median fare amount. It employs Prefect for workflow orchestration, managing tasks such as database updates and metric calculations. The results are visualized in Grafana, which provides interactive dashboards for real-time analysis. Everything runs in a Docker Compose environment that connects PostgreSQL, Adminer, and Grafana for data storage, management, and visualization.
docker-compose up
This will initiate the following steps:
This notebook develops a regression model using the pandas library for data manipulation, scikit-learn for model building and evaluation, and matplotlib and seaborn for visualization. It processes New York City taxi data to predict trip durations or fare amounts through data cleaning, exploratory data analysis, feature engineering, model training, and validation. The notebook integrates Evidently to monitor prediction drift, the number of drifted columns, missing values, and regression quality, and to track the median fare amount.
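To make two of the monitored quantities concrete, here is a minimal standard-library sketch of a missing-value share and a mean absolute error (one common regression quality measure). It is not the notebook's code and does not call Evidently; the function names, column names, and sample values are assumptions made up for illustration.

```python
# Illustrative only: plain-Python versions of two monitored quantities.
# Function names, column names, and sample values are hypothetical.

def missing_share(rows, columns):
    """Fraction of cells in the given columns whose value is None."""
    total = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return missing / total if total else 0.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up sample trips; two of the twelve cells are missing.
trips = [
    {"trip_distance": 1.2, "passenger_count": 1, "fare_amount": 8.0},
    {"trip_distance": None, "passenger_count": 2, "fare_amount": 12.5},
    {"trip_distance": 3.4, "passenger_count": None, "fare_amount": 15.0},
    {"trip_distance": 0.8, "passenger_count": 1, "fare_amount": 6.5},
]

share = missing_share(trips, ["trip_distance", "passenger_count", "fare_amount"])
mae = mean_absolute_error([8.0, 12.5, 15.0, 6.5], [9.0, 12.0, 13.0, 7.0])
print(f"missing share: {share:.3f}, MAE: {mae:.2f}")
```

Evidently computes these and the drift metrics from reference and current datasets; the sketch only shows what the simplest of the numbers mean.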
This Python script uses the Evidently library to monitor key model metrics such as prediction drift, the number of drifted columns, and the median fare amount. It employs Prefect to orchestrate the pipeline, managing tasks such as database preparation and daily metric calculations. The metrics generated by Evidently are stored in a PostgreSQL database, with SQL operations handled through the psycopg library. Grafana provides interactive dashboards for real-time monitoring and analysis. A Docker Compose setup runs the PostgreSQL, Adminer, and Grafana services together so that data flows between them.
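The flow described above (prepare the database, then compute and store one metric row per day) can be sketched as follows. This is a simplified stand-in, not the script itself: it uses the standard-library `sqlite3` module in place of PostgreSQL and psycopg so that it runs without any services, computes the median with `statistics.median` instead of Evidently, and omits Prefect. The table name `daily_metrics`, the function names, and the fare values are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from statistics import median

# Hypothetical table layout: one row per day with the median fare amount.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS daily_metrics (
    day TEXT PRIMARY KEY,
    median_fare_amount REAL
)
"""


def prep_db(conn):
    """Recreate the metrics table so each run starts from a clean state."""
    conn.execute("DROP TABLE IF EXISTS daily_metrics")
    conn.execute(CREATE_TABLE)
    conn.commit()


def store_daily_metric(conn, day, fares):
    """Compute the median fare for one day and insert it as a row."""
    value = median(fares)
    conn.execute(
        "INSERT INTO daily_metrics (day, median_fare_amount) VALUES (?, ?)",
        (day.isoformat(), value),
    )
    conn.commit()
    return value


def run_pipeline(conn, fares_by_day):
    """Prepare the database, then process the days in chronological order."""
    prep_db(conn)
    for day in sorted(fares_by_day):
        store_daily_metric(conn, day, fares_by_day[day])


# In-memory SQLite database as a stand-in for the PostgreSQL service.
conn = sqlite3.connect(":memory:")
start = date(2024, 3, 1)
fares_by_day = {  # made-up fares for two days
    start: [7.5, 12.0, 9.0],
    start + timedelta(days=1): [10.0, 14.0, 11.0, 30.0],
}
run_pipeline(conn, fares_by_day)
rows = conn.execute(
    "SELECT day, median_fare_amount FROM daily_metrics ORDER BY day"
).fetchall()
print(rows)
```

With psycopg the same statements would use `%s` placeholders rather than `?`, and in a Prefect-based pipeline functions like `prep_db` and `store_daily_metric` would typically be wrapped as tasks inside a flow.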
Navigate to http://localhost:3000/
Open the dashboard titled 'Taxi Duration Prediction'