This project previously used the Iris dataset; it now uses the Palmer penguins dataset. See here for details.
This is an example project which classifies the species of penguin from the Palmer penguins dataset. The model is fit using the tidymodels metapackage. The file R/model.R contains the modelling code required. The rough steps are:
- Split the data into training and test sets
- Define pre-processing steps using recipes
- Create a random forest model using parsnip
- Combine the model and recipe into a workflow
- Perform hyper-parameter tuning using cross-validation on the training set using tune
- Select the best model
- Fit the best model to the training data
- Save the best model
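The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of R/model.R: the choice of predictors, the imputation step, the ranger engine, the split proportion, the grid size, and the output path `model.rds` are all assumptions made for the example. It requires the palmerpenguins and ranger packages in addition to tidymodels.

```r
# Sketch only: predictors, recipe steps, grid size and file path are
# assumptions for illustration, not taken from R/model.R.
library(tidymodels)
library(palmerpenguins)

set.seed(123)

# 1. Split the data into training and test sets
penguin_split <- initial_split(penguins, prop = 0.8, strata = species)
train <- training(penguin_split)
test  <- testing(penguin_split)

# 2. Define pre-processing steps using recipes
#    (median imputation covers the few missing measurements)
rec <- recipe(
  species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
  data = train
) %>%
  step_impute_median(all_numeric_predictors())

# 3. Create a random forest model using parsnip
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# 4. Combine the model and recipe into a workflow
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(rf_spec)

# 5. Tune hyper-parameters with cross-validation on the training set
folds <- vfold_cv(train, v = 5, strata = species)
tuned <- tune_grid(
  wf,
  resamples = folds,
  grid = 10,
  metrics = metric_set(accuracy)
)

# 6. Select the best hyper-parameter combination
best_params <- select_best(tuned, metric = "accuracy")

# 7. Fit the finalized workflow to the full training data
final_fit <- wf %>%
  finalize_workflow(best_params) %>%
  fit(data = train)

# 8. Save the fitted workflow (hypothetical path)
saveRDS(final_fit, "model.rds")
```

Saving the whole fitted workflow, rather than only the model, keeps the recipe and the model together, so new data passes through the same pre-processing at prediction time.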
The application is deployed using Docker; see the Dockerfile for details. The base image is from rocker, and dependencies are managed using renv. To build the image:
docker build . --file Dockerfile
The model is served using plumber. To predict the species of a penguin from its measurements, submit a JSON file using POST. An example using curl is below.
curl localhost:8000/getprediction --header "Content-Type: application/json" \
--request POST \
--data @data/example.json
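For orientation, a plumber endpoint that could answer the request above might look something like the sketch below. It is not the project's actual API file: the file name `plumber.R`, the model path `model.rds`, and the assumption that the JSON body contains one field per predictor are all hypothetical. It also assumes plumber 1.0 or later, where the parsed request body is available as `req$body`.

```r
# plumber.R (hypothetical file name): a sketch of a prediction endpoint.
library(plumber)
library(tidymodels)

# Load the fitted workflow once at start-up (assumed path)
model <- readRDS("model.rds")

#* Predict the penguin species for the posted measurements
#* @post /getprediction
#* @serializer json
function(req) {
  # The parsed JSON body is converted to a data frame with one
  # column per predictor; field names must match the training data
  new_data <- as.data.frame(req$body)
  predict(model, new_data = new_data)
}
```

Under these assumptions the API could be started locally with `plumber::pr("plumber.R") %>% plumber::pr_run(port = 8000)`, which matches the port used in the curl example.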
The data is split into training and test sets, and the model is fit on the training set. The performance measures are calculated on the test set.
| .metric | .estimator | .estimate |
|---|---|---|
| accuracy | multiclass | 0.89 |
| precision | macro | 0.89 |
| f_meas | macro | 0.89 |
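A table in this shape can be produced with a yardstick metric set. The sketch below uses a tiny hand-made set of predictions purely to show the mechanics; the data are hypothetical and are not the project's test-set results. For multiclass outcomes, yardstick's `precision` and `f_meas` default to macro averaging, which is why the `.estimator` column reads `macro`.

```r
# Sketch only: toy predictions for illustration, not the project's results.
library(tidymodels)

results <- tibble(
  truth = factor(
    c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Chinstrap")
  ),
  .pred_class = factor(
    c("Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo", "Chinstrap")
  )
)

# Bundle the three metrics reported in the table into one function
penguin_metrics <- metric_set(accuracy, precision, f_meas)

# Returns a tibble with columns .metric, .estimator and .estimate
metric_tbl <- penguin_metrics(results, truth = truth, estimate = .pred_class)
metric_tbl
```

In the project itself, `results` would presumably come from predicting on the held-out test set and binding the predictions to the true species column.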