- Understand a new dataset.
- Model the data using a KNN.
- Analyze the results and optimize the model.
Follow these instructions:
- Create a new repository based on the Machine Learning project by clicking here.
- Open the newly created repository in Codespace using the Codespace button extension.
- Once the Codespace's VSCode has finished opening, start your project by following the instructions below.
Train a K-Nearest Neighbors (KNN) model to predict the quality of red wine based on its chemical properties. Could AI help you choose a sommelier-worthy wine?
We will use the following red wine dataset, extracted from the Wine Quality Data Set (UCI):
https://raw.githubusercontent.com/4GeeksAcademy/k-nearest-neighbors-project-tutorial/refs/heads/main/winequality-red.csv
Each row represents a wine. The columns describe its chemical composition:
- fixed acidity, volatile acidity, citric acid
- residual sugar, chlorides
- free sulfur dioxide, total sulfur dioxide
- density, pH, sulphates, alcohol
The target column is label:
- 0 = Low quality
- 1 = Medium quality
- 2 = High quality
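As a rough sketch, the dataset described above could be loaded and inspected with pandas as follows. The helper names `load_wine_data` and `summarize` are hypothetical, and letting pandas detect the delimiter is an assumption, since the text does not say whether this copy of the file uses commas or semicolons.

```python
import pandas as pd

# URL taken from the text above.
DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "k-nearest-neighbors-project-tutorial/refs/heads/main/winequality-red.csv"
)


def load_wine_data(source=DATA_URL, sep=None):
    """Read the wine CSV into a DataFrame.

    Assumption: sep=None asks pandas' Python engine to sniff the delimiter
    from the header row (the original UCI file uses ';').
    Pass sep="," or sep=";" explicitly if the detection guesses wrong.
    """
    return pd.read_csv(source, sep=sep, engine="python")


def summarize(df, target="label"):
    """Return basic facts about the frame: shape, missing values, class counts."""
    return {
        "shape": df.shape,
        "missing": int(df.isna().sum().sum()),
        "class_counts": df[target].value_counts().sort_index().to_dict(),
    }
```

Calling `summarize(load_wine_data())` should report the number of rows and columns, whether any values are missing, and how many wines fall into each of the three classes, which is worth checking before training because imbalanced classes make plain accuracy harder to interpret.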
- Load the data. Load the CSV with Pandas and explore its structure.
- Train the KNN model:
  - Separate the independent variables (X) from the target (y).
  - Split into training and testing sets (80/20).
  - Scale the data if necessary (highly recommended with KNN!).
  - Train the model with an initial k value.
- Evaluate performance using:
  - accuracy_score
  - confusion_matrix
  - classification_report
- Optimize k. Create a loop to test different k values (e.g., from 1 to 20).
  - Save the results in a list.
  - Plot accuracy vs k to find the best value.
- Create a function that takes numerical values and predicts the quality:
predict_wine_quality([7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4])
>>> "This wine is likely of medium quality 🍷"
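The steps above can be sketched with scikit-learn roughly as follows. This is one possible approach under stated assumptions, not the reference solution: the helper names other than `predict_wine_quality` are hypothetical, `StandardScaler` is one choice of scaler, the stratified split, `random_state=42`, and the initial `k=5` are arbitrary choices, and `predict_wine_quality` here takes the trained model and the feature names as extra arguments, whereas the exercise's version takes only the list of values.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

QUALITY_NAMES = {0: "low", 1: "medium", 2: "high"}  # label meanings from the text


def split_data(df, target="label", seed=42):
    """Separate X from y and hold out 20% of the rows for testing.

    Assumption: stratify=y keeps class proportions similar in both sets.
    """
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=0.2, random_state=seed, stratify=y)


def build_knn(k=5):
    """Scaler + KNN in one pipeline, so the scaler is fit on training rows only."""
    return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))


def evaluate(model, X_test, y_test):
    """Collect the three metrics named in the text."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred, zero_division=0),
    }


def search_k(X_train, y_train, X_test, y_test, k_values=range(1, 21)):
    """Train one model per k and return a list of (k, accuracy) pairs."""
    results = []
    for k in k_values:
        model = build_knn(k).fit(X_train, y_train)
        results.append((k, accuracy_score(y_test, model.predict(X_test))))
    return results


def plot_accuracy(results):
    """Plot accuracy against k (matplotlib is imported only when plotting)."""
    import matplotlib.pyplot as plt

    ks, accs = zip(*results)
    plt.plot(ks, accs, marker="o")
    plt.xlabel("k")
    plt.ylabel("Accuracy")
    plt.title("KNN accuracy vs. k")
    plt.show()


def predict_wine_quality(values, model, feature_names):
    """Predict the quality class for one wine from its chemical measurements."""
    row = pd.DataFrame([values], columns=feature_names)
    label = int(model.predict(row)[0])
    return f"This wine is likely of {QUALITY_NAMES[label]} quality 🍷"
```

A typical session would call `split_data(df)`, fit `build_knn(k)` on the training portion, inspect `evaluate(...)`, and then pass the output of `search_k(...)` to `plot_accuracy`. Wrapping the scaler and classifier in one pipeline means new wines passed to `predict_wine_quality` are scaled the same way as the training data. Choosing k from test-set accuracy follows the exercise as written; cross-validation on the training set would give a less optimistic estimate.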
Note: We also provide solution samples in ./solution.ipynb, which we honestly suggest you only use if you're stuck for more than 30 minutes or if you've already finished and want to compare it with your approach.
You worked with a real dataset from the UCI Machine Learning Repository, applied supervised classification models, analyzed chemical features, and developed a function that simulates a sommelier's judgment using AI. That deserves to be shared!
Share an insightful phrase that demonstrates how AI can classify wine quality based on its composition. Add an accuracy vs. k plot (very visual) or a fun prediction using predict_wine_quality().
"Can artificial intelligence predict the quality of wine? 🍷 I trained a KNN model with real data from the UCI ML Repo and achieved 73% accuracy in classifying wines as low, medium, or high quality using only their chemical composition. The data doesn't lie: alcohol and sulfate are more revealing than a label! 😉 #MachineLearning #DataScience #WineLovers #AI #scikitLearn"
Once you have finished solving the case study, make sure to commit your changes, push to your repository, and go to 4Geeks.com to submit the repository link.