This project, completed for the Doing Data Science course in SMU's MSDS program, classifies beers using the K-Nearest Neighbors (KNN) algorithm. The primary objectives are exploratory data analysis, data visualization, and a KNN model with hyperparameter tuning that accurately categorizes beer types.
- Data Files:
  - `Beers.csv` – Contains information about various beers, including attributes like name, style, ABV (alcohol by volume), and IBU (International Bitterness Units).
  - `Breweries.csv` – Provides details about breweries, such as names and locations.
- Analysis Scripts:
  - `BeerInitialAnalysis.R` – R script for initial data exploration and cleaning.
  - `Micro Brewery Analysis.Rmd` – R Markdown file detailing the comprehensive analysis, including data preprocessing, visualization, and KNN model development.
- Reports and Presentations:
  - `Micro_Brewery_Analysis.html` – HTML output of the R Markdown analysis, offering an interactive view of the findings.
  - `Budweiser C level presentation.pptx` – PowerPoint presentation tailored for executive-level stakeholders, summarizing key insights and recommendations.
  - `Micro Brewery Exploratory Data Analysis Jeff Nguyen Adam Ruthford.pptx` – Detailed presentation covering exploratory data analysis and visualization techniques.
- Exploratory Data Analysis (EDA): In-depth examination of the datasets to uncover patterns, detect anomalies, and understand relationships between variables.
- Data Visualization: Utilization of plots and charts to graphically represent data distributions and correlations, enhancing interpretability.
- KNN Model Development:
  - Data Preprocessing: Handling missing values, normalizing features, and encoding categorical variables to prepare data for modeling.
  - Hyperparameter Tuning: Optimizing the number of neighbors (k) and distance metrics to improve model accuracy.
  - Model Evaluation: Assessing performance using metrics such as accuracy, precision, recall, and F1-score.
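
The KNN steps above can be sketched roughly as follows. This is a minimal illustration, not the code from `Micro Brewery Analysis.Rmd`: the data frame is synthetic, the column names `ABV`, `IBU`, and `Style` and the two style labels are assumptions made for the example, and only k is tuned because `class::knn` always uses Euclidean distance.

```r
library(class)  # provides knn() and knn.cv()

set.seed(42)  # reproducible synthetic data, split, and tie-breaking

# Synthetic stand-in data (NOT the project's data): two made-up styles
# with different ABV/IBU centers, plus a few missing IBU values.
n <- 100
beers <- data.frame(
  ABV   = c(rnorm(n, 0.065, 0.008), rnorm(n, 0.050, 0.008)),
  IBU   = c(rnorm(n, 65, 12), rnorm(n, 30, 12)),
  Style = factor(rep(c("IPA", "Ale"), each = n))
)
beers$IBU[sample(nrow(beers), 10)] <- NA

# Preprocessing: drop rows with missing predictors.
beers <- beers[complete.cases(beers), ]

# Hold out 30% of the rows for testing.
test_idx <- sample(nrow(beers), size = round(0.3 * nrow(beers)))
train_y <- beers$Style[-test_idx]
test_y  <- beers$Style[test_idx]

# Normalize with training-set statistics only, then apply them to the test
# set, so ABV (fractions) and IBU (tens) contribute comparably to distances.
train_x <- scale(beers[-test_idx, c("ABV", "IBU")])
test_x  <- scale(beers[test_idx, c("ABV", "IBU")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Hyperparameter tuning: leave-one-out cross-validated accuracy for odd k.
k_grid <- seq(1, 21, by = 2)
cv_acc <- sapply(k_grid, function(k) {
  mean(knn.cv(train_x, train_y, k = k) == train_y)
})
best_k <- k_grid[which.max(cv_acc)]

# Evaluation on the held-out rows, treating "IPA" as the positive class.
pred      <- knn(train_x, test_x, cl = train_y, k = best_k)
conf_mat  <- table(Predicted = pred, Actual = test_y)
accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)
precision <- conf_mat["IPA", "IPA"] / sum(conf_mat["IPA", ])
recall    <- conf_mat["IPA", "IPA"] / sum(conf_mat[, "IPA"])
f1        <- 2 * precision * recall / (precision + recall)

print(conf_mat)
print(round(c(k = best_k, accuracy = accuracy, precision = precision,
              recall = recall, f1 = f1), 3))
```

Computing the scaling parameters from the training rows alone keeps information about the held-out rows out of the model. Comparing distance metrics would need a different implementation than `class::knn`.
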
📺 Watch the project presentation on YouTube:
YouTube Deck Presentation – KNN Beer Classification
- Clone the Repository:
  git clone https://github.com/7446Nguyen/KNN-Classification.git
  cd KNN-Classification
- Set Up Environment:
  - Ensure R and RStudio are installed on your system.
  - Install the necessary R packages by running the following command in your R console:
    install.packages(c("tidyverse", "class", "caret", "ggplot2"))
- Run Analysis:
  - Open `Micro Brewery Analysis.Rmd` in RStudio.
  - Knit the document to produce the HTML report, or run code chunks interactively to explore the analysis step by step.
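
As an alternative to the Knit button, the report can probably also be rendered from the R console. The sketch below assumes the working directory is the repository root and that the `rmarkdown` package is available. That package is not in the install command above, although RStudio normally offers to install it.

```r
# Path to the analysis notebook, relative to the repository root.
rmd_path <- "Micro Brewery Analysis.Rmd"

if (file.exists(rmd_path) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Knit to HTML; by default the output file is written next to the .Rmd.
  rmarkdown::render(rmd_path, output_format = "html_document")
} else {
  message("Run this from the repository root with the rmarkdown package installed.")
}
```
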
- Beers Dataset: Information on various beers, including their characteristics and styles.
- Breweries Dataset: Details about breweries, such as their names and geographic locations.
These datasets are integral to the analysis and are included in the repository for ease of access.
- Jeff Nguyen
- Adam Ruthford
We extend our gratitude to the faculty of the SMU MSDS program for their guidance and support. Special thanks to our peers for their valuable feedback and collaboration throughout this project.