This project is part of SMU's MSDS program for Doing Data Science. Its main intent is to demonstrate exploratory data analysis, data visualization, and KNN model building with hyperparameter tuning.

7446Nguyen/KNN-Classification

Classifying Beers with Their Nearest Neighbors – A KNN Exercise

Overview

This project, part of SMU's MSDS program for Doing Data Science, aims to classify beers using the K-Nearest Neighbors (KNN) algorithm. The primary objectives include performing exploratory data analysis, data visualization, and developing a KNN model with hyperparameter tuning to accurately categorize different beer types.

Repository Structure

  • Data Files:
    • Beers.csv – Contains information about various beers, including attributes like name, style, ABV (alcohol by volume), and IBU (International Bitterness Units).
    • Breweries.csv – Provides details about breweries, such as names and locations.
  • Analysis Scripts:
    • BeerInitialAnalysis.R – R script for initial data exploration and cleaning.
    • Micro Brewery Analysis.Rmd – R Markdown file detailing the comprehensive analysis, including data preprocessing, visualization, and KNN model development.
  • Reports and Presentations:
    • Micro_Brewery_Analysis.html – HTML output of the R Markdown analysis, offering an interactive view of the findings.
    • Budweiser C level presentation.pptx – PowerPoint presentation tailored for executive-level stakeholders, summarizing key insights and recommendations.
    • Micro Brewery Exploratory Data Analysis Jeff Nguyen Adam Ruthford.pptx – Detailed presentation covering exploratory data analysis and visualization techniques.

Key Features

  • Exploratory Data Analysis (EDA): In-depth examination of the datasets to uncover patterns, detect anomalies, and understand relationships between variables.
  • Data Visualization: Plots and charts that show data distributions and correlations, making the findings easier to interpret.
  • KNN Model Development:
    • Data Preprocessing: Handling missing values, normalizing features, and encoding categorical variables to prepare data for modeling.
    • Hyperparameter Tuning: Optimizing the number of neighbors (k) and distance metrics to improve model accuracy.
    • Model Evaluation: Assessing performance using metrics such as accuracy, precision, recall, and F1-score.
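
The three modeling steps listed above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the code from Micro Brewery Analysis.Rmd: the data are simulated, the column names ABV, IBU, and Style and the two style labels are hypothetical, and the k grid is arbitrary. Note that knn() from the class package uses Euclidean distance only, so this sketch tunes k but not the distance metric.

```r
library(class)  # provides knn() and knn.cv()

set.seed(42)

# Simulated stand-in for the beer data (hypothetical columns and values).
n <- 100
beers <- data.frame(
  ABV   = c(rnorm(n, mean = 0.065, sd = 0.008), rnorm(n, mean = 0.052, sd = 0.008)),
  IBU   = c(rnorm(n, mean = 65, sd = 12), rnorm(n, mean = 30, sd = 10)),
  Style = factor(rep(c("IPA", "Ale"), each = n))
)

# 70/30 train/test split.
train_idx <- sample(nrow(beers), size = round(0.7 * nrow(beers)))

# Preprocessing: min-max scaling, with ranges learned from the training
# rows only, so that IBU (tens) does not dominate ABV (hundredths).
feature_cols <- c("ABV", "IBU")
ranges <- lapply(beers[train_idx, feature_cols], range)
scale_with <- function(df, ranges) {
  as.data.frame(Map(function(col, r) (col - r[1]) / (r[2] - r[1]),
                    df[names(ranges)], ranges))
}
train_x <- scale_with(beers[train_idx, ], ranges)
test_x  <- scale_with(beers[-train_idx, ], ranges)
train_y <- beers$Style[train_idx]
test_y  <- beers$Style[-train_idx]

# Hyperparameter tuning: leave-one-out cross-validation on the training
# set for each odd k in an arbitrary grid.
k_grid <- seq(1, 25, by = 2)
cv_accuracy <- sapply(k_grid, function(k) {
  mean(knn.cv(train_x, train_y, k = k) == train_y)
})
best_k <- k_grid[which.max(cv_accuracy)]

# Model evaluation on the held-out rows, treating "IPA" as the positive class.
pred <- knn(train_x, test_x, cl = train_y, k = best_k)
cm <- table(Predicted = pred, Actual = test_y)
accuracy  <- sum(diag(cm)) / sum(cm)
precision <- cm["IPA", "IPA"] / sum(cm["IPA", ])
recall    <- cm["IPA", "IPA"] / sum(cm[, "IPA"])
f1        <- 2 * precision * recall / (precision + recall)

print(cm)
print(c(k = best_k, accuracy = accuracy, precision = precision,
        recall = recall, f1 = f1))
```

Learning the scaling ranges from the training rows alone keeps information about the test rows out of the model. The caret package listed under Getting Started offers higher-level tuning helpers that could replace the manual loop.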

YouTube Presentation

📺 Watch the project presentation on YouTube:
YouTube Presentation

YouTube Deck Presentation – KNN Beer Classification

Getting Started

  1. Clone the Repository:
    git clone https://github.com/7446Nguyen/KNN-Classification.git
    cd KNN-Classification
  2. Set Up Environment:
    • Ensure R and RStudio are installed on your system.
    • Install the necessary R packages by running the following command in your R console:
      install.packages(c("tidyverse", "class", "caret", "ggplot2"))
  3. Run Analysis:
    • Open Micro Brewery Analysis.Rmd in RStudio.
    • Knit the document to produce the HTML report or run code chunks interactively to explore the analysis step by step.

Data Sources

  • Beers Dataset: Information on various beers, including their characteristics and styles.
  • Breweries Dataset: Details about breweries, such as their names and geographic locations.

These datasets are integral to the analysis and are included in the repository for ease of access.
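
As a rough sketch of how the two datasets might be combined before modeling, the example below joins a beer table to a brewery table and drops rows with missing features. The column names (Brewery_id, Brew_ID, ABV, IBU, and so on) and all values are hypothetical stand-ins; check the headers in Beers.csv and Breweries.csv before adapting it.

```r
# Toy stand-ins for the two files (hypothetical columns and values).
# With the real data, these would instead come from:
#   beers     <- read.csv("Beers.csv")
#   breweries <- read.csv("Breweries.csv")
beers <- data.frame(
  Beer       = c("Beer A", "Beer B", "Beer C"),
  ABV        = c(0.05, 0.07, NA),
  IBU        = c(30, 70, 45),
  Brewery_id = c(1, 2, 1)
)
breweries <- data.frame(
  Brew_ID = c(1, 2),
  Brewery = c("Brewery X", "Brewery Y"),
  State   = c("TX", "CO")
)

# Inner join: each beer row picks up the details of its brewery.
combined <- merge(beers, breweries, by.x = "Brewery_id", by.y = "Brew_ID")

# Count missing values per column to see how much data is affected.
missing_counts <- colSums(is.na(combined))

# Keep only rows where both model features are present.
model_data <- combined[complete.cases(combined[c("ABV", "IBU")]), ]

print(missing_counts)
print(model_data)
```

Dropping incomplete rows is only one option. Imputing missing values is another, and which one suits the analysis depends on how many rows are affected.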

Authors

  • Jeff Nguyen
  • Adam Ruthford

Acknowledgments

We extend our gratitude to the faculty of the SMU MSDS program for their guidance and support. Special thanks to our peers for their valuable feedback and collaboration throughout this project.
