This repository contains the final project for DS 7333: Quantifying the World, focusing on cost-sensitive classification using Artificial Neural Networks (ANNs). The project aims to build a high-precision machine learning model that minimizes financial losses due to misclassification. The business requirement emphasizes that false positives are significantly more costly than false negatives, guiding the model selection, training, and evaluation.
- 📄 Optimizing Classification for Cost-Sensitive Decision Making A Deep Learning Approach.pdf – The final report detailing the dataset, preprocessing steps, model selection, architecture, training, evaluation, and results.
- The report outlines the financial impact of misclassification and how various ANN models were tuned to optimize precision and recall.
- 📁 Notebooks and Scripts – Python and R scripts for data preprocessing, exploratory data analysis (EDA), feature engineering, model training, and evaluation.
- Key components:
- Exploratory Data Analysis
- 📊 View the full interactive EDA report here: 👉 EDA Report
- Data Preparation:
- Handling missing values, categorical encoding, feature transformations.
- Different approaches to dataset modification (raw, imputed, and engineered features).
- Model Training & Evaluation:
- Logistic regression as a baseline.
- Deep learning models implemented using TensorFlow/Keras.
- ANN architectures with multiple layers, dropout layers, and hyperparameter tuning.
- Model evaluation metrics: precision, recall, and the total financial cost of misclassification.
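
The evaluation metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the notebooks: the function name `evaluate` and the per-error costs `fp_cost` and `fn_cost` are hypothetical placeholders, chosen only so that a false positive is more expensive than a false negative. The actual dollar amounts are given in the report.

```python
def evaluate(y_true, y_pred, fp_cost=100.0, fn_cost=10.0):
    """Return precision, recall, and total misclassification cost.

    fp_cost and fn_cost are illustrative assumptions, not the
    project's real figures.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    cost = fp * fp_cost + fn * fn_cost
    return {"precision": precision, "recall": recall, "cost": cost}


# Toy labels and predictions, made up for the example.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
result = evaluate(y_true, y_pred)
print(result)
```

Reporting cost alongside precision and recall makes the trade-off explicit: two models with similar recall can differ widely in cost when false positives carry the heavier penalty.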
- Multiple ANN models were tested with varying architectures.
- The best-performing model (Model 3) achieved the lowest monetary loss, $57,270, with the classification threshold tuned to 67%.
- Cross-validation confirmed model generalization.
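
Threshold tuning of the kind described above can be sketched as a simple sweep: score each candidate threshold by the total cost it produces and keep the cheapest. This is an assumed illustration rather than the project's implementation. The helper names `total_cost` and `best_threshold`, the per-error costs, and the toy scores are all hypothetical.

```python
def total_cost(y_true, scores, threshold, fp_cost=100.0, fn_cost=10.0):
    """Total cost of classifying scores >= threshold as positive.

    fp_cost and fn_cost are illustrative assumptions.
    """
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += fp_cost  # false positive
        elif predicted == 0 and label == 1:
            cost += fn_cost  # false negative
    return cost


def best_threshold(y_true, scores, candidates, **costs):
    """Return the candidate threshold with the lowest total cost.

    Ties are resolved in favor of the earliest candidate.
    """
    return min(candidates, key=lambda th: total_cost(y_true, scores, th, **costs))


# Toy labels and predicted probabilities, made up for the example.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.60, 0.55, 0.90, 0.40, 0.70]
candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

best = best_threshold(y_true, scores, candidates)
print(best, total_cost(y_true, scores, best))
```

Because false positives are weighted more heavily here, the sweep tends to settle above the default 0.5 cutoff, which is consistent with the direction of the 67% threshold reported for Model 3.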
- 🐍 Python (TensorFlow, Keras, NumPy, Pandas, Scikit-Learn, Matplotlib)
- 📊 R (tidyverse, caret, ggcorrplot)
- 💻 Machine Learning & Deep Learning
- Logistic Regression
- Random Forest
- AdaBoost
- Naive Bayes
- KNN
- SVM
- Artificial Neural Networks (ANNs) with dropout layers
- Clone the repository:
git clone https://github.com/7446Nguyen/ANN_ML.git
cd ANN_ML
- Install dependencies:
pip install -r requirements.txt
- Run the Python scripts for model training and evaluation.
- David Shaw
- Jeff Nguyen
- David Julovich
This project was completed as part of DS 7333: Quantifying the World at Southern Methodist University's MSDS Program. Special thanks to our instructors and peers for valuable feedback.