Data Science demo notebooks and files

This is a collection of working Jupyter notebooks, with their associated datasets (mostly from Kaggle), that demonstrate exploratory data analysis (EDA), data cleaning, model building, validation, grid search for hyperparameter optimization, feature importances, and plotting. The logistic regression and random forest classifier notebook was the capstone project for my Google Advanced Data Analytics course and ends with business recommendations.

Models Used:

  • Naive Bayes classifier (naive-bayes-confusion-matrix.ipynb)
  • Linear Regression with hypothesis testing (linear-regression-anova-hypothesis-test.ipynb)
  • K-means clustering (unsupervised) with inertia and silhouette scoring (Kmeans-inertia-and-silhouette-score.ipynb)
  • Decision Tree classifier with grid search and feature importance plotting (decision_tree_grid_search_feature_importances.ipynb)
  • Logistic regression and Random Forest Classifier capstone project (capstone-logistic-random-forest-classifier.ipynb)
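
For readers new to the two K-means metrics named above, here is a minimal pure-Python sketch of what inertia and the silhouette score measure. It is not taken from the notebook: the function names, the toy points, and the hand-picked centroids are all hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import mean


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    scores = []
    for p, k in zip(points, labels):
        own = clusters[k]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for j, other in clusters.items()
            if j != k
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: two well-separated groups in 2D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
centroids = {0: (0.0, 0.5), 1: (10.0, 0.5)}

print(inertia(points, labels, centroids))
print(silhouette_score(points, labels))
```

Lower inertia means tighter clusters, but it always falls as the number of clusters grows. The silhouette score, which ranges from -1 to 1, also rewards separation between clusters, which is why the two are often read together when choosing a cluster count.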