
Statistics for Machine Learning

Overview

This repository began as a collection of notebooks covering statistics topics useful for understanding machine learning methods. It has since grown to span everything from the fundamentals of statistics, linear algebra and data science to building many of the most common machine learning models used in industry from scratch. The repository is split into chapters, each tackling a specific topic. Each chapter consists of several notebooks that dive into the theory behind different areas of machine learning, deriving the relevant equations using KaTeX and implementing the methods programmatically in Python. Every notebook ends with a Further Reading section pointing to useful resources for exploring the topic further.

 

Repository Highlights

k-Means Clustering algorithm written in Python, implementing k-Means++ intelligent centroid spacing. Agglomerative Hierarchical Clustering algorithm written in Python, offering 4 different linkage methods.
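The k-Means++ centroid seeding mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard k-Means++ algorithm, not the repository's implementation; the function and variable names are my own:

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-Means++ seeding: choose each new centroid with probability
    proportional to its squared distance from the nearest centroid
    already chosen, which spreads the initial centroids apart."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform random point
    for _ in range(k - 1):
        C = np.array(centroids)
        # squared distance from every point to its nearest chosen centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

# Three well-separated blobs; the seeding tends to place one centroid per blob.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(20, 2)) for m in (0.0, 5.0, 10.0)])
centroids = kmeans_pp_init(X, k=3, seed=0)
print(centroids)
```

Compared with picking all k centroids uniformly at random, this weighting makes it far less likely that two initial centroids land in the same cluster, which typically speeds convergence and improves the final clustering.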

 

Chapter 1 - Statistics Fundamentals

1.1 - Introduction to Statistics

1.2 - Basic Data Visualisation

1.3 - Probability & Bayes' Theorem

1.4 - Probability Distributions & Expected Values

1.5 - Distributions in Data (Including Log Normal Distributions)

1.6 - Sampling Distributions & Estimators

1.7 - Confidence Intervals & t-Distributions

1.8 - Hypothesis Testing & p-Values

1.9 - Covariance and the Covariance Matrix

1.10 - Pearson's Correlation Coefficient and R Squared

 

Chapter 2 - Machine Learning Fundamentals

2.1 - Introduction to Machine Learning

2.2 - Bias vs Variance Trade-off

2.3 - Model Evaluation Metrics

2.4 - Machine Learning Pipelines

 

Chapter 3 - Supervised Learning: Regression

3.1 - Simple Linear Regression

3.2 - Multiple Regression

3.3 - Regression Trees

3.4 - Random Forests

 

Chapter 4 - Supervised Learning: Classification

4.1 - Logistic Regression

4.2 - k-Nearest Neighbor Classifier

4.3 - Naive Bayes

4.4 - Support Vector Machines

4.5 - Classification Trees

 

Chapter 5 - Unsupervised Learning

5.1 - k-Means Clustering

5.2 - Hierarchical Agglomerative Clustering

5.3 - Association Learning & Market Basket Analysis

5.4 - Principal Component Analysis

 

Chapter 6 - Neural Networks and Deep Learning

6.1 - Multi-Layer Perceptrons

 

Chapter 7 - Natural Language Processing

7.1 - Introduction to Large Language Models

7.2 - The Tokenization Pipeline

 

Future Work

Introduction to Statistics

  • Update reference to Sampling a Distribution & Bessel's Correction

Basic Data Visualisation

  • Add Venn diagrams and time series plots

Sampling a Distribution & Bessel's Correction

  • Describe coefficient of variation
  • An explanation of how to sample data, and what design decisions to make

 


Further Reading

[1] An Introduction to the Science of Statistics: From Theory to Implementation - Preliminary Edition (Joseph C. Watkins)

[2] Introduction to Statistics and Data Analysis - 3rd Edition (Roxy Peck, Chris Olsen, Jay Devore)

[3] An Introduction to Probability and Simulation (Kevin Ross)

[4] The Elements of Statistical Learning - Data Mining, Inference and Prediction, Second Edition (Hastie et al.)

[5] Interpretable Machine Learning - A Guide for Making Black Box Models Explainable, Second Edition (Christoph Molnar)

[6] Introduction to Data Mining - Second Edition (Pang-Ning Tan et al.)
