imashiqe/Machine-Learning

🧠 Machine Learning Foundations – Statistics, Probability & Data Preparation

This repository contains foundational learning materials to help you understand data before training machine learning models.
If we don’t understand the data first, our models may become biased, unstable, or misleading, which is why statistics is the first step.


📌 What is Statistics?

Statistics is the science of collecting, summarizing, analyzing, and interpreting data.

Example:
If we have students' exam scores:

  • Mean → average score
  • Median → middle score
  • Standard deviation → how spread out the scores are
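A minimal sketch of these three summaries using Python's built-in `statistics` module. The scores are hypothetical illustration data, not taken from the course. `stdev` is the sample standard deviation (divides by n - 1); `pstdev` would give the population version.

```python
import statistics

# Hypothetical exam scores for seven students (illustrative data only).
scores = [55, 62, 70, 70, 78, 85, 91]

mean_score = statistics.mean(scores)      # average score
median_score = statistics.median(scores)  # middle score once the list is sorted
mode_score = statistics.mode(scores)      # most frequent score
sd_score = statistics.stdev(scores)       # sample standard deviation (n - 1 in the denominator)

print(f"mean={mean_score}, median={median_score}, mode={mode_score}, sd={sd_score:.2f}")
```

Here the mean (73) sits above the median (70) because the higher scores are more spread out than the lower ones.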

Statistics helps us find meaning in data.


🤖 Why Statistics is Important for Machine Learning

Machine learning models learn patterns from data.
If the data has:

  • Outliers
  • Skewed distribution
  • Wrong scaling
  • Missing values

then the model may learn wrong or misleading patterns.

Statistics helps us:

  • Understand center & spread
  • Detect outliers
  • Identify skewness and long tails
  • Scale features & encode categories
  • Evaluate model performance correctly

📚 Course Modules

Module 01 – Descriptive Statistics & Distributions

| Topic | Purpose |
|---|---|
| Mean, Median, Mode | Measure the center of the data |
| Variance & Standard Deviation | Measure spread |
| Percentiles & Quartiles | Understand a value's rank within the dataset |
| IQR (Interquartile Range) | Outlier detection |
| Z-Score | Standardization |
| Distribution Shapes | Symmetric vs. skewed vs. long-tailed |

Use Median + IQR when the data is skewed or contains outliers.
Use Mean + SD when the data is roughly symmetric and free of extreme outliers.
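The rule of thumb above can be checked numerically. This sketch assumes hypothetical data and adds a single extreme value to create a long right tail, then compares how far each summary moves. Quartiles come from `statistics.quantiles(values, n=4)`, which uses the `'exclusive'` method by default; other tools (for example NumPy's default percentile method) can give slightly different quartiles.

```python
import statistics

def summarize(values):
    """Return (mean, sd, median, iqr) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return (
        statistics.mean(values),
        statistics.stdev(values),
        statistics.median(values),
        q3 - q1,
    )

# Hypothetical values; the second list adds one extreme point (a long right tail).
clean = [30, 32, 35, 36, 38, 40, 41, 45]
skewed = clean + [400]

for label, data in (("clean", clean), ("with outlier", skewed)):
    mean, sd, median, iqr = summarize(data)
    print(f"{label:>12}: mean={mean:.2f} sd={sd:.2f} | median={median:.2f} iqr={iqr:.2f}")
```

The single outlier drags the mean and SD far away from the bulk of the data, while the median and IQR move only slightly.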


Module 02 – Probability Basics for ML

  • Events, outcomes, sample space
  • Conditional probability & independence
  • Bayes’ Theorem (foundation for Naive Bayes)
  • Sensitivity, specificity, false positives/negatives
  • Class imbalance problems
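Bayes' theorem ties several of these topics together. For a diagnostic-style test, sensitivity is P(+ | condition) and specificity is P(− | no condition), so P(condition | +) = P(+ | condition) · P(condition) / P(+). The sketch below assumes hypothetical test characteristics (95% sensitivity, 90% specificity) and evaluates them at several base rates to show how class imbalance affects false positives.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence               # P(+ | D) * P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)          # denominator is the total P(+)

# Hypothetical test: 95% sensitivity, 90% specificity, evaluated at different base rates.
for prevalence in (0.5, 0.1, 0.01):
    p = posterior_positive(prevalence, sensitivity=0.95, specificity=0.90)
    print(f"prevalence={prevalence:<5} -> P(condition | positive) = {p:.3f}")
```

With these numbers, when only 1% of cases are positive, fewer than 1 in 10 positive results is a true positive, even though the test itself looks accurate.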

Module 2.5 – Practice Worksheets

  • Compute mean, median, SD, IQR, fences
  • Z-score & outlier detection (manual + Python)
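A possible Python version of the worksheet steps, assuming hypothetical data with one suspicious value. It computes Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and z-scores, then flags outliers with each rule. The |z| > 3 cutoff is a common convention, not a fixed rule.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Tukey fences: values below Q1 - k*IQR or above Q3 + k*IQR are flagged."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def z_scores(values):
    """Standardize each value: z = (x - mean) / sd, using the sample SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mu) / sd for x in values]

# Hypothetical worksheet data with one suspicious value.
data = [12, 13, 13, 14, 15, 15, 16, 17, 40]

low, high = iqr_fences(data)
iqr_outliers = [x for x in data if x < low or x > high]

z = z_scores(data)
z_outliers = [x for x, zi in zip(data, z) if abs(zi) > 3]  # common |z| > 3 rule

print("fences:", low, high)
print("IQR outliers:", iqr_outliers)
print("max |z|:", round(max(abs(v) for v in z), 2))
print("|z| > 3 outliers:", z_outliers)
```

On this data the IQR rule flags 40, but the |z| > 3 rule flags nothing. The outlier inflates the SD it is measured against, and with 9 points no sample z-score can exceed (n − 1)/√n ≈ 2.67. This is one reason Median + IQR is preferred for small or skewed samples.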

Module 03 – Data Quality, Scaling & Encoding

| Concept | Why It Matters |
|---|---|
| Missing Data Types (MCAR/MAR/MNAR) | Choosing a correct imputation strategy |
| Min-Max, Standard & Robust Scaling | Prevents large-scale features from dominating the model |
| One-Hot & Ordinal Encoding | Proper handling of categorical data |
| Distance Metrics | Used in KNN, clustering, embeddings |
| Covariance & Correlation | Understanding relationships between features |
| PCA (concept intro) | Dimensionality reduction |
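A minimal sketch of the three scaling methods and of why scaling matters for distance metrics. The two-feature dataset (age in years, income in dollars) is hypothetical. Before scaling, the Euclidean distance between two rows is almost entirely the income difference. After scaling, both features contribute.

```python
import math
import statistics

def min_max(col):
    """Min-max scaling: map values linearly onto [0, 1]."""
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]

def standardize(col):
    """Standard (z-score) scaling: mean 0, sample SD 1."""
    mu, sd = statistics.mean(col), statistics.stdev(col)
    return [(x - mu) / sd for x in col]

def robust(col):
    """Robust scaling: subtract the median, divide by the IQR."""
    q1, _, q3 = statistics.quantiles(col, n=4)
    med = statistics.median(col)
    return [(x - med) / (q3 - q1) for x in col]

# Hypothetical two-feature dataset: age in years, income in dollars.
ages = [25, 32, 47, 51, 62]
incomes = [40_000, 52_000, 61_000, 90_000, 75_000]

def first_two_distance(col_a, col_b):
    """Euclidean distance between row 0 and row 1 of a two-column dataset."""
    return math.dist((col_a[0], col_b[0]), (col_a[1], col_b[1]))

print("raw:        ", round(first_two_distance(ages, incomes), 3))
for name, scale in (("min-max", min_max), ("standard", standardize), ("robust", robust)):
    print(f"{name:<12}", round(first_two_distance(scale(ages), scale(incomes)), 3))
```

In practice, scikit-learn provides `MinMaxScaler`, `StandardScaler` and `RobustScaler` in `sklearn.preprocessing`. `StandardScaler` divides by the population SD, so its output differs slightly from the sample-SD version in this sketch.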

Module 3.5 – Hands-on Worksheets

  • Bayes' rule problems
  • Confusion matrix: Precision, Recall, F1-Score
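A small sketch for the confusion-matrix worksheet, assuming binary labels where `1` is the positive class. The label lists are hypothetical. Precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary classification."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if t == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.2f}")
```

Accuracy (0.70) looks better than recall (0.50) here because negatives outnumber positives. Under class imbalance, precision, recall and F1 are usually more informative. scikit-learn's `sklearn.metrics` offers equivalents such as `precision_score`, `recall_score` and `f1_score`.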

Module 04 – Quiz / Review

Conceptual wrap-up before applying ML algorithms.


🎯 Week 1 Goal

Build intuition rather than memorizing formulas.

We learn to:

  • Summarize data
  • Detect outliers
  • Understand real-world distributions
  • Prepare data so that ML models are accurate, robust & explainable

📂 Folder Structure
