The machine learning process involves the following steps:
- 1- Data Preparation: Collect, clean, and preprocess data.
- 2- Data Visualization and Analysis: Visualize and analyze data to identify patterns and relationships.
- 3- Feature Engineering: Select and transform relevant variables in the data.
- 4- Model Selection: Choose the best model for the problem.
- 5- Model Training: Feed data into the model and adjust parameters to minimize error.
- 6- Hyperparameter Tuning: Search for the hyperparameter values (settings fixed before training, such as tree depth or regularization strength) that optimize model performance.
- 7- Model Evaluation: Measure accuracy, precision, recall, and other performance metrics.
- 8- Model Deployment: Integrate the model into an application and set up a pipeline to feed new data.
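The core of these steps can be sketched end to end with scikit-learn. This is a minimal illustration, not code from the repository: it assumes scikit-learn is installed, swaps in the bundled breast cancer dataset so nothing needs to be downloaded, and uses default-ish hyperparameters instead of tuning them (step 6).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1- Data preparation: a small, already-clean dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold out a test set so that evaluation uses data the model never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3- Feature scaling and 4- model choice, chained so the scaler
# is fit on the training data only
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

# 5- Training: fit learns the coefficients that minimize the regularized logistic loss
model.fit(X_train, y_train)

# 7- Evaluation on the held-out test set
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```

For step 8, a fitted pipeline like this one can be persisted (for example with `joblib.dump`) and loaded inside the application that serves predictions.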
This tutorial covers Machine Learning basics using Python.
The repository includes Python notebooks, reference guides, and cheatsheets for the entire Machine Learning process:
- 1- Data preprocessing and analysis: clean and transform data into a format suitable for analysis using NumPy and Pandas.
- 2- Data visualization: understand and explore data visually using Matplotlib and Seaborn.
- 3- Machine learning: explore various algorithms in Scikit-learn, such as regression, classification, and clustering.
- 4- Feature engineering: feature encoding, feature scaling, feature selection, etc.
- 5- Model selection: comparison of ML algorithms, how to choose a ML algorithm, etc.
- 6- Hyperparameter tuning: Grid Search, Random Search, and Bayesian Optimization.
- 7- Model evaluation: validation methods, evaluation metrics, etc.
- 8- Model explainability: feature importance, interpretable models, etc.
The repository also includes two Python notebooks with popular introductory examples to get started with Machine Learning:
- Classification - Titanic Survival Prediction: Predict whether a passenger on the Titanic ship survived or not based on various features such as their age, gender, ticket class, and cabin location (notebook).
- Regression - Boston House Price Prediction: Predict the median value of houses in Boston neighborhoods based on various features such as crime rate, number of rooms, proximity to employment centers, and accessibility to highways (notebook).
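The regression workflow can be sketched with current scikit-learn, with one caveat: the Boston housing loader (`load_boston`) was deprecated in scikit-learn 1.0 and removed in 1.2. As an assumption for illustration, this sketch swaps in the bundled diabetes dataset, which has the same shape of problem (numeric features, continuous target), and a plain linear regression; it is not the repository notebook's code.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in for the Boston data: 442 patients, 10 numeric features,
# target = a quantitative measure of disease progression one year later
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Ordinary least squares: one coefficient per feature plus an intercept
reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Regression is scored with error-based metrics rather than accuracy
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.1f}  R^2: {r2:.3f}")
```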
The end of the GitHub repository provides resources and links to practice and improve your Machine Learning skills:
- The most popular ML dataset platforms.
- The most popular ML competition platforms.
- A guide to tackle ML competitions (PDF).
Tools:
- Python 3
- Jupyter Notebook
- Google Colab
Concepts:
- 1- Machine learning basic concepts
- 2- Read input data in Python
- 3- Data preprocessing and analysis: NumPy and Pandas
- 4- Data visualization: Matplotlib and Seaborn
- 5- Machine learning: Scikit-learn
- 6- Feature engineering
- 7- Model selection and parameter tuning
- 8- Model evaluation and explainability
- 9- Practice: Machine learning datasets
- 10- Practice: Machine learning competitions
1- Machine learning basic concepts
- Presentation on Machine learning basic concepts (PDF)
2- Read input data in Python
- Tutorial to read various sources in a DataFrame (notebook)
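Reading a source into a DataFrame usually comes down to one `pandas.read_*` call per format (`read_csv`, `read_json`, `read_excel`, `read_sql`, and so on). The sketch below is a hedged illustration rather than the tutorial's code: the inline sample rows are made up, and `StringIO` buffers stand in for real files so the example runs without anything on disk.

```python
from io import StringIO

import pandas as pd

# Made-up sample data; with real files you would pass a path such as "data.csv"
csv_text = """name,age,city
Alice,34,Paris
Bob,29,Berlin
"""
json_text = '[{"name": "Carol", "age": 41, "city": "Madrid"}]'

df_csv = pd.read_csv(StringIO(csv_text))
df_json = pd.read_json(StringIO(json_text))  # a list of records becomes rows

# Sources with the same columns can be stacked into one DataFrame
df = pd.concat([df_csv, df_json], ignore_index=True)
print(df)
print(df.dtypes)
```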
3- Data preprocessing and analysis: NumPy and Pandas
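A few typical cleaning steps with Pandas and NumPy are sketched below: removing duplicate rows, converting a numeric column stored as text, imputing missing values with the median, and applying a NumPy function column-wise. The rows are hypothetical, with Titanic-style column names chosen only for illustration, and median imputation is one common choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: a duplicated row,
# missing values, and a numeric column stored as text
raw = pd.DataFrame({
    "age": [22, np.nan, 35, 35, 58],
    "fare": ["7.25", "71.28", "8.05", "8.05", None],
    "sex": ["male", "female", "male", "male", "female"],
})

df = raw.drop_duplicates().copy()                     # drop the repeated row
df["fare"] = pd.to_numeric(df["fare"])                # text -> float, None -> NaN
df["age"] = df["age"].fillna(df["age"].median())      # impute missing ages
df["fare"] = df["fare"].fillna(df["fare"].median())   # impute missing fares
df["log_fare"] = np.log1p(df["fare"])                 # NumPy ufuncs apply element-wise

print(df)
print(df.describe())
```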
4- Data visualization: Matplotlib and Seaborn
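Two plots cover much of first-pass exploration: a histogram for one variable's distribution and a scatter plot for the relationship between two variables. This Matplotlib sketch uses randomly generated data (an assumption for illustration) and the non-interactive `Agg` backend so it also runs without a display; Seaborn offers higher-level equivalents such as `sns.histplot` and `sns.scatterplot`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: y depends linearly on x plus Gaussian noise
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 3.5))
ax_hist.hist(x, bins=20)
ax_hist.set(title="Distribution of x", xlabel="x", ylabel="count")
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set(title="Relationship between x and y", xlabel="x", ylabel="y")
fig.tight_layout()
# In a notebook the figure renders inline; in a script use plt.show() or fig.savefig(...)

# Quantify the relationship seen in the scatter plot
corr = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {corr:.2f}")
```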
5- Machine learning: Scikit-learn
- Machine learning map (PDF)
- Scikit-learn cheatsheet (PDF)
- Scikit-learn tutorial (notebook)
- Classification: Titanic Survival Prediction (notebook)
- Regression: Boston House Price Prediction (notebook)
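Scikit-learn's algorithms share one estimator interface: supervised models use `fit(X, y)` then `predict(X)` or `score(X, y)`, and unsupervised models such as clustering use `fit(X)` without labels. The sketch below shows that interface on the bundled Iris dataset; the chosen models and settings are illustrative assumptions. The accuracies are computed on the training data, so they are optimistic and are not a proper evaluation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Supervised estimators: same fit/score calls regardless of the algorithm
scores = {}
for clf in (DecisionTreeClassifier(max_depth=3, random_state=0),
            KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X, y)
    scores[type(clf).__name__] = clf.score(X, y)  # training accuracy (optimistic)
print(scores)

# Unsupervised estimator: fit(X) only, no labels are used
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```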
6- Feature engineering
- Feature engineering cheatsheet (PDF)
- Feature engineering tutorial (notebook)
- Feature selection methods (IMG)
7- Model selection and parameter tuning
- Comparison of ML algorithms 1 (PDF)
- Comparison of ML algorithms 2 (IMG)
- How to choose a ML algorithm (IMG)
- Hyperparameter tuning (WEB)
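Model selection and tuning can both be driven by cross-validation: candidate models are compared on the same folds, then a grid search tries every value in an explicit grid while a random search samples a fixed number of values from a distribution. The sketch below assumes scikit-learn and SciPy; the dataset, candidate models, and search ranges are illustrative choices. Bayesian optimization is not built into scikit-learn and would need a separate library.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Model selection: compare candidates with the same 5-fold cross-validation
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_means = {name: cross_val_score(est, X, y, cv=5).mean()
            for name, est in candidates.items()}
print(cv_means)

# Grid search: every value in the grid is tried (4 values x 5 folds = 20 fits)
grid = GridSearchCV(
    candidates["logreg"],
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=5,
)
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

# Random search: 10 values of C sampled log-uniformly between 1e-3 and 1e2
rand = RandomizedSearchCV(
    candidates["logreg"],
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, random_state=0,
)
rand.fit(X, y)
print("random best:", rand.best_params_, round(rand.best_score_, 3))
```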
8- Model evaluation and explainability
- Evaluation metrics cheatsheet (PDF)
- Evaluation metrics in Python (WEB)
- Model explainability cheatsheet (PDF)
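The common classification metrics all derive from the confusion matrix, and permutation importance is one model-agnostic way to estimate which features a fitted model relies on. In the sketch below the eight labels are made up so the counts can be checked by hand, and the random forest on the bundled breast cancer dataset is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Made-up labels: TP=3, FN=1, FP=2, TN=2
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)   # rows = true class, columns = predicted
acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 5 / 8
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
print(cm, acc, prec, rec, f1, sep="\n")

# Explainability: shuffle one feature at a time on held-out data and
# measure how much the model's score drops
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=5, random_state=0)
top3 = np.argsort(result.importances_mean)[::-1][:3]
print("most important features:", data.feature_names[top3])
```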
9- Practice: Machine learning datasets
- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
- Kaggle datasets: https://www.kaggle.com/datasets
- Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
- Google Dataset Search: https://datasetsearch.research.google.com/
- OpenML Datasets: https://www.openml.org/
- Papers With Code: https://paperswithcode.com/datasets
10- Practice: Machine learning competitions
- Kaggle: https://www.kaggle.com/competitions
- DrivenData: https://www.drivendata.org
- Zindi Africa: https://zindi.africa/competitions
- Guide to tackle ML competitions (PDF)