
Machine Learning Works

Python · scikit-learn · Matplotlib · NumPy

The Machine Learning Works repository is a comprehensive collection of Python implementations covering fundamental machine learning algorithms, data preprocessing techniques, model evaluation methods, and practical applications. It provides educational code examples for both supervised and unsupervised learning using popular Python libraries such as scikit-learn, Matplotlib, and NumPy.

Table of Contents

  • Overview
  • Repository Structure
  • Technologies
  • Installation and Setup
  • Module Documentation
  • Usage Examples
  • Testing
  • Contributing Guidelines
  • Code Style
  • Support

Overview

This repository serves as an educational resource for machine learning concepts and implementations. It includes:

  • Supervised learning algorithms (classification and regression)
  • Unsupervised learning techniques (clustering and dimensionality reduction)
  • Data preprocessing and feature engineering methods
  • Model evaluation and hyperparameter tuning techniques
  • Machine learning pipelines and workflow automation

Repository Structure

chains-and-algorithm/
│   make-pipe.py
│   pipeline-1.py
│   pipeline-gridSearch.py
data_representation/
│   one-hot-encoding.py
model-evaluation/
│   GroupKFold.py
│   cross-validation-works.py
│   grid-search.py
│   k-fold-evaluation.py
models/
│   classifying_iris.py
supervised_learning/
│   SVM-works.py
│   decision_trees.py
│   kNN-classifier-works.py
│   kNN-regressor-works.py
│   linear_models_works.py
│   logistic-regression-works.py
│   neural-networks.py
unsupervised_learning/
│   Agglomerative-works.py
│   DBSCAN-works.py
│   NMF-works.py
│   PCA-works.py
│   comparison.py
│   feature_extraction.py
│   k-means-works.py
│   scaling-works.py
│   testing_file.py

Technologies

  • Python: 3.8+
  • Scikit-learn: 1.2+ for machine learning algorithms
  • Matplotlib: 3.5+ for data visualization
  • NumPy: 1.22+ for numerical computations
  • Pandas: For data manipulation and analysis
  • mglearn: For educational visualization utilities

Installation and Setup

Prerequisites

  • Python: 3.8 or higher (verify with python --version)
  • pip: Python package installer
  • Git: For cloning the repository

Installation

  1. Clone Repository:

    git clone https://github.com/mixro/machine-learning-works.git
    cd machine-learning-works
  2. Create Virtual Environment (recommended):

    python -m venv ml-env
    source ml-env/bin/activate  # On Windows: ml-env\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt

    If requirements.txt is not available, install core packages:

    pip install scikit-learn matplotlib numpy pandas mglearn jupyter
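
    A requirements.txt matching the versions listed in the Technologies section might look like the following. This is a suggested sketch, not necessarily the repository's actual file:

    scikit-learn>=1.2
    matplotlib>=3.5
    numpy>=1.22
    pandas
    mglearn
    jupyter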

Module Documentation

1. Chains and Algorithm

  • make-pipe.py

    • Purpose: Demonstrates creating custom pipelines for machine learning workflows.
    • Key Features: Custom transformer implementation, building preprocessing and modeling pipelines, integration with scikit-learn pipeline API
    • Example Imports:
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.base import BaseEstimator, TransformerMixin
  • pipeline-1.py

    • Purpose: Basic pipeline implementation for data preprocessing and modeling.
    • Key Features: Sequential data transformation, combined preprocessing and model training, streamlined machine learning workflows
  • pipeline-gridSearch.py

    • Purpose: Integrates pipelines with grid search for hyperparameter tuning.
    • Key Features: Hyperparameter tuning across pipeline steps, optimization of preprocessing parameters alongside model parameters, efficient parameter search space definition
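
    To illustrate how these scripts fit together, here is a minimal sketch of a scaling-plus-classifier pipeline tuned with grid search. The dataset, step names, and parameter grid are illustrative choices, not taken from the repository's files:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Chain preprocessing and the model so both are fit only on training folds
    pipe = Pipeline([("scaler", StandardScaler()),
                     ("clf", LogisticRegression(max_iter=1000))])

    # Hyperparameters of pipeline steps are addressed as <step name>__<parameter>
    param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100]}
    grid = GridSearchCV(pipe, param_grid, cv=5)
    grid.fit(X_train, y_train)

    print("Best parameters:", grid.best_params_)
    print("Test score:", grid.score(X_test, y_test))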

2. Data Representation

  • one-hot-encoding.py
    • Purpose: Demonstrates one-hot encoding for categorical variables.
    • Key Features: Conversion of categorical data to numerical format, handling of nominal variables in machine learning, comparison with other encoding techniques
    • Example Imports:
      from sklearn.preprocessing import OneHotEncoder
      import pandas as pd
      import numpy as np
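
    As a quick illustration of the idea, the sketch below one-hot encodes a toy categorical column with both pandas and scikit-learn; the column names and values are made up for the example:

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                       "size": [1, 2, 3, 2]})

    # pandas route: get_dummies expands each category into its own 0/1 column
    print(pd.get_dummies(df, columns=["color"]))

    # scikit-learn route: the fitted encoder can be reused inside pipelines
    # (sparse_output requires scikit-learn 1.2+; older versions use sparse=False)
    encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
    encoded = encoder.fit_transform(df[["color"]])
    print(encoder.get_feature_names_out(["color"]))
    print(encoded)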

3. Model Evaluation

  • GroupKFold.py

    • Purpose: Implements Group K-Fold cross-validation for data with groups.
    • Key Features: Cross-validation that ensures samples from the same group never appear in both the training and test sets; useful for subject-specific or group-specific data (see the sketch after this list)
    • Example Imports:
      from sklearn.model_selection import GroupKFold
      import numpy as np
  • cross-validation-works.py

    • Purpose: Demonstrates various cross-validation techniques.
    • Key Features: Implementation of k-fold, stratified k-fold, and leave-one-out cross-validation, comparison of different validation strategies
  • grid-search.py

    • Purpose: Implements hyperparameter tuning using grid search.
    • Key Features: Exhaustive search over specified parameter values, model selection based on cross-validation performance, visualization of parameter performance
  • k-fold-evaluation.py

    • Purpose: Focused implementation and evaluation of K-Fold cross-validation.
    • Key Features: Detailed analysis of k-fold performance, impact of different k values on model evaluation, bias-variance tradeoff analysis
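
    A minimal sketch of the difference between plain k-fold and group-aware cross-validation, using synthetic data; the model, group layout, and fold counts are illustrative only:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GroupKFold, KFold, cross_val_score

    X, y = make_classification(n_samples=120, n_features=10, random_state=0)
    groups = np.repeat(np.arange(12), 10)  # 12 groups of 10 samples each

    model = LogisticRegression(max_iter=1000)

    # Plain k-fold: samples are split without regard to group membership
    print(cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)))

    # GroupKFold: all samples from a group end up in the same fold
    print(cross_val_score(model, X, y, groups=groups, cv=GroupKFold(n_splits=4)))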

4. Models

  • classifying_iris.py
    • Purpose: Implements classification algorithms on the Iris dataset.
    • Key Features: Loads and preprocesses the famous Iris dataset, implements multiple classification algorithms, evaluates model performance metrics
    • Example Imports:
      from sklearn.datasets import load_iris
      from sklearn.model_selection import train_test_split
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import accuracy_score
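
    Building on the imports above, a minimal version of such a script might look like this; the classifier choice and split parameters are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)
    print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))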

5. Supervised Learning

  • SVM-works.py

    • Purpose: Demonstrates Support Vector Machines for classification and regression.
    • Key Features: Linear and nonlinear SVM implementation, kernel trick visualization (RBF, polynomial), tuning of the C and gamma parameters (see the sketch after this list)
    • Example Imports:
      from sklearn.svm import SVC, SVR
      import matplotlib.pyplot as plt
      import numpy as np
  • decision_trees.py

    • Purpose: Implements decision trees for classification and regression.
    • Key Features: Tree-based model construction, visualization of decision boundaries, pruning and complexity parameter tuning
  • kNN-classifier-works.py

    • Purpose: Demonstrates k-Nearest Neighbors for classification tasks.
    • Key Features: Distance-based classification, impact of k parameter on model performance, feature scaling importance for kNN
  • kNN-regressor-works.py

    • Purpose: Implements k-Nearest Neighbors for regression tasks.
    • Key Features: Instance-based regression, distance-weighted predictions, comparison with other regression techniques
  • linear_models_works.py

    • Purpose: Demonstrates linear models for regression and classification.
    • Key Features: Linear and logistic regression implementation, regularization techniques (Ridge, Lasso, ElasticNet), coefficient analysis and interpretation
  • logistic-regression-works.py

    • Purpose: Focused implementation of logistic regression for classification.
    • Key Features: Binary and multiclass classification, probability calibration and threshold tuning, regularization path analysis
  • neural-networks.py

    • Purpose: Implements neural networks using scikit-learn's MLP classifier/regressor.
    • Key Features: Feedforward neural network implementation, hidden layer architecture experimentation, activation function and solver comparison
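
    As a compact illustration of the SVM material referenced above, the sketch below compares kernels on scaled Iris data; the dataset and parameter values are illustrative, not taken from SVM-works.py:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # SVMs are sensitive to feature scale, so standardize using training data only
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    for kernel in ("linear", "rbf", "poly"):
        svm = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_train_s, y_train)
        print(f"{kernel:>6} kernel test accuracy: {svm.score(X_test_s, y_test):.3f}")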

6. Unsupervised Learning

  • Agglomerative-works.py

    • Purpose: Demonstrates hierarchical clustering using Agglomerative Clustering.
    • Key Features: Implementation of Agglomerative Clustering algorithm, visualization of dendrograms and clustering results, comparison of different linkage methods
    • Example Imports:
      from sklearn.cluster import AgglomerativeClustering
      import matplotlib.pyplot as plt
      import numpy as np
      import mglearn
      from sklearn.datasets import make_blobs
  • DBSCAN-works.py

    • Purpose: Implements Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
    • Key Features: Demonstration of density-based clustering, handling of noisy data and irregular cluster shapes, parameter tuning for epsilon and minimum samples
  • NMF-works.py

    • Purpose: Implements Non-Negative Matrix Factorization for feature extraction and dimensionality reduction.
    • Key Features: Matrix factorization for pattern recognition, applications in topic modeling and feature extraction, comparison with other dimensionality reduction techniques
  • PCA-works.py

    • Purpose: Demonstrates Principal Component Analysis for dimensionality reduction.
    • Key Features: Implementation of PCA for feature reduction, visualization of explained variance ratio, applications in data compression and visualization
  • comparison.py

    • Purpose: Compares different unsupervised learning algorithms.
    • Key Features: Side-by-side comparison of clustering algorithms, performance metrics for unsupervised learning, visualization of different techniques on common datasets
  • feature_extraction.py

    • Purpose: Demonstrates various feature extraction techniques.
    • Key Features: Text feature extraction (CountVectorizer, TF-IDF), image feature extraction, custom feature creation methods
  • k-means-works.py

    • Purpose: Implements k-Means clustering algorithm.
    • Key Features: Partition-based clustering, the elbow method for determining the optimal k, cluster initialization strategies (see the sketch after this list)
  • scaling-works.py

    • Purpose: Demonstrates data scaling techniques for machine learning.
    • Key Features: Standardization and normalization methods, impact of scaling on different algorithms, robust scaling for data with outliers
  • testing_file.py

    • Purpose: Serves as a sandbox for testing new ideas and algorithms.
    • Key Features: Experimental code development, algorithm prototyping, quick testing of concepts
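
    To show how scaling, clustering, and dimensionality reduction combine, here is a minimal sketch on synthetic blobs; the cluster count and plotting choices are illustrative only:

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)

    # Scale features before distance-based clustering
    X_scaled = StandardScaler().fit_transform(X)

    # Cluster with k-means, then project to 2D with PCA for visualization
    labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)
    X_2d = PCA(n_components=2).fit_transform(X_scaled)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=20)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.title("k-means clusters in PCA space")
    plt.show()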

Usage Examples

  • Running a supervised learning algorithm:

    python supervised_learning/SVM-works.py
  • Performing unsupervised clustering:

    python unsupervised_learning/k-means-works.py
  • Testing a complete pipeline:

    python chains-and-algorithm/pipeline-gridSearch.py
  • Evaluating model performance:

    python model-evaluation/k-fold-evaluation.py
  • Using Jupyter Notebook for exploration:

    jupyter notebook

Testing

The repository includes various test files to verify the functionality of different modules:

# Run tests in a specific directory
python -m pytest model-evaluation/ -v

# Run all test files in the repository
python -m pytest
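
    If you add your own tests, a minimal pytest file could look like the following. This is a hypothetical example; the repository's actual test layout may differ:

    # test_smoke.py — hypothetical smoke test; adapt the import to the module under test
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    def test_logistic_regression_fits_iris():
        X, y = load_iris(return_X_y=True)
        model = LogisticRegression(max_iter=1000).fit(X, y)
        assert model.score(X, y) > 0.9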

Contributing Guidelines

  • Fork the repository
  • Create a feature branch (git checkout -b feature/amazing-feature)
  • Add tests for new functionality
  • Commit your changes (git commit -m 'Add amazing feature')
  • Push to the branch (git push origin feature/amazing-feature)
  • Open a Pull Request

Code Style

Please adhere to PEP 8 guidelines and include docstrings for all functions and classes following the Google Python Style Guide.

Support

For assistance, please refer to the repository issues page or contact the maintainers with detailed error logs and context.
