teddphil/MulticlassClassification

Generalized Multi-Class Classification and Visualization Pipeline

This project provides an end-to-end pipeline for multi-class classification and statistical visualization of tabular data. It separates application-specific configuration from general-purpose quantitative tools, so the same code can be reused across analytical tasks in data science and quantitative research.

Features

Modular Architecture: Clear separation between logic (ml_toolkit.py), configuration (pet_config.py, style_config.py), and execution (main.py).

Multi-Class ML: Performs per-feature and multi-feature classification using LDA, Logistic Regression, and other models, validated via Leave-One-Out Cross-Validation (LOOCV).

Dimensionality Reduction: Automated generation of PCA and t-SNE plots for high-dimensional feature spaces, helping to visualize class separation.

Statistical Visualization: Generates publication-quality Violin Plots with Mann-Whitney U statistical annotations for pairwise group comparisons.
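The LOOCV-validated classification described in the feature list can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the helper name `loocv_auc`, the synthetic dataset, the scaling step, and the choice of macro-averaged one-vs-rest AUC as the metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loocv_auc(model, X, y):
    """Macro-averaged one-vs-rest AUC from pooled LOOCV predictions (hypothetical helper)."""
    # Each sample is predicted by a model trained on all the other samples.
    proba = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
    return roc_auc_score(y, proba, multi_class="ovr", average="macro")


# Synthetic stand-in for a tabular dataset: 60 samples, 6 features, 3 classes.
X, y = make_classification(
    n_samples=60, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Multi-feature classification: all columns at once.
multi_feature_auc = {name: loocv_auc(model, X, y) for name, model in models.items()}

# Per-feature classification: one column at a time, here with LDA only.
per_feature_auc = [
    loocv_auc(LinearDiscriminantAnalysis(), X[:, [j]], y) for j in range(X.shape[1])
]

print(multi_feature_auc)
print(per_feature_auc)
```

Pooling the held-out probabilities before scoring is one common way to obtain an AUC under LOOCV, since a single held-out sample cannot yield an AUC on its own.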

Structure

| File | Role | Description |
| --- | --- | --- |
| main.py | Execution Engine | Loads configuration and orchestrates the entire pipeline. |
| toolkit.py | Quantitative Toolkit | Contains all reusable functions for ML (AUC, LOOCV) and plotting. This file is entirely data-agnostic. |
| style.py | Styling | Defines all visual parameters (fonts, colors, figure sizes) for consistent output. |
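To illustrate what a data-agnostic toolkit function might look like, here is a sketch of a pairwise Mann-Whitney U helper of the kind that could feed the violin-plot annotations. The function name `pairwise_mannwhitney`, its signature, and the synthetic data are hypothetical; only `scipy.stats.mannwhitneyu` is a real library call.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(values, groups):
    """Run a two-sided Mann-Whitney U test for every pair of groups.

    Returns a list of (group_a, group_b, U statistic, p-value) tuples.
    The function knows nothing about what the groups or values mean.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    results = []
    for a, b in combinations(sorted(set(groups.tolist())), 2):
        stat, p = mannwhitneyu(
            values[groups == a], values[groups == b], alternative="two-sided"
        )
        results.append((a, b, float(stat), float(p)))
    return results


# Synthetic example: groups A and B share a distribution, C is shifted.
rng = np.random.default_rng(0)
values = np.concatenate(
    [rng.normal(0, 1, 20), rng.normal(0, 1, 20), rng.normal(3, 1, 20)]
)
groups = np.array(["A"] * 20 + ["B"] * 20 + ["C"] * 20)

for a, b, stat, p in pairwise_mannwhitney(values, groups):
    print(f"{a} vs {b}: U={stat:.1f}, p={p:.3g}")
```

This sketch applies no multiple-comparison correction. Whether the project applies one is not stated in this README.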

Usage

  1. Configure Data: Update the DATA_PATH and the ANALYSIS_CONFIGS (specifically the grouping_map and tumour_groups) within pet_config.py to match your input data and desired analytical comparisons.
  2. Adjust Parameters: Modify settings in GLOBAL_CONFIG['plotting_parameters'] to control feature selection (e.g., max_num_features), cross-validation, and plot generation.
  3. Run Pipeline: Execute the main script: python main.py

Results (plots, CSVs, and model performance metrics) will be saved in the configured Results directory.
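As a rough guide to the configuration step, the sketch below shows one way the settings named above could be laid out. Only the names DATA_PATH, ANALYSIS_CONFIGS, grouping_map, tumour_groups, GLOBAL_CONFIG, plotting_parameters, and max_num_features come from this README. The nesting, the `name` key, every value, and the `validate_config` helper are assumptions for illustration and may not match the real pet_config.py.

```python
from pathlib import Path

# Hypothetical layout and values: only the key names listed in the
# lead-in are taken from the README; everything else is assumed.
DATA_PATH = Path("data/features.csv")  # placeholder path

ANALYSIS_CONFIGS = [
    {
        "name": "example_comparison",  # assumed key, not from the README
        # Map raw labels in the input data to the groups being compared.
        "grouping_map": {
            "raw_label_1": "GroupA",
            "raw_label_2": "GroupA",
            "raw_label_3": "GroupB",
        },
        # Groups to include in this analysis.
        "tumour_groups": ["GroupA", "GroupB"],
    },
]

GLOBAL_CONFIG = {
    "plotting_parameters": {
        "max_num_features": 10,  # placeholder value
    },
}


def validate_config(analysis_configs):
    """Check that every group in tumour_groups is produced by grouping_map."""
    for cfg in analysis_configs:
        mapped = set(cfg["grouping_map"].values())
        missing = [g for g in cfg["tumour_groups"] if g not in mapped]
        if missing:
            raise ValueError(f"{cfg['name']}: groups not in grouping_map: {missing}")
    return True


validate_config(ANALYSIS_CONFIGS)
```

A check like this, run before the pipeline starts, can surface a mismatch between the grouping and the requested comparisons early rather than partway through a run.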

About

High-dimensional and multi-class data classification with classic machine learning classifiers
