This project provides an end-to-end pipeline for multi-class classification and statistical visualization of tabular data. Application-specific configuration is kept separate from general-purpose quantitative tools, so the toolkit can be reused for other analyses, such as those in data science and quantitative research.
- Modular Architecture: Clear separation between logic (ml_toolkit.py), configuration (pet_config.py, style_config.py), and execution (main.py).
- Multi-Class ML: Per-feature and multi-feature classification with Linear Discriminant Analysis (LDA), Logistic Regression, and other models, validated with leave-one-out cross-validation (LOOCV).
- Dimensionality Reduction: Automatic PCA and t-SNE plots of high-dimensional feature spaces to visualize class separation.
- Statistical Visualization: Publication-quality violin plots annotated with Mann-Whitney U test results for pairwise group comparisons.
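The README does not show how the toolkit computes its scores, so the following is only a minimal sketch of LOOCV-validated multi-class classification, assuming scikit-learn is available. It pools the out-of-fold class probabilities from LOOCV and scores them with a macro-averaged one-vs-rest AUC, once per single feature and once for all features together. The function names loocv_auc and evaluate, the model settings, the Iris stand-in data, and the choice of AUC variant are illustrative assumptions, not the project's actual API.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loocv_auc(model, X, y):
    """Macro one-vs-rest AUC computed from pooled LOOCV class probabilities."""
    # Each sample's probabilities come from a model fitted on all other samples;
    # cross_val_predict clones the estimator for every fold.
    proba = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
    return roc_auc_score(y, proba, multi_class="ovr", average="macro")


def evaluate(X, y, feature_names):
    """Per-feature and multi-feature LOOCV AUC for two illustrative models."""
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        # Scaling sits inside the pipeline so it is refitted within each fold.
        "LogReg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    results = {}
    for model_name, model in models.items():
        for j, feature in enumerate(feature_names):
            # Per-feature: a univariate model on column j only.
            results[(model_name, feature)] = loocv_auc(model, X[:, [j]], y)
        # Multi-feature: all columns at once.
        results[(model_name, "all")] = loocv_auc(model, X, y)
    return results


if __name__ == "__main__":
    # Iris (3 classes) stands in for the user's tabular data.
    data = load_iris()
    X, y = data.data[::3], data.target[::3]  # subsample to keep LOOCV quick
    for key, auc in evaluate(X, y, data.feature_names).items():
        print(key, round(auc, 3))
```

Putting the scaler inside the pipeline matters under LOOCV: fitting it on the full dataset first would leak information from the held-out sample into training.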
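As a rough illustration of the dimensionality-reduction step, the sketch below embeds standardized features in two dimensions with scikit-learn's PCA and TSNE and draws the two embeddings side by side, coloured by class, with matplotlib. Standardizing before embedding, the perplexity clamp for small cohorts, and the helper names embed_2d and plot_embeddings are assumptions made for this example; the project's own plotting functions may differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_2d(X, random_state=0):
    """Return 2-D PCA and t-SNE embeddings of standardized features."""
    Xs = StandardScaler().fit_transform(X)
    pca_xy = PCA(n_components=2, random_state=random_state).fit_transform(Xs)
    # scikit-learn requires perplexity < n_samples; shrink it for small cohorts.
    perplexity = min(30.0, (len(Xs) - 1) / 3)
    tsne_xy = TSNE(n_components=2, perplexity=perplexity, init="pca",
                   random_state=random_state).fit_transform(Xs)
    return pca_xy, tsne_xy


def plot_embeddings(pca_xy, tsne_xy, labels):
    """Side-by-side scatter plots coloured by class; returns the figure."""
    labels = np.asarray(labels)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, xy, title in zip(axes, (pca_xy, tsne_xy), ("PCA", "t-SNE")):
        for cls in np.unique(labels):
            mask = labels == cls
            ax.scatter(xy[mask, 0], xy[mask, 1], s=15, label=str(cls))
        ax.set_title(title)
    axes[0].legend(title="Class")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    pca_xy, tsne_xy = embed_2d(X)
    fig = plot_embeddings(pca_xy, tsne_xy, y)
    print(pca_xy.shape, tsne_xy.shape)
```

To write the figure to disk, call fig.savefig with a path of your choice on the returned figure. Note that t-SNE distances between clusters are not directly interpretable, so PCA is usually shown alongside it.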
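The pairwise statistics behind the violin-plot annotations can be sketched with scipy.stats.mannwhitneyu, run once for every pair of groups on a single feature. This is a minimal example under stated assumptions: it uses two-sided tests, the common star thresholds (0.05, 0.01, 0.001, 0.0001), and no multiple-comparison correction, none of which the README specifies. The helpers p_to_stars and pairwise_mannwhitney are hypothetical names.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def p_to_stars(p):
    """Map a p-value to the star labels commonly drawn above violin plots."""
    if p < 1e-4:
        return "****"
    if p < 1e-3:
        return "***"
    if p < 1e-2:
        return "**"
    if p < 5e-2:
        return "*"
    return "ns"


def pairwise_mannwhitney(groups):
    """Two-sided Mann-Whitney U test for every pair of groups.

    groups: dict mapping a group label to a 1-D array of one feature's values.
    Returns a list of (label_a, label_b, U, p_value, stars) tuples.
    No multiple-comparison correction is applied here.
    """
    rows = []
    for a, b in combinations(groups, 2):
        res = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        p = float(res.pvalue)
        rows.append((a, b, float(res.statistic), p, p_to_stars(p)))
    return rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    groups = {
        "Group 1": rng.normal(0.0, 1.0, 20),
        "Group 2": rng.normal(1.5, 1.0, 20),
        "Group 3": rng.normal(0.2, 1.0, 20),
    }
    for row in pairwise_mannwhitney(groups):
        print(row)
```

With many groups the number of pairs grows quadratically, so a correction such as Bonferroni or Holm is often applied before drawing stars.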
| File | Role | Description |
|---|---|---|
| main.py | Execution Engine | Loads configuration and orchestrates the entire pipeline. |
| toolkit.py | Quantitative Toolkit | Contains all reusable functions for ML (AUC, LOOCV) and plotting. This file is entirely data-agnostic. |
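| style.py | Styling | Defines all visual parameters (fonts, colors, figure sizes) for consistent output. |
| pet_config.py | Configuration | Defines application-specific settings, including DATA_PATH and ANALYSIS_CONFIGS. |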
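One common way to centralize fonts, colours, and figure sizes is to push them into matplotlib's global rcParams once, so every plotting function inherits them. The sketch below assumes that approach; the STYLE and PALETTE values and the apply_style helper are illustrative placeholders, not the contents of the project's style file.

```python
import matplotlib
from cycler import cycler

# Illustrative values only; the project's style file defines its own.
STYLE = {
    "font.family": "sans-serif",
    "font.size": 10,
    "axes.titlesize": 12,
    "axes.labelsize": 10,
    "figure.figsize": (6.0, 4.0),
    "savefig.dpi": 300,
}
# Hypothetical palette (first four ColorBrewer "Dark2" hex codes).
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def apply_style(style=STYLE, palette=PALETTE):
    """Apply the shared style globally so every figure inherits it."""
    matplotlib.rcParams.update(style)
    # Plotting functions then pick colours from this cycle in order.
    matplotlib.rcParams["axes.prop_cycle"] = cycler(color=palette)
```

Calling apply_style() once at startup, before any figure is created, keeps the toolkit functions free of hard-coded styling.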
- Configure Data: In pet_config.py, set DATA_PATH and ANALYSIS_CONFIGS (in particular grouping_map and tumour_groups) to match your input data and the comparisons you want to run.
- Adjust Parameters: Edit GLOBAL_CONFIG['plotting_parameters'] to control feature selection (e.g., max_num_features), cross-validation, and plot generation.
- Run Pipeline: Execute the main script with python main.py.
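The README names the configuration keys but not their contents, so the layout below is entirely hypothetical: only DATA_PATH, ANALYSIS_CONFIGS, grouping_map, tumour_groups, GLOBAL_CONFIG, plotting_parameters, and max_num_features come from the text. It assumes grouping_map maps raw labels to analysis groups and tumour_groups lists the groups to compare, and that GLOBAL_CONFIG can live in the same file. The assign_groups helper, the analysis name, and all values are placeholders.

```python
# Hypothetical pet_config.py layout; key names from the README, values are placeholders.
DATA_PATH = "data/features.csv"  # placeholder path

ANALYSIS_CONFIGS = {
    "example_comparison": {  # hypothetical analysis name
        # Assumed meaning: raw label in the data -> analysis group.
        "grouping_map": {"subtype_a": "Group 1", "subtype_b": "Group 1", "subtype_c": "Group 2"},
        # Assumed meaning: which groups enter this comparison.
        "tumour_groups": ["Group 1", "Group 2"],
    },
}

# Assumed location; the README does not say which file defines GLOBAL_CONFIG.
GLOBAL_CONFIG = {
    "plotting_parameters": {
        "max_num_features": 10,  # placeholder value
    },
}


def assign_groups(labels, grouping_map, tumour_groups):
    """Hypothetical helper: map raw labels to groups, keep only the selected groups.

    Returns (row_indices_kept, group_per_kept_row); unmapped labels are dropped.
    """
    mapped = [grouping_map.get(label) for label in labels]
    keep = [i for i, group in enumerate(mapped) if group in tumour_groups]
    return keep, [mapped[i] for i in keep]
```

Keeping these definitions as plain data means a new comparison can be added by editing the configuration alone, without touching the toolkit code.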
Outputs (plots, CSV files, and model performance metrics) are saved to the configured Results directory.