This project explores how Principal Component Analysis (PCA) and K-Means clustering perform on the Flags dataset from the UCI Machine Learning Repository. The study highlights how datasets with mixed feature types influence interpretability and model performance.
- `flags_pca_clustering.ipynb`: Main Jupyter notebook containing code and explanations.
- `flags_pca_clustering.html`: Exported HTML version of the notebook.
- `flags_pca_clustering.pdf`: Exported PDF version of the notebook.
- `figures/`: Exported plots, mainly for reference; all key figures are already embedded in the notebook outputs.
- Perform PCA and K-Means clustering on the Flags dataset.
- Conduct exploratory data analysis to visualize trends and correlations.
- Assess the PCA projection of the data onto the first two principal components.
- Analyze clustering results and discuss the tradeoff between parsimony and interpretability.
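
The workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data frame is synthetic stand-in data (not the real Flags dataset), the column names are only illustrative, and the choices of one-hot encoding, standard scaling, three clusters, and clustering on the 2-D PCA scores are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for a mixed-type table (NOT the real Flags data).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "area": rng.lognormal(mean=5.0, sigma=1.0, size=n),
    "population": rng.lognormal(mean=3.0, sigma=1.0, size=n),
    "bars": rng.integers(0, 4, size=n),
    "stripes": rng.integers(0, 6, size=n),
    "mainhue": rng.choice(["red", "green", "blue"], size=n),
})

numeric_cols = ["area", "population", "bars", "stripes"]
categorical_cols = ["mainhue"]

# Scale numeric columns and one-hot encode categorical ones so that
# both feature types can enter PCA on a comparable footing.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array for PCA
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA(n_components=2)),  # keep the first two components
])

# Coordinates of each row on the first two principal components.
scores = pipeline.fit_transform(df)
explained = pipeline.named_steps["pca"].explained_variance_ratio_

# Assumption: cluster the 2-D scores with k=3 (both choices are arbitrary here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("Explained variance ratio:", explained)
print("Cluster sizes:", np.bincount(labels))
```

Clustering on only two components keeps the result easy to plot but discards variance; clustering on the full encoded feature set is the other end of that parsimony and interpretability tradeoff.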
- Python (3.10.16 recommended)
- Jupyter Notebook / Jupyter Lab
- Python packages:
  pandas, numpy, matplotlib, seaborn, altair, scikit-learn, ucimlrepo
You can install the required packages using:

    pip install pandas numpy matplotlib seaborn altair scikit-learn ucimlrepo

- Clone or download this repository.
- Open `flags_pca_clustering.ipynb` in Jupyter Notebook or Jupyter Lab and load the dataset via the `ucimlrepo` package.
- Run all cells to reproduce results, figures, and exported HTML/PDF outputs.
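
Before running the notebook, it may help to confirm that the required packages are importable and to see what loading the dataset might look like. The sketch below is not taken from the notebook: `missing_packages` and `load_flags` are hypothetical helper names, and the default dataset ID of 40 is an assumption that should be checked against the Flags page on the UCI repository. Loading requires network access.

```python
from importlib.util import find_spec

# pip package name -> importable module name (scikit-learn imports as sklearn).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "altair": "altair",
    "scikit-learn": "sklearn",
    "ucimlrepo": "ucimlrepo",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


def load_flags(dataset_id=40):
    """Fetch the Flags dataset via ucimlrepo (hypothetical helper).

    Assumption: 40 is the UCI repository ID for Flags; verify before use.
    """
    # Imported lazily so the package check above works without ucimlrepo.
    from ucimlrepo import fetch_ucirepo

    flags = fetch_ucirepo(id=dataset_id)
    # Both are pandas DataFrames; targets may be None if none are defined.
    return flags.data.features, flags.data.targets


# Example usage (needs network access, so it is left commented out):
# if not missing_packages():
#     X, y = load_flags()
#     print(X.shape)
```

If `missing_packages()` returns a non-empty list, installing those names with `pip` should resolve it.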