# PCA vs Autoencoders: Understanding Hidden Data Structures
This repository contains the code and materials for the paper "Hidden Dimensions of the Data: PCA vs Autoencoders," published in Quality Engineering. The paper explores the connections between Principal Component Analysis (PCA) and linear autoencoders, offering insights into dimensionality reduction and latent feature extraction.
## Overview

Both PCA and autoencoders aim to find a lower-dimensional representation of high-dimensional data, but they achieve this through different mechanisms (a minimal code sketch contrasting the two follows the list below). This repository provides:
- A simple PCA implementation and its comparison with autoencoders.
- A regularized autoencoder to explore feature extraction under constraints.
- A set of scripts and a Jupyter notebook to generate and visualize results.
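To make the contrast concrete, here is a minimal sketch comparing a scikit-learn PCA encoding with a linear autoencoder trained in PyTorch. It does not use the repository's own `pca.py`/`autoencoder.py` interfaces (which are not reproduced here); the toy data, dimensions, and variable names are all illustrative:

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Toy data with a dominant 2-D structure: 500 samples in 10 dimensions.
rng = np.random.default_rng(0)
Z_true = rng.normal(size=(500, 2))
W_true = rng.normal(size=(2, 10))
X = (Z_true @ W_true + 0.1 * rng.normal(size=(500, 10))).astype(np.float32)
X = X - X.mean(axis=0)  # center the data, as PCA does internally

# PCA encoding via scikit-learn.
Z_pca = PCA(n_components=2).fit_transform(X)

# Linear autoencoder: one linear encoder and one linear decoder, no activations.
enc = torch.nn.Linear(10, 2, bias=False)
dec = torch.nn.Linear(2, 10, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

X_t = torch.from_numpy(X)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((dec(enc(X_t)) - X_t) ** 2)  # reconstruction MSE
    loss.backward()
    opt.step()

Z_ae = enc(X_t).detach().numpy()
# Z_ae spans roughly the same 2-D subspace as Z_pca, but its coordinates
# are in general a rotated/rescaled mixture of the principal components.
```

With a linear encoder/decoder and squared reconstruction error, the autoencoder's optimum spans the same principal subspace that PCA recovers, but gradient descent does not force the latent coordinates to be ordered, orthogonal principal components.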
## Repository Contents

- `pca.py`: Implements PCA-based encoding.
- `autoencoder.py`: Defines a simple linear autoencoder.
- `autoencoder_regularized.py`: Implements a regularized version of the autoencoder.
- `comparison.ipynb`: Jupyter notebook to compare PCA and autoencoder results.
- `comparison_simulations.py`: Runs simulations to visualize PCA and autoencoder encodings.
- `comparison_regularized.py`: Evaluates the impact of regularization on autoencoders.
## Example Results

Below is an example visualization comparing PCA with autoencoder encodings (the figure can be reproduced with `comparison_simulations.py`):
## Installation and Usage

Ensure you have Python installed with the following dependencies:

```bash
pip install numpy pandas torch scikit-learn matplotlib tqdm
```

You can test the PCA and autoencoder encoding methods by running:

```bash
python comparison_simulations.py
```

or by exploring interactively:

```bash
jupyter notebook comparison.ipynb
```

## Key Findings

- PCA and linear autoencoders can yield similar feature transformations, but the training mechanisms differ.
- Regularization in autoencoders introduces additional constraints that impact feature extraction (see the sketch after this list).
- The study provides insights into when each method is preferable for dimensionality reduction tasks.
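As an illustration of the second point, a regularized variant can be obtained by adding a penalty term to the reconstruction loss. The sketch below uses a simple L2 (weight-decay) penalty with a hypothetical strength `lam`; the specific penalty implemented in `autoencoder_regularized.py` may differ (for example, a sparsity penalty on the codes or tied encoder/decoder weights):

```python
import torch

def regularized_loss(enc, dec, X, lam=1e-3):
    """Reconstruction MSE plus an L2 penalty on encoder/decoder weights.

    `lam` is a hypothetical regularization strength; larger values shrink
    the weights and pull the learned subspace away from the plain PCA
    solution.
    """
    recon = dec(enc(X))
    mse = torch.mean((recon - X) ** 2)
    l2 = sum((p ** 2).sum() for p in enc.parameters())
    l2 = l2 + sum((p ** 2).sum() for p in dec.parameters())
    return mse + lam * l2
```

Because the penalty trades reconstruction accuracy against weight magnitude, the regularized encoder no longer recovers the principal subspace exactly, which is the kind of effect `comparison_regularized.py` evaluates.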
## Citation

If you use this work in your research, please cite:

> Cacciarelli, D. "Hidden Dimensions of the Data: PCA vs Autoencoders." *Quality Engineering*, 2024.