PCA vs Autoencoders: Understanding Hidden Data Structures

This repository contains the code and materials for the paper "Hidden Dimensions of the Data: PCA vs Autoencoders", published in Quality Engineering. The paper explores the connection between Principal Component Analysis (PCA) and linear autoencoders, offering insight into dimensionality reduction and latent feature extraction.


Overview

Both PCA and autoencoders aim to find a lower-dimensional representation of high-dimensional data, but they achieve this through different mechanisms. This repository provides:

  • A simple PCA implementation and its comparison with autoencoders.
  • A regularized autoencoder to explore feature extraction under constraints.
  • A set of scripts and a Jupyter notebook to generate and visualize results.
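The core connection can be illustrated without any training loop. The sketch below is not the repository's pca.py or autoencoder.py; it is a standalone example showing that the optimal rank-k linear reconstruction (what a fully trained linear autoencoder converges to, by the Eckart–Young theorem) coincides with the k-component PCA reconstruction:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 200 samples, 5 correlated features, mean-centered
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

k = 2  # latent dimension

# PCA reconstruction with k components
pca = PCA(n_components=k).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# Closed-form "optimal linear autoencoder": project onto the top-k
# right singular vectors of X -- the same subspace PCA identifies
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]            # encoder weights, shape (k, 5)
X_ae = (X @ W.T) @ W  # encode, then decode

assert np.allclose(X_pca, X_ae)  # identical reconstructions
```

The equivalence holds for reconstructions; a gradient-trained autoencoder reaches the same subspace but generally not the individual principal directions.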

Repository Structure

Codebase

  • pca.py - Implements PCA-based encoding.
  • autoencoder.py - Defines a simple linear autoencoder.
  • autoencoder_regularized.py - Implements a regularized version of the autoencoder.
  • comparison.ipynb - Jupyter notebook to compare PCA and autoencoder results.
  • comparison_simulations.py - Runs simulations to visualize PCA and autoencoder encodings.
  • comparison_regularized.py - Evaluates the impact of regularization on autoencoders.

Example Comparison

Below is an example visualization comparing PCA with autoencoder encodings (figure: "PCA vs Autoencoders").


How to Use

Requirements

Ensure you have Python installed with the following dependencies:

pip install numpy pandas torch scikit-learn matplotlib tqdm

Running Experiments

You can test the PCA and autoencoder encoding methods by running:

python comparison_simulations.py

or interactively exploring:

jupyter notebook comparison.ipynb

Results & Key Findings

  • PCA and linear autoencoders can yield similar feature transformations, but the training mechanisms differ.
  • Regularization in autoencoders introduces additional constraints that impact feature extraction.
  • The study provides insights into when each method is preferable for dimensionality reduction tasks.
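The first finding above can be made concrete: a linear autoencoder's reconstruction is invariant to any invertible mixing of its latent space, so gradient training recovers the principal subspace rather than the ordered, orthogonal principal components themselves. This is a self-contained sketch (not code from the repository's scripts):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X = X - X.mean(axis=0)

k = 2
pca = PCA(n_components=k).fit(X)
W = pca.components_  # (k, 4) principal directions

# Mixing the latent space with any invertible R leaves the
# reconstruction unchanged: encoder R @ W paired with decoder
# W.T @ inv(R) reconstructs exactly as PCA does.
R = np.array([[2.0, 1.0],
              [0.5, 1.5]])      # arbitrary invertible 2x2 mix
enc = R @ W                     # a valid trained encoder
dec = W.T @ np.linalg.inv(R)    # its matching decoder

X_pca = (X @ W.T) @ W
X_ae = (X @ enc.T) @ dec.T
assert np.allclose(X_pca, X_ae)  # same reconstruction, mixed latents
```

Regularization (as in autoencoder_regularized.py) breaks this invariance, which is one reason it changes which features the autoencoder extracts.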

Citation

If you use this work in your research, please cite:

Cacciarelli, D. Hidden Dimensions of the Data: PCA vs Autoencoders. Quality Engineering, 2024.
