# PCA vs Autoencoders: Understanding Hidden Data Structures
This repository contains the code and materials for the paper "Hidden Dimensions of the Data: PCA vs Autoencoders," published in Quality Engineering. The paper explores the connections between Principal Component Analysis (PCA) and linear autoencoders, offering insights into dimensionality reduction and latent feature extraction.
## Overview

Both PCA and autoencoders aim to find a lower-dimensional representation of high-dimensional data, but they achieve this through different mechanisms (a minimal code sketch contrasting the two follows the list below). This repository provides:
- A simple PCA implementation and its comparison with autoencoders.
- A regularized autoencoder to explore feature extraction under constraints.
- A set of scripts and a Jupyter notebook to generate and visualize results.
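To make the contrast concrete, here is a minimal sketch comparing a scikit-learn PCA encoding with a linear autoencoder trained in PyTorch. It does not use the repository's own `pca.py`/`autoencoder.py` interfaces (which are not reproduced here); the toy data, dimensions, and variable names are all illustrative:

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Toy data with a dominant 2-D structure: 500 samples in 10 dimensions.
rng = np.random.default_rng(0)
Z_true = rng.normal(size=(500, 2))
W_true = rng.normal(size=(2, 10))
X = (Z_true @ W_true + 0.1 * rng.normal(size=(500, 10))).astype(np.float32)
X = X - X.mean(axis=0)  # center the data, as PCA does internally

# PCA encoding via scikit-learn.
Z_pca = PCA(n_components=2).fit_transform(X)

# Linear autoencoder: one linear encoder and one linear decoder, no activations.
enc = torch.nn.Linear(10, 2, bias=False)
dec = torch.nn.Linear(2, 10, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

X_t = torch.from_numpy(X)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.mean((dec(enc(X_t)) - X_t) ** 2)  # reconstruction MSE
    loss.backward()
    opt.step()

Z_ae = enc(X_t).detach().numpy()
# Z_ae spans roughly the same 2-D subspace as Z_pca, but its coordinates
# are in general a rotated/rescaled mixture of the principal components.
```

With a linear encoder/decoder and squared reconstruction error, the autoencoder's optimum spans the same principal subspace that PCA recovers, but gradient descent does not force the latent coordinates to be ordered, orthogonal principal components.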
## Repository Contents

- `pca.py`: Implements PCA-based encoding.
- `autoencoder.py`: Defines a simple linear autoencoder.
- `autoencoder_regularized.py`: Implements a regularized version of the autoencoder.
- `comparison.ipynb`: Jupyter notebook to compare PCA and autoencoder results.
- `comparison_simulations.py`: Runs simulations to visualize PCA and autoencoder encodings.
- `comparison_regularized.py`: Evaluates the impact of regularization on autoencoders.
## Example Results

Below is an example visualization comparing PCA with autoencoder encodings (the figure can be reproduced with `comparison_simulations.py`):
## Installation and Usage

Ensure you have Python installed with the following dependencies:

```bash
pip install numpy pandas torch scikit-learn matplotlib tqdm
```

You can test the PCA and autoencoder encoding methods by running:

```bash
python comparison_simulations.py
```

or by exploring interactively:

```bash
jupyter notebook comparison.ipynb
```

## Key Findings

- PCA and linear autoencoders can yield similar feature transformations, but the training mechanisms differ.
- Regularization in autoencoders introduces additional constraints that impact feature extraction (see the sketch after this list).
- The study provides insights into when each method is preferable for dimensionality reduction tasks.
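As an illustration of the second point, a regularized variant can be obtained by adding a penalty term to the reconstruction loss. The sketch below uses a simple L2 (weight-decay) penalty with a hypothetical strength `lam`; the specific penalty implemented in `autoencoder_regularized.py` may differ (for example, a sparsity penalty on the codes or tied encoder/decoder weights):

```python
import torch

def regularized_loss(enc, dec, X, lam=1e-3):
    """Reconstruction MSE plus an L2 penalty on encoder/decoder weights.

    `lam` is a hypothetical regularization strength; larger values shrink
    the weights and pull the learned subspace away from the plain PCA
    solution.
    """
    recon = dec(enc(X))
    mse = torch.mean((recon - X) ** 2)
    l2 = sum((p ** 2).sum() for p in enc.parameters())
    l2 = l2 + sum((p ** 2).sum() for p in dec.parameters())
    return mse + lam * l2
```

Because the penalty trades reconstruction accuracy against weight magnitude, the regularized encoder no longer recovers the principal subspace exactly, which is the kind of effect `comparison_regularized.py` evaluates.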
## Citation

If you use this work in your research, please cite:

> Cacciarelli, D. "Hidden Dimensions of the Data: PCA vs Autoencoders." *Quality Engineering*, 2024.