
feat: add lab2 programs on PCA
Signed-off-by: GitHub <noreply@github.com>
kbdharun authored Jan 18, 2024
1 parent 84cf3ab commit 6dada14
Showing 8 changed files with 4,600 additions and 3 deletions.
912 changes: 912 additions & 0 deletions Lab2/PCA-DR-Wine.ipynb

1,171 changes: 1,171 additions & 0 deletions Lab2/PCA-Wine-quality-classification.ipynb

368 changes: 368 additions & 0 deletions Lab2/PCA-using-alg-without-sk.ipynb

486 changes: 486 additions & 0 deletions Lab2/PCA-using-sklearn-Iris.ipynb

52 changes: 52 additions & 0 deletions Lab2/README.md
@@ -0,0 +1,52 @@
# Principal Component Analysis

- Principal Component Analysis (PCA) is a popular unsupervised learning technique for reducing the dimensionality of large datasets.
- It increases interpretability while minimizing information loss. It helps identify the most significant features in a dataset and makes the data easier to plot in 2D and 3D. PCA finds a sequence of linear combinations of the original variables (the principal components) that capture the maximum variance.

## Terms

- **Principal Component:** A straight line (with both a direction and a magnitude) that captures most of the variance in the data. Principal components are orthogonal (perpendicular) projections of the data onto a lower-dimensional space.

- **Dimensionality:** The number of features or variables in the dataset.

## Steps

1. **Standardize the data**: PCA requires standardized data, so the first step is to standardize the
data to ensure that all variables have a mean of 0 and a standard deviation of 1.

2. **Calculate the covariance matrix**: The next step is to calculate the covariance matrix of the
standardized data. This matrix shows how each variable is related to every other variable in
the dataset.

3. **Calculate the eigenvectors and eigenvalues**: The eigenvectors and eigenvalues of the
covariance matrix are then calculated. The eigenvectors represent the directions in which
the data varies the most, while the eigenvalues represent the amount of variation along
each eigenvector.

4. **Choose the principal components**: The principal components are the eigenvectors with the
highest eigenvalues. These components represent the directions in which the data varies
the most and are used to transform the original data into a lower-dimensional space.

5. **Transform the data**: The final step is to project the original data onto the
lower-dimensional space defined by the principal components, as in the sketch after this list.
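These five steps can be followed directly with NumPy. The sketch below is illustrative only: the helper name `pca` and the random toy data are assumptions, not code from the lab notebooks.

```python
import numpy as np

def pca(X, n_components=2):
    """Reduce X of shape (n_samples, n_features) to n_components dimensions."""
    # 1. Standardize: zero mean and unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigenvalues and eigenvectors (eigh, since the covariance matrix is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Keep the eigenvectors with the largest eigenvalues
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]

    # 5. Project the standardized data onto the chosen principal components
    return X_std @ components

# Example: reduce random 5-dimensional data to 2 dimensions
X = np.random.rand(100, 5)
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```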

## Applications

- Used to visualize multidimensional data (see the `scikit-learn` sketch after this list).
- Used to reduce the number of dimensions in healthcare data.
- Can help resize an image.
- Used in finance to analyze stock data and forecast returns.
- Helps to find patterns in high-dimensional datasets.
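For the visualization use case, a minimal sketch using `scikit-learn`'s `PCA` on the Iris dataset (the variable names and plot labels are illustrative assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the 4-dimensional Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Standardize, then project onto the first two principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

print("Explained variance ratio:", pca.explained_variance_ratio_)

# 2D scatter plot coloured by class
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Iris projected onto two principal components")
plt.show()
```

The `explained_variance_ratio_` attribute reports how much of the total variance each retained component captures.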
1,600 changes: 1,600 additions & 0 deletions Lab2/winequality-red.csv

6 changes: 6 additions & 0 deletions README.md
@@ -13,6 +13,12 @@ This repository contains the programs that I worked out in Machine Learning Labo
- [Introduction to Numpy and Pandas package](https://github.com/kbdharun/ML-Lab/blob/main/Lab1/Numpy_&_Pandas.ipynb)
- [Insurance Data Analysis](https://github.com/kbdharun/ML-Lab/blob/main/Lab1/ML_Lab1_Insurance.ipynb)
- [Iris Data Analysis](https://github.com/kbdharun/ML-Lab/blob/main/Lab1/ML_Lab1_Iris.ipynb)
- Lab 2: Principal Component Analysis
- [About PCA](https://github.com/kbdharun/ML-Lab/blob/main/Lab2/README.md)
- [PCA based dimensionality reduction on Wine dataset](https://github.com/kbdharun/ML-Lab/blob/main/Lab2/PCA-DR-Wine.ipynb)
- [PCA using algorithm steps without `sklearn`](https://github.com/kbdharun/ML-Lab/blob/main/Lab2/PCA-using-alg-without-sk.ipynb)
- [PCA using `sklearn` on Iris dataset](https://github.com/kbdharun/ML-Lab/blob/main/Lab2/PCA-using-sklearn-Iris.ipynb)
- [PCA - Wine Quality Classification](https://github.com/kbdharun/ML-Lab/blob/main/Lab2/PCA-Wine-quality-classification.ipynb)

## Prerequisites

8 changes: 5 additions & 3 deletions requirements.txt
@@ -1,6 +1,8 @@
-pandas
-numpy
 matplotlib
+notebook
+numpy
+pandas
+plotly
+seaborn
 scikit-learn
-notebook
