This repository has been archived by the owner on Apr 29, 2024. It is now read-only.
Showing 8 changed files with 4,600 additions and 3 deletions.
# Principal Component Analysis

- Principal Component Analysis (PCA) is a popular unsupervised learning
  technique for reducing the dimensionality of large datasets.
- It increases interpretability while minimizing information loss. It
  identifies the directions along which the data varies most, which helps
  reveal the most significant patterns in a dataset and makes the data easy
  to plot in 2D or 3D. Concretely, PCA finds a sequence of uncorrelated
  linear combinations of the original variables, each capturing as much of
  the remaining variance as possible.
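A minimal sketch of the idea, assuming NumPy: three correlated variables are reduced to two principal components, and the fraction of total variance explained by each component shows how little information the dropped dimension carried. The synthetic data and the names `X`, `explained_ratio`, and `X_2d` are illustrative only. This sketch uses the singular value decomposition (SVD) of the centered data, which yields the same directions as the covariance eigendecomposition described in the Steps section, and it centers the data without standardizing it.

```python
import numpy as np

# Illustrative data: 200 samples, 3 variables, where the third variable is
# (almost) a linear combination of the first two, so the data is close to 2D.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, 2 * a - b + 0.01 * rng.normal(size=200)])

# Center the data, then take the SVD; the rows of Vt are the principal directions,
# already ordered from largest to smallest variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared singular values are proportional to the variance along each direction.
explained_ratio = S**2 / np.sum(S**2)
print(explained_ratio.round(4))

# Keep the first two components: each new column is a linear combination of the
# original variables, with weights given by the corresponding row of Vt.
X_2d = Xc @ Vt[:2].T
print(X_2d.shape)  # (200, 2)
```

Because the third variable is nearly redundant, almost all of the variance lands in the first two components, so dropping the third loses very little information.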
## Terms

- **Principal component:** A direction in feature space (a straight line
  through the data) that captures as much of the variance of the data as
  possible. Each component has a direction (its eigenvector) and a
  magnitude (its eigenvalue, the variance of the data along that
  direction). Principal components are mutually orthogonal
  (perpendicular), and projecting the data onto the first few of them
  maps it into a lower-dimensional space.

- **Dimensionality:** The number of features or variables in a dataset.
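To make the two terms concrete, the sketch below (assuming NumPy, with made-up 2D data) computes the principal components of a small dataset and checks two properties stated above: the component directions are orthogonal unit vectors, and each eigenvalue (the component's magnitude) equals the variance of the data projected onto that direction. The variable names are illustrative.

```python
import numpy as np

# Made-up correlated 2D data: dimensionality = 2 features.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + 0.2 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix. eigh is meant for symmetric
# matrices; it returns eigenvalues in ascending order and eigenvectors as columns.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Direction: the eigenvectors are orthonormal, i.e. perpendicular unit vectors.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))  # True

# Magnitude: the variance of the data projected onto each direction is its eigenvalue.
projected = Xc @ eigvecs
print(np.allclose(projected.var(axis=0, ddof=1), eigvals))  # True
```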
## Steps

1. **Standardize the data**: PCA is sensitive to the scale of the variables, so the first step is usually to
   standardize the data so that every variable has a mean of 0 and a standard deviation of 1.

2. **Calculate the covariance matrix**: Next, calculate the covariance matrix of the standardized data.
   This matrix shows how each variable varies together with every other variable in the dataset.

3. **Calculate the eigenvectors and eigenvalues**: Then compute the eigenvectors and eigenvalues of the
   covariance matrix. The eigenvectors represent the directions in which the data varies, and each
   eigenvalue gives the amount of variance along its eigenvector.

4. **Choose the principal components**: Sort the eigenvectors by their eigenvalues in descending order
   and keep the top *k*. These components point in the directions in which the data varies the most
   and define the lower-dimensional space.

5. **Transform the data**: Finally, project the standardized data onto the lower-dimensional space by
   multiplying it by the matrix whose columns are the chosen eigenvectors.
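The five steps above can be sketched as a single function. This is a minimal illustration assuming NumPy, not the repository's own implementation; the function name `pca`, its parameters, and the random example data are hypothetical. It assumes no column has zero variance, since step 1 divides by each column's standard deviation.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data, the chosen eigenvectors (as columns),
    and their eigenvalues (variances), largest first.
    """
    X = np.asarray(X, dtype=float)

    # 1. Standardize: mean 0 and standard deviation 1 for every variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # 2. Covariance matrix of the standardized data (features x features).
    cov = np.cov(Z, rowvar=False)

    # 3. Eigenvectors and eigenvalues; eigh suits symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 4. Choose the components with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    components, variances = eigvecs[:, order], eigvals[order]

    # 5. Transform: project the standardized data onto the chosen components.
    return Z @ components, components, variances


# Usage on hypothetical correlated data: 100 samples, 5 features, reduced to 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
scores, components, variances = pca(X, n_components=2)
print(scores.shape)  # (100, 2)
```

Using `np.linalg.eigh` rather than `np.linalg.eig` is a deliberate choice here: a covariance matrix is symmetric, and `eigh` returns real eigenvalues and orthonormal eigenvectors for that case.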
## Applications

- Used to visualize multidimensional data.
- Used to reduce the number of dimensions in healthcare data.
- Used to compress images.
- Used in finance to analyze stock data and forecast returns.
- Helps find patterns in high-dimensional datasets.
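One way PCA can help compress an image is to store only the projections onto the first *k* components and reconstruct an approximation from them. The sketch below assumes NumPy and applies this to a small synthetic "image" (a 2D array whose rows are treated as samples). The helper name `reconstruct` and the data are hypothetical, and the sketch only centers the data rather than standardizing it, so that the original values can be restored directly.

```python
import numpy as np

def reconstruct(img, k):
    """Approximate a 2D array from its first k principal components (rows = samples)."""
    mean = img.mean(axis=0)
    centered = img - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    W = Vt[:k].T              # (n_columns, k) basis of the kept components
    scores = centered @ W     # compressed representation: (n_rows, k)
    return scores @ W.T + mean  # map back to the original space


# Synthetic 64x64 "image": smooth gradients plus a little noise.
rng = np.random.default_rng(0)
i, j = np.mgrid[0:64, 0:64]
img = np.sin(i / 10.0) + np.cos(j / 7.0) + 0.05 * rng.normal(size=(64, 64))

# Relative reconstruction error shrinks as more components are kept.
for k in (1, 5, 20, 64):
    err = np.linalg.norm(img - reconstruct(img, k)) / np.linalg.norm(img)
    print(k, round(float(err), 4))
```

With *k* = 5, the compressed form needs 64 × 5 scores, 64 × 5 basis values, and 64 column means, i.e. 704 numbers instead of 64 × 64 = 4,096; keeping all 64 components reproduces the array exactly, up to floating-point error.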
matplotlib
numpy
pandas
plotly
seaborn
scikit-learn
notebook