Eigendecomposition Techniques
As research produces ever larger and more varied datasets, we need techniques that reduce the dimensionality of complex data. Such techniques are called dimensionality reduction techniques and fall into two main categories: feature selection and feature extraction. The difference is that feature selection only keeps or discards existing features, while feature extraction transforms them into a lower-dimensional representation. We will focus on feature extraction methods. Algorithms for dimensionality reduction include PCA (Principal Component Analysis), Kernel PCA, SVD (Singular Value Decomposition), and others. Our work is to compare several of these techniques and implement the most efficient ones within the Scikit4j Plug-in. The Plug-in targets Neo4j because the data in this project will come from graph databases.
Dimensionality reduction techniques are methods for handling high-dimensional datasets by decreasing the number of dimensions. Feature extraction algorithms serve this purpose: they aim to reduce complexity by deriving a smaller set of features while preserving as much of the information in the input variables as possible.
One of the first steps in this survey is to analyse and compare different dimensionality reduction techniques, so that we can choose the most suitable and efficient one to implement as part of the Neo4j scikit4j plug-in. Another task is to check whether the majority of the techniques rely on eigenvalue decomposition.
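Many of the techniques compared below reduce, at their core, to an eigenvalue problem. As a quick illustration, here is a minimal NumPy sketch of PCA computed directly via eigendecomposition of the covariance matrix; the random data, the number of retained components, and the variable names are purely illustrative and are not taken from the survey.

```python
import numpy as np

# Minimal PCA sketch: eigendecomposition of the covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # hypothetical dataset: 200 samples, 5 features

X_centered = X - X.mean(axis=0)         # center each feature
cov = np.cov(X_centered, rowvar=False)  # 5 x 5 covariance matrix

# The covariance matrix is symmetric, so eigh applies (eigenvalues in ascending order).
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]   # sort by explained variance, descending
components = eigenvectors[:, order[:2]] # keep the top 2 principal directions

X_reduced = X_centered @ components     # project the data into 2 dimensions
print(X_reduced.shape)                  # (200, 2)
```

An SVD of the centered data matrix yields the same principal directions, which is one reason PCA and SVD are often grouped together as eigendecomposition techniques.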
| Learning | Type | Technique | Source | Dataset type | Required parameters | Training | Complexity | Loss of quality |
|---|---|---|---|---|---|---|---|---|
| Supervised | Non-linear | Neural Network | | | | | | |
| Supervised | Linear | Orthogonal centroid algorithm | 2 | | | | | |
| Supervised | Linear | Maximum margin criterion | 1 | | | | | |
| Supervised | Linear | Linear Discriminant Analysis | 1 | | | | | |
| Unsupervised | Linear | Principal Component Analysis | 2 | | | | | |
| Unsupervised | Linear | Independent Component Analysis | 2 | | | | | |
| Unsupervised | Linear | Singular Value Decomposition | 2 | | | | | |
| Unsupervised | Non-linear | Kernel PCA | 2 | | | | | |
| Unsupervised | Non-linear | Isomap | | | | | | |
| Unsupervised | Non-linear | Laplacian Eigenmaps | | | | | | |
| Unsupervised | Non-linear | Locally Linear Embedding | | | | | | |
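The linear/non-linear split in the table matters in practice: a linear method such as PCA projects along straight directions, while a non-linear method such as Kernel PCA can recover structure that only appears after a non-linear mapping. Below is a small sketch of that difference using scikit-learn; the two-circles dataset, the RBF kernel, and the gamma value are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Hypothetical non-linear dataset: two concentric circles.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Linear PCA projects onto straight axes; Kernel PCA with an RBF kernel
# works in an implicit non-linear feature space.
X_pca = PCA(n_components=2).fit_transform(X)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
```

On data like this the linear projection tends to keep the two circles intermixed, whereas the RBF Kernel PCA embedding tends to pull them apart.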
- Install SimKit Plugin in Neo4j
- Procedures
  - Map nodes in Neo4j from CSV
  - Construct similarity matrix in Neo4j from CSV
  - Construct similarity matrix in Neo4j from Neo4j nodes
  - Construct Laplacian eigendecomposed matrix
  - Perform K-means clustering and validate it with the silhouette coefficient
  - Calculate silhouette coefficient
  - Calculate adjusted Rand index
  - Perform spectral clustering algorithm and validate it with the silhouette coefficient (a rough sketch of these steps follows this list)
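The procedures above chain together into a spectral clustering pipeline: build a similarity matrix, form its graph Laplacian, eigendecompose it, run K-means on the resulting embedding, and validate the clustering with the silhouette coefficient and the adjusted Rand index. The following is a minimal NumPy/scikit-learn sketch of those steps on synthetic data; it is not the SimKit plugin code, and the dataset, the RBF similarity, the gamma value, and the normalized Laplacian are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical data standing in for nodes loaded from a CSV file or Neo4j graph.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
k = 2

# 1. Similarity matrix (Gaussian / RBF kernel).
S = rbf_kernel(X, gamma=15)

# 2. Symmetric normalized graph Laplacian: L = I - D^{-1/2} S D^{-1/2}.
d = S.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(X)) - D_inv_sqrt @ S @ D_inv_sqrt

# 3. Eigendecomposition: embed each point with the k smallest eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(L)
embedding = eigenvectors[:, :k]

# 4. K-means clustering on the spectral embedding.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)

# 5. Validation: silhouette coefficient (internal) and adjusted Rand index (external).
print("silhouette:", silhouette_score(embedding, labels))
print("adjusted rand index:", adjusted_rand_score(y_true, labels))
```

In the plugin itself the similarity matrix is instead built from CSV data or Neo4j nodes, as listed above; the synthetic dataset here only stands in for that input.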