The attached Analysis Report contains:
- Information of the Dataset
a. Type, Count and Non-Null information about the dataset
b. Unique values in each Non-Numeric column
c. HeatMap
- Data Preprocessing and Analysis
a. Handling missing values, if any
b. Checking that the percentage values lie between 0 and 100
c. Label Encoding the Categorical/Non-Numeric Values
d. Feature Scaling the Numeric Values
- Combining the Output Labels
a. Manually Combining the labels
b. Combining Features using Cosine Similarity between labels
- Feature Selection
- Model Evaluation/Experiments and Analysis
a. ANN Model on both types of Output Clubbing Techniques and Varying Test Sizes
i. Output Columns Clustering Technique 1 - Manually with Varying Test Size [0.1,0.2,0.3,0.4,0.5]
ii. Output Columns Clustering Technique 2 - Cosine Similarity with Varying Test Size [0.1,0.2,0.3,0.4,0.5]
b. ANN Model on both types of Output Clubbing Techniques and Varying Hidden Layers and Neurons
i. Output Columns Clustering Technique 1 - Manually with varying neurons and layers of the hidden layer: hidden_layers=[(50),(50,50),(25,25),(50,25),(50,50,50),(50,50,25),(50,25,25),(50,50,25,25),(50,50,50,25),(50,50,50,50)]
c. ANN Model on Feature Selection Data on both the techniques: