First, data preprocessing is performed, followed by classification. The steps are:
- Load the dataset as a DataFrame using pandas.
- Handle missing values if needed.
- Encode categorical features if needed.
- Scale all values to the 0-1 range using an appropriate scaling technique.
- Split the dataset into features and labels for classification. The last column, 'income_>50K', is used as the label (class).
- Perform classification and calculate accuracy using logistic regression, after the necessary pre-processing and with an 8:2 train-test split.
- Perform classification and calculate accuracy using a decision tree, after the necessary pre-processing and with an 8:2 train-test split.
- Compare the two accuracies and plot them as a bar chart using matplotlib/seaborn.
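
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`make_toy_frame`, `preprocess`, `evaluate`, `plot_accuracies`) and the toy columns are invented here, and the toy frame only stands in for the real dataset, which would normally be loaded with `pd.read_csv`. Median/mode imputation, one-hot encoding and `MinMaxScaler` are assumed choices for "handle", "encode" and "scale".

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def make_toy_frame(n=400, seed=0):
    """Hypothetical stand-in for the real dataset (label is the last column)."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 70, size=n).astype(float)
    hours = rng.integers(10, 60, size=n).astype(float)
    label = ((age > 40) & (hours > 35)).astype(int)
    age[rng.random(n) < 0.05] = np.nan  # inject some missing numeric values
    df = pd.DataFrame({
        "age": age,
        "hours": hours,
        "workclass": rng.choice(["Private", "Self-emp", "Gov"], size=n),
        "income_>50K": label,
    })
    df.loc[rng.random(n) < 0.05, "workclass"] = np.nan  # missing categories
    return df


def preprocess(df, label_col="income_>50K"):
    """Impute missing values, one-hot encode categoricals, split X and y."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    X = pd.get_dummies(df.drop(columns=[label_col]), dtype=float)
    y = df[label_col].astype(int)
    return X, y


def evaluate(models, X, y, seed=42):
    """8:2 split, min-max scaling fitted on the training part, test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)  # training values land in [0, 1]
    X_test = scaler.transform(X_test)        # reuse the training min/max
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores


def plot_accuracies(scores):
    """Bar chart with one bar per classifier."""
    fig, ax = plt.subplots()
    ax.bar(list(scores.keys()), list(scores.values()))
    ax.set_ylabel("Accuracy")
    ax.set_ylim(0, 1)
    return fig


df = make_toy_frame()
X, y = preprocess(df)
scores = evaluate(
    {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
    },
    X,
    y,
)
fig = plot_accuracies(scores)
```

Calling `plt.show()` afterwards displays the chart. Fitting the scaler on the training split only is a design choice made in this sketch to avoid leaking test statistics; scaling the full dataset before splitting would match the step order in the list more literally.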

Second, the accuracy of Support Vector Machine (SVM), Neural Network (Multilayer Perceptron Classifier) and Random Forest classifiers is compared pre-PCA and post-PCA:
- Apply the necessary pre-processing steps to the dataset.
- Divide the dataset using an 8:2 train-test split and train Support Vector Machine, Neural Network (MLPClassifier) and Random Forest classifiers on it.
- Perform dimensionality reduction using PCA, reducing the number of features by half.
- Apply Support Vector Machine, Neural Network (MLPClassifier) and Random Forest again on the reduced dataset.
- Compare the accuracy of the pre-PCA and post-PCA results and plot them against each other in a bar graph.
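
A rough sketch of this comparison is shown below. It is illustrative only: the data comes from scikit-learn's `make_classification` as a synthetic stand-in for the preprocessed dataset, the helper names (`make_models`, `accuracies`, `plot_comparison`) are invented here, and all hyperparameters are defaults or arbitrary picks rather than the project's settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the already encoded dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, random_state=0
)

# 8:2 train-test split, then min-max scaling fitted on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# PCA keeping half as many components as there are original features.
pca = PCA(n_components=X_train.shape[1] // 2).fit(X_train)
X_train_pca, X_test_pca = pca.transform(X_train), pca.transform(X_test)


def make_models():
    """Fresh, unfitted classifiers for each run."""
    return {
        "SVM": SVC(),
        "MLP": MLPClassifier(max_iter=1000, random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
    }


def accuracies(X_tr, X_te):
    """Fit every model on X_tr and return its accuracy on X_te."""
    return {
        name: model.fit(X_tr, y_train).score(X_te, y_test)
        for name, model in make_models().items()
    }


pre_pca = accuracies(X_train, X_test)
post_pca = accuracies(X_train_pca, X_test_pca)


def plot_comparison(pre, post):
    """Grouped bar chart: pre-PCA vs post-PCA accuracy per classifier."""
    names = list(pre)
    x = np.arange(len(names))
    width = 0.38
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, [pre[n] for n in names], width, label="pre-PCA")
    ax.bar(x + width / 2, [post[n] for n in names], width, label="post-PCA")
    ax.set_xticks(x)
    ax.set_xticklabels(names)
    ax.set_ylabel("Accuracy")
    ax.set_ylim(0, 1)
    ax.legend()
    return fig


fig = plot_comparison(pre_pca, post_pca)
```

PCA is fitted on the training split only and then applied to the test split, so both runs are scored on the same held-out rows. Calling `plt.show()` displays the grouped bars.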