AUT-Datamining-projects

Projects and practical assignments for the data mining course at AUT, spring 2022.

Projects:

Preprocessing iris flower dataset
Neural network
- Circles classification
- Classification of the Fashion MNIST dataset
Clustering
Association rules
Final project

Preprocessing iris flower dataset:

Several preprocessing steps were applied in this project.

Missing values were identified and dropped
Categorical features were encoded with one-hot encoding
Numerical features were standardized using StandardScaler
Principal Component Analysis (PCA) was used for dimensionality reduction (4D to 2D)
The reduced features were visualized
The original dataset (without NaN values) was visualized using box plots
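
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's notebook: it assumes scikit-learn's bundled copy of Iris, and because that copy has no missing values, a few cells are blanked out on purpose (a hypothetical step) so that the drop has something to do. Plotting is left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder, StandardScaler

iris = load_iris()
X = iris.data.astype(float).copy()
species = iris.target_names[iris.target]  # categorical column as strings

# Hypothetical step: inject NaNs into 5 distinct rows for demonstration.
rng = np.random.default_rng(0)
rows = rng.choice(len(X), size=5, replace=False)
cols = rng.integers(0, X.shape[1], size=5)
X[rows, cols] = np.nan

# 1. Drop every row that contains a missing value.
keep = ~np.isnan(X).any(axis=1)
X, species = X[keep], species[keep]

# 2. One-hot encode the categorical feature.
species_onehot = OneHotEncoder().fit_transform(species.reshape(-1, 1)).toarray()

# 3. Standardize numerical features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# 4. Reduce the four features to two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Reduced shape:", X_2d.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

`X_2d` can then be passed to a scatter plot, and the cleaned `X` to a box plot.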

Results:

PCA output

Features boxplot

Circles classification:

The goal of this project was to build and tune a suitable artificial neural network (ANN) for classifying 2D data arranged in circles. The network was developed in several steps in order to build a better understanding of ANNs. In each step, model accuracy and loss were plotted for both the training and test datasets.
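
As a rough illustration of the task, the sketch below trains a small network on synthetic concentric circles. It is not the project's model: the data come from scikit-learn's `make_circles`, and scikit-learn's `MLPClassifier` is swapped in as a lightweight stand-in for the ANN. The layer sizes, learning rate, and noise level are all assumptions chosen for the example.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two concentric circles in 2D; noise and factor are illustrative choices.
X, y = make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Small fully connected network with two hidden layers (assumed sizes).
model = MLPClassifier(
    hidden_layer_sizes=(16, 16),
    activation="relu",
    learning_rate_init=0.01,
    max_iter=2000,
    random_state=0,
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# model.loss_curve_ holds the training loss per epoch and can be plotted.
print("epochs run:", len(model.loss_curve_))
```

A linear model cannot separate the two rings, which is why at least one hidden layer with a nonlinear activation is needed here.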

Results:

Accuracy and loss

Decision boundary

Classification of the Fashion MNIST dataset:

A simple ANN trained on the Fashion MNIST dataset using TensorFlow.

Results:

Train accuracy: 0.8821
Test accuracy: 0.8820
Confusion matrix

Clustering:

k-means and DBSCAN were used to cluster the given datasets.

K-means

K-means was used to cluster a given dataset.
The elbow method was used to find the optimal value of K for the k-means algorithm.
K-means was also applied to the digits dataset: dimensionality reduction was first done using Isomap, then k-means was run on the reduced dataset.
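
The elbow method mentioned above can be sketched as follows. The dataset here is synthetic (`make_blobs` with four centers is an assumption, not the assignment's data), and the range of K is arbitrary. The idea is to fit k-means for each K, record the inertia (within-cluster sum of squared distances), and look for the K after which the curve flattens.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with four well-separated blobs.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=0)

ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares

for k, inertia in zip(ks, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `ks` gives the elbow curve; the bend is read off by eye, so the chosen K remains a judgment call.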

DBSCAN

K-nearest-neighbor (KNN) distances were used to determine a suitable value of epsilon for the DBSCAN algorithm.
Different values of MinPts and epsilon were tested in order to find the best hyperparameters for DBSCAN on the dataset.
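
One common way to use nearest-neighbor distances for choosing epsilon is the sorted k-distance curve, sketched below. The two-moons data, `min_pts = 5`, and `eps = 0.2` are assumptions for illustration; in practice epsilon is read from the knee of the plotted curve.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in data: two interleaved half-moons.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
min_pts = 5

# Distance from each point to its min_pts-th neighbor, counting the point
# itself (kneighbors on the fitted data returns the point first, at distance 0).
# This matches how DBSCAN's min_samples counts the point itself.
nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])  # plot this curve and look for the knee

# Hypothetical value picked for this example, not derived automatically.
eps = 0.2
labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
print("clusters found:", n_clusters)
```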

Results: k-means performs well when the clusters are linearly separable. In other cases, because k-means separates clusters with linear boundaries, the results are poor. DBSCAN is a density-based clustering algorithm and performs better than k-means when the clusters are not linearly separable.

Association rules:

In this assignment, association rules were extracted from a given dataset using the Apriori algorithm.
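
For readers unfamiliar with Apriori, here is a compact pure-Python sketch. It is not the assignment's code, and the transactions and thresholds are made up for illustration. Frequent itemsets are grown level by level, candidates with an infrequent subset are pruned, and rules are then derived from each frequent itemset.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules

# Made-up market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, sup, conf in rules:
    print(set(antecedent), "->", set(consequent), f"support={sup}, confidence={conf}")
```

With these thresholds, the only rule that survives is beer -> diapers, since every basket containing beer also contains diapers.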

Final project

The goal of this project was to implement a classifier that detects signs of diabetes or pre-diabetes from the given patient information, based on a CDC dataset. XGBoost was used to implement the classification model.

Preprocessing
- Null and meaningless values were removed
- Numerical features were normalized
- Categorical features were one-hot encoded
- Train and test datasets were created
Model creation
- The classification model was defined using XGBClassifier
Model evaluation
- Accuracy, precision, and recall were calculated for the train and test datasets
- The ROC-AUC score was calculated
- The confusion matrix was plotted
Hyperparameter tuning
- The best hyperparameters for our XGBClassifier were found using GridSearchCV
- The effect of hyperparameter changes was plotted
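
The tuning and evaluation steps can be sketched as below. To keep the example dependent on scikit-learn only, `GradientBoostingClassifier` is swapped in for `XGBClassifier`, and the data are synthetic (`make_classification`) rather than the CDC dataset; the grid values and the weighted averaging of precision and recall are likewise assumptions. The `GridSearchCV` usage would look much the same with an XGBoost estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data as a stand-in.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small illustrative grid; these keys also exist on XGBClassifier.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best = search.best_estimator_
y_pred = best.predict(X_test)
y_prob = best.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="weighted"),
    "recall": recall_score(y_test, y_pred, average="weighted"),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print("best params:", search.best_params_)
print(metrics)
```

The per-combination mean and standard deviation of the cross-validation score are available in `search.cv_results_["mean_test_score"]` and `search.cv_results_["std_test_score"]`, which is what a plot of hyperparameter changes would draw on.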

Results:

Best hyperparameters: {'colsample_bytree': 0.8, 'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 300}
Test Accuracy: 75.70%
Train Accuracy: 76.13%
Test precision: 0.759
Train precision: 0.763
Test recall: 0.757
Train recall: 0.761
ROC score: 0.840

Mean score and standard deviation for each hyperparameter:
