Clustering and Outlier Detection

This project demonstrates unsupervised clustering and anomaly detection using classical machine learning algorithms.
It applies K-Means, DBSCAN, and Local Outlier Factor (LOF) to identify structure and irregularities in tabular data.
The analysis highlights how preprocessing, scaling, and algorithm choice influence cluster formation and detection of outliers.

Overview

Clustering and anomaly detection are essential tools in exploratory data analysis.
This notebook compares partition-based and density-based methods, showing how each interprets the same dataset:

Algorithm	Type	Description
K-Means	Partition-based	Divides data into compact, spherical clusters. Requires the number of clusters k in advance.
DBSCAN	Density-based	Groups points based on local density, automatically detecting noise and irregular samples.
LOF	Outlier detection	Scores data points by local deviation from their neighbors to identify anomalies.

Methods

Libraries: scikit-learn, pandas, NumPy, matplotlib, seaborn
Preprocessing: scaling with MinMaxScaler and StandardScaler
Evaluation metrics:
- Silhouette Score — cluster separation and cohesion
- Davies–Bouldin Index — compactness and separation measure

Results

The figure below shows clustering results obtained with DBSCAN (eps = 0.15, min_samples = 5).
Two dense clusters are identified, while a few isolated samples are recognized as noise.
This approach adapts to irregular shapes and densities, making it suitable for unsupervised anomaly detection tasks.

Learning Outcomes

Understand differences between partition-based (K-Means) and density-based (DBSCAN) clustering.
Apply Local Outlier Factor (LOF) for neighborhood-based anomaly detection.
Use internal validation metrics to evaluate cluster quality.
Visualize and interpret clustering results in two dimensions.

This repository is part of a broader focus on unsupervised learning and data-driven anomaly detection within industrial analytics.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitattributes		.gitattributes
README.md		README.md
clustering_outlier_detection.ipynb		clustering_outlier_detection.ipynb
iris_clusters.csv		iris_clusters.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Clustering and Outlier Detection

Overview

Methods

Results

Learning Outcomes

About

Uh oh!

Releases

Packages

Languages

goktugcelenk/clustering_outlier_detection

Folders and files

Latest commit

History

Repository files navigation

Clustering and Outlier Detection

Overview

Methods

Results

Learning Outcomes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages