- Classify and Cluster the Given BBC Sport Documents Using KNN and K-means.
The Notebook Contains Implementations of Both KNN and K-means.
- Check Notebook : IR-A3-K173654.ipynb
Approach: Supervised
k-nearest neighbors (KNN) is a classification algorithm that determines which group a data point belongs to by looking at the labeled data points around it.
To decide whether a point on a grid is in group A or group B, the algorithm looks at the labels of the k points nearest to it. The value of k is chosen by the user; the point is to take a local sample of the data. If the majority of those neighbors are in group A, the point in question is classified as A rather than B, and vice versa.
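The majority vote described above can be sketched in a few lines of Python. This is a minimal illustration and not the notebook's implementation: the function name `knn_predict`, the Euclidean distance measure, and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair the distance from each training point to the query with its label.
    # Assumption: plain Euclidean distance; the notebook may use another measure.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among those neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2D sample: three points in group A, three in group B.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(points, labels, (1.8, 1.5), k=3)
prediction_near_b = knn_predict(points, labels, (6.2, 6.4), k=3)
```

For documents, the same vote would run over document vectors (for example term weights) instead of 2D coordinates.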
Approach: Unsupervised
K-means clustering is a cluster analysis method used in data mining and statistics. It partitions a set of observations into k clusters by assigning each observation to the cluster with the nearest mean (centroid), which divides the data space into Voronoi cells. It can be thought of as a way of finding out which group an object belongs to when no labels are available.
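The partitioning described above can be sketched with the standard assign-then-update loop (Lloyd's algorithm). This is a minimal illustration under stated assumptions, not the notebook's code: the function name `kmeans`, the choice to seed centroids with the first k points, and the sample data are all hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster `points` into k groups; returns (centroids, assignments)."""
    # Assumption: seed the centroids with the first k points so the run is
    # deterministic; real implementations usually pick seeds at random.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid,
        # i.e. the Voronoi cell it falls into.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Hypothetical 2D sample with two visually separate groups.
sample_points = [
    (1.0, 1.0), (1.5, 2.0), (6.0, 6.0),
    (2.0, 1.0), (7.0, 6.5), (6.5, 7.0),
]
centroids, assignments = kmeans(sample_points, k=2)
```

Because the seeds are fixed, the result is repeatable; with random seeding, different runs can converge to different partitions.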
A GUI-based executable provides a desktop interface to the Django-based web server, hosted on Heroku, which holds the KNN and K-means implementations. It can classify and cluster documents and return the results.
- Check Zip File For Desktop GUI Code : IR-SYS-A3
- Repo Link: Click Here
- Check Folder For Desktop GUI Executable : IR-APP-A3
- Check Folder For Python Django Web Server : IR-SERVER-A3
- Repo Link: Click Here
A Web Application is hosted at https://ira3.netlify.com/#/.
- Check Folder for Web Interface Code : IR-WEB-A3
- Repo Link : Click Here
Dataset: http://mlg.ucd.ie/datasets/bbc.html