DBSCAN algorithm #1207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

cclauss merged 4 commits into TheAlgorithms:master from cozek:dbscan-implementation

Sep 29, 2019

+647 −0

Member

cozek commented Sep 28, 2019

Adding the dbscan algorithm in two formats, a jupyter notebook file for the storytelling and a .py file for people that just want to look at the code. The code in both is essentially the same. With a few things different in the .py file for plotting the clusters.

cozek added 2 commits

September 28, 2019 11:29


          Added dbscan in two formats. A jupyter notebook file for the

12a1a55

storytelling and a .py file for people that just want to look at the
code. The code in both is essentially the same. With a few things
different in the .py file for plotting the clusters.


          fixed LGTM problems

55d7214

cclauss requested changes

View reviewed changes

Member

cclauss left a comment •

edited

Loading

Great contribution... A few minor points.

Think long and hard about function names. They really help others to understand and reuse your algorithm.

Please add a doctest or two to each function in machine_learning/dbscan/dbscan.py so that our automated testing verifies them. https://docs.python.org/3/library/doctest.html

See https://pep8.org/#function-names In Python function_names should be lower_case and ClassNames should be CamelCase.

machine_learning/dbscan/dbscan.ipynb Outdated

+                  "Sadly, we have some unavoidable requirements.\n",
+                  "1. Matplotlib for visualization\n",
+                  "2. Scikit-learn for grabbing some standard datasets to test on\n",
+                  "3. Numpy\n",

Member

cclauss Sep 28, 2019

Good news. https://github.com/TheAlgorithms/Python/blob/master/requirements.txt already loads all thos dependencies (and many more).

Member Author

cozek Sep 28, 2019

Alright. I should have noticed that.

machine_learning/dbscan/dbscan.ipynb

+                {
+                 "cell_type": "markdown",
+                 "metadata": {},
+                 "source": [

Member

cclauss Sep 28, 2019

Please first explain what DBSCAN is, when I would use it, and why it is supercool. Don't assume that your reader know already and try to get them excited to read more.

Member Author

cozek Sep 28, 2019

On it.

machine_learning/dbscan/dbscan.ipynb Outdated

+                 "metadata": {},
+                 "source": [
+                  "The implementation is inspired from the original DBScan algorithm as given in \n",
+                  "<a href = \"https://en.wikipedia.org/wiki/DBSCAN\">DBSCAN Wikipedia</a>\n",

Member

cclauss Sep 28, 2019

Can we make that link work when the file is viewed on GitHub?

Member Author

cozek Sep 28, 2019

Done.

machine_learning/dbscan/dbscan.ipynb Outdated

+                 "metadata": {},
+                 "outputs": [],
+                 "source": [
+                  "def distFunc(Q, P):\n",

Member

cclauss Sep 28, 2019

Let's go with euclidean_distance() instead of distFunc().

Member Author

cozek Sep 28, 2019

Done.

machine_learning/dbscan/dbscan.ipynb Outdated

+                 "metadata": {},
+                 "outputs": [],
+                 "source": [
+                  "def rangeQuery(DB,Q,eps):\n",

Member

cclauss Sep 28, 2019

find_neighbors() instead of rangeQuery() ?

Member Author

cozek Sep 28, 2019

Done.

machine_learning/dbscan/dbscan.ipynb Outdated

+                  "    for P in DB:\n",
+                  "        if distFunc(Q,P) <= eps:\n",
+                  "            Neighbors.append(P)\n",
+                  "    return Neighbors\n"

Member

cclauss Sep 28, 2019

This whole function can be a one-liner using a list comprehension_:
return [p for for p in DB if find_neighbors(Q, P) <= eps]

Member Author

cozek Sep 28, 2019

Done.

cozek added 2 commits

September 28, 2019 21:47


          Some requested changes implemented.

b5130d9

Still need to do docstring


          implememted all changes as requested

878ce88

cclauss approved these changes

View reviewed changes

Member

cclauss left a comment

Awesome work. The doctests are passing:
https://travis-ci.org/TheAlgorithms/Python/builds/591043759#L647

cclauss merged commit 4617aa7 into TheAlgorithms:master

cozek deleted the dbscan-implementation branch

September 29, 2019 08:46

Member

cclauss commented Sep 29, 2019

Thanks for this... On future work, consider function, class, and variable names for readability. Single letter variable names (p) are old school. Is it a point or a person or a particle? Use the full name. For me, db is a database and eps is earnings per share. Don't make your reader guess. Remember that your reader may be you in 6 months or 18 month so be kind. Really though, great contribution. Congratulations.

Member Author

cozek commented Sep 29, 2019 via email

Thank you for letting me contribute. Your reviews have made me grow as a programmer. I will definitely heed all your suggestions. Every next piece of code I write will meet all these standards.

On Sun, Sep 29, 2019, 14:22 Christian Clauss ***@***.***> wrote: Thanks for this... On future work, consider function, class, and variable names for readability. Single letter variable names (p) are old school. Is it a point or a person or a particle? Use the full name. For me, db is a database and eps is earnings per share. Don't make your reader guess. Remember that your reader may be you in 6 months or 18 month so be kind. Really though, great contribution. Congratulations. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1207?email_source=notifications&email_token=ACXBUNM4CES7Q6OYJ45UAWDQMBUGHA5CNFSM4I3M6K6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD73NS6Y#issuecomment-536271227>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACXBUNO2WCXFSYD66IFANCTQMBUGHANCNFSM4I3M6K6A> .

stokhos pushed a commit to stokhos/Python that referenced this pull request


          DBSCAN algorithm (TheAlgorithms#1207)

d77601f

* Added dbscan in two formats. A jupyter notebook file for the
storytelling and a .py file for people that just want to look at the
code. The code in both is essentially the same. With a few things
different in the .py file for plotting the clusters.

* fixed LGTM problems

* Some requested changes implemented.
Still need to do docstring

* implememted all changes as requested

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet