DBSCAN algorithm #1207

Merged
merged 4 commits into from
Sep 29, 2019

Member

@cozek cozek commented Sep 28, 2019

Adding the DBSCAN algorithm in two formats: a Jupyter notebook file for the storytelling and a .py file for people who just want to look at the code. The code in both is essentially the same, with a few differences in the .py file for plotting the clusters.

Member

@cclauss cclauss left a comment

Great contribution... A few minor points.

Think long and hard about function names. They really help others to understand and reuse your algorithm.

Please add a doctest or two to each function in machine_learning/dbscan/dbscan.py so that our automated testing verifies them. https://docs.python.org/3/library/doctest.html

See https://pep8.org/#function-names. In Python, function_names should be lower_case and ClassNames should be CamelCase.
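
As a rough illustration of the doctest request, here is a minimal sketch of a distance helper with a snake_case name and two doctests. The function name, its signature, and the use of plain tuples are assumptions made for this example; this is not the code from machine_learning/dbscan/dbscan.py.

```python
import math


def euclidean_distance(point_a, point_b):
    """Return the straight-line distance between two points.

    Hypothetical helper for illustration only; points are plain
    sequences of coordinates with the same length.

    >>> euclidean_distance((0, 0), (3, 4))
    5.0
    >>> euclidean_distance((1, 1), (1, 1))
    0.0
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))


if __name__ == "__main__":
    import doctest

    # Runs every ">>>" example in this module's docstrings and reports mismatches.
    doctest.testmod()
```

Running the file directly, or `python -m doctest` on it, checks the examples in the docstring against the real return values.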

"Sadly, we have some unavoidable requirements.\n",
"1. Matplotlib for visualization\n",
"2. Scikit-learn for grabbing some standard datasets to test on\n",
"3. Numpy\n",
Member

Good news. https://github.com/TheAlgorithms/Python/blob/master/requirements.txt already loads all those dependencies (and many more).

Member Author

Alright. I should have noticed that.

{
"cell_type": "markdown",
"metadata": {},
"source": [
Member

Please first explain what DBSCAN is, when I would use it, and why it is supercool. Don't assume that your reader already knows, and try to get them excited to read more.

Member Author

On it.

"metadata": {},
"source": [
"The implementation is inspired from the original DBScan algorithm as given in \n",
"<a href = \"https://en.wikipedia.org/wiki/DBSCAN\">DBSCAN Wikipedia</a>\n",
Member

Can we make that link work when the file is viewed on GitHub?

Member Author

Done.

"metadata": {},
"outputs": [],
"source": [
"def distFunc(Q, P):\n",
Member

Let's go with euclidean_distance() instead of distFunc().

Member Author

Done.

"metadata": {},
"outputs": [],
"source": [
"def rangeQuery(DB,Q,eps):\n",
Member

find_neighbors() instead of rangeQuery() ?

Member Author

Done.

" for P in DB:\n",
" if distFunc(Q,P) <= eps:\n",
" Neighbors.append(P)\n",
" return Neighbors\n"
Member

This whole function can be a one-liner using a list comprehension:
return [p for p in DB if euclidean_distance(Q, p) <= eps]

Member Author

Done.
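
For reference, a minimal sketch of the loop and the list-comprehension form side by side. It assumes the comprehension is meant to call a distance function rather than find_neighbors() itself, and it uses `math.dist` from the standard library (Python 3.8+) as a stand-in for the PR's distance helper. The parameter names are illustrative, not taken from the merged code.

```python
import math


def find_neighbors_loop(points, query_point, eps):
    """Explicit-loop version, mirroring the structure of the original rangeQuery()."""
    neighbors = []
    for point in points:
        # Keep every point whose distance to the query point is at most eps.
        if math.dist(query_point, point) <= eps:
            neighbors.append(point)
    return neighbors


def find_neighbors(points, query_point, eps):
    """Equivalent one-liner using a list comprehension."""
    return [point for point in points if math.dist(query_point, point) <= eps]
```

Both versions return the same list in the same order; note that the query point itself is included whenever it is part of `points`, since its distance to itself is zero.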

Member

@cclauss cclauss left a comment

@cclauss cclauss merged commit 4617aa7 into TheAlgorithms:master Sep 29, 2019
@cozek cozek deleted the dbscan-implementation branch September 29, 2019 08:46
Member

cclauss commented Sep 29, 2019

Thanks for this... On future work, consider function, class, and variable names for readability. Single-letter variable names (p) are old school. Is it a point or a person or a particle? Use the full name. For me, db is a database and eps is earnings per share. Don't make your reader guess. Remember that your reader may be you in 6 months or 18 months, so be kind. Really though, great contribution. Congratulations.
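
To make the naming advice concrete, here is a minimal sketch of DBSCAN written with spelled-out names. It is not the code merged in this PR: the names `points`, `radius` (in place of eps), `min_points`, and the label constants are assumptions chosen for readability, and `math.dist` (Python 3.8+) stands in for the distance helper.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def find_neighbors(points, query_point, radius):
    """Return indices of all points within `radius` of `query_point` (itself included)."""
    return [
        index
        for index, candidate in enumerate(points)
        if math.dist(query_point, candidate) <= radius
    ]


def dbscan(points, radius, min_points):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for index, point in enumerate(points):
        if labels[index] is not UNVISITED:
            continue
        neighbors = find_neighbors(points, point, radius)
        if len(neighbors) < min_points:
            labels[index] = NOISE  # may later be relabeled as a border point
            continue
        # `point` is a core point: start a new cluster and grow it outward.
        cluster_id += 1
        labels[index] = cluster_id
        pending = [n for n in neighbors if n != index]
        while pending:
            neighbor_index = pending.pop()
            if labels[neighbor_index] == NOISE:
                labels[neighbor_index] = cluster_id  # border point joins the cluster
            if labels[neighbor_index] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[neighbor_index] = cluster_id
            reachable = find_neighbors(points, points[neighbor_index], radius)
            if len(reachable) >= min_points:
                pending.extend(reachable)  # neighbor is also a core point
    return labels
```

With names like these, a call such as `dbscan(points, radius=1.5, min_points=3)` reads without the reader having to guess what each argument means.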

Member Author

cozek commented Sep 29, 2019 via email

stokhos pushed a commit to stokhos/Python that referenced this pull request Jan 3, 2021
* Added dbscan in two formats. A jupyter notebook file for the
storytelling and a .py file for people that just want to look at the
code. The code in both is essentially the same. With a few things
different in the .py file for plotting the clusters.

* fixed LGTM problems

* Some requested changes implemented.
Still need to do docstring

* implemented all changes as requested