-
-
Notifications
You must be signed in to change notification settings - Fork 46.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBSCAN algorithm #1207
DBSCAN algorithm #1207
Conversation
storytelling and a .py file for people that just want to look at the code. The code in both is essentially the same. With a few things different in the .py file for plotting the clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great contribution... A few minor points.
Think long and hard about function names. They really help others to understand and reuse your algorithm.
Please add a doctest or two to each function in machine_learning/dbscan/dbscan.py so that our automated testing verifies them. https://docs.python.org/3/library/doctest.html
See https://pep8.org/#function-names In Python function_names should be lower_case and ClassNames should be CamelCase.
machine_learning/dbscan/dbscan.ipynb
Outdated
"Sadly, we have some unavoidable requirements.\n", | ||
"1. Matplotlib for visualization\n", | ||
"2. Scikit-learn for grabbing some standard datasets to test on\n", | ||
"3. Numpy\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news. https://github.com/TheAlgorithms/Python/blob/master/requirements.txt already loads all thos dependencies (and many more).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I should have noticed that.
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please first explain what DBSCAN is, when I would use it, and why it is supercool. Don't assume that your reader know already and try to get them excited to read more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it.
machine_learning/dbscan/dbscan.ipynb
Outdated
"metadata": {}, | ||
"source": [ | ||
"The implementation is inspired from the original DBScan algorithm as given in \n", | ||
"<a href = \"https://en.wikipedia.org/wiki/DBSCAN\">DBSCAN Wikipedia</a>\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make that link work when the file is viewed on GitHub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
machine_learning/dbscan/dbscan.ipynb
Outdated
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def distFunc(Q, P):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with euclidean_distance() instead of distFunc().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
machine_learning/dbscan/dbscan.ipynb
Outdated
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def rangeQuery(DB,Q,eps):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_neighbors() instead of rangeQuery() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
machine_learning/dbscan/dbscan.ipynb
Outdated
" for P in DB:\n", | ||
" if distFunc(Q,P) <= eps:\n", | ||
" Neighbors.append(P)\n", | ||
" return Neighbors\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole function can be a one-liner using a list comprehension_:
return [p for for p in DB if find_neighbors(Q, P) <= eps]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Still need to do docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work. The doctests are passing:
https://travis-ci.org/TheAlgorithms/Python/builds/591043759#L647
Thanks for this... On future work, consider function, class, and variable names for readability. Single letter variable names ( |
Thank you for letting me contribute. Your reviews have made me grow as a
programmer. I will definitely heed all your suggestions. Every next piece
of code I write will meet all these standards.
…On Sun, Sep 29, 2019, 14:22 Christian Clauss ***@***.***> wrote:
Thanks for this... On future work, consider function, class, and variable
names for readability. Single letter variable names (p) are old school.
Is it a point or a person or a particle? Use the full name. For me, db is
a database and eps is earnings per share. Don't make your reader guess.
Remember that your reader may be you in 6 months or 18 month so be kind.
Really though, great contribution. Congratulations.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1207?email_source=notifications&email_token=ACXBUNM4CES7Q6OYJ45UAWDQMBUGHA5CNFSM4I3M6K6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD73NS6Y#issuecomment-536271227>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACXBUNO2WCXFSYD66IFANCTQMBUGHANCNFSM4I3M6K6A>
.
|
* Added dbscan in two formats. A jupyter notebook file for the storytelling and a .py file for people that just want to look at the code. The code in both is essentially the same. With a few things different in the .py file for plotting the clusters. * fixed LGTM problems * Some requested changes implemented. Still need to do docstring * implememted all changes as requested
Adding the dbscan algorithm in two formats, a jupyter notebook file for the storytelling and a .py file for people that just want to look at the code. The code in both is essentially the same. With a few things different in the .py file for plotting the clusters.