
Features/340 gaussian nb #474

Merged
merged 232 commits into master
Mar 3, 2020

Conversation

@ClaudiaComito (Contributor) commented Feb 5, 2020

Description

Initiating submodule naive_bayes and class GaussianNB().

Implementation of Gaussian Naive Bayes classification along the lines of scikit-learn. Known issues are listed below.

Issue/s resolved: #340

Changes proposed:

  • New submodule naive_bayes (on the same level as cluster and regression)
  • New classifier naive_bayes.GaussianNB()
  • Implemented functions GaussianNB().fit(), GaussianNB().predict(), GaussianNB().predict_proba() plus helper functions
  • Implemented test_gaussiannb() for local and distributed mode. In distributed mode, only the split=0 case is tested for the time being (labels are 1D). Testing other split configurations requires resplit from 1 to 0 to work (cf. Resplit from 1 -> 0 gives wrong order, #425).
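For context, the core computation behind fit()/predict() can be sketched in plain NumPy. This is a hedged illustration, not the heat implementation: the function names, the toy data, and the scikit-learn-style var_smoothing epsilon are assumptions, and heat's version operates on distributed DNDarrays rather than local arrays.

```python
import numpy as np

# Sketch of Gaussian Naive Bayes: per-class Gaussians fitted featurewise,
# prediction via log-likelihood plus log-prior (illustrative only).
def fit(X, y):
    classes = np.unique(y)
    theta = np.stack([X[y == c].mean(axis=0) for c in classes])  # per-class means
    sigma = np.stack([X[y == c].var(axis=0) for c in classes])   # per-class variances
    priors = np.array([(y == c).mean() for c in classes])
    return classes, theta, sigma, priors

def predict(X, classes, theta, sigma, priors):
    eps = 1e-9 * sigma.max()  # variance smoothing, as in scikit-learn's var_smoothing
    joint = []
    for i in range(len(classes)):
        var = sigma[i] + eps
        # log Gaussian likelihood, summed over features
        ll = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - theta[i]) ** 2 / var, axis=1)
        joint.append(np.log(priors[i]) + ll)
    return classes[np.argmax(np.stack(joint, axis=1), axis=1)]

X = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.5], [7.9, 8.1]])
y = np.array([0, 0, 1, 1])
model = fit(X, y)
print(predict(np.array([[1.1, 2.1], [8.2, 8.0]]), *model))  # [0 1]
```

predict_proba() would follow the same structure, normalizing the exponentiated joint log-likelihoods instead of taking the argmax.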

Known issues:

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Due Diligence

  • All split configurations tested: NOT YET. Different split configurations require resplit to work from 1 to 0 (cf. Resplit from 1 -> 0 gives wrong order, #425).
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@coquelin77 (Member) left a comment:

I made a few comments here, nothing major though. Other than that it looks good to go.

raise NotImplementedError(
    "weights.split does not match data.split: not implemented yet."
)
# fix after Issue #425 is solved
@coquelin77 (Member):

Too many tabs, and one line below this it looks like there is dead code.

@ClaudiaComito (Contributor, Author):

Too many tabs (black's doing) or too many nested conditional statements?

if former:
    🤷‍♀️
else:
    I'll think about it

@coquelin77 (Member):

I was more confused about the structure here, but after sleeping on it I see what you meant. I now read it as: after issue #425 is solved, the if statement above these lines and the raise within it will both be removed. I misunderstood it before, I think.

@@ -1227,7 +1230,7 @@ def reduce_vars_elementwise(output_shape_i):

     var_shape = list(var.shape) if list(var.shape) else [1]

-    var_tot = factories.zeros(([x.comm.size, 2] + var_shape), device=x.device)
+    var_tot = factories.zeros(([x.comm.size, 2] + var_shape), dtype=x.dtype, device=x.device)
     n_tot = factories.zeros(x.comm.size, device=x.device)
@coquelin77 (Member):

Possible dtype problem here.

@ClaudiaComito (Contributor, Author):

Is var() supposed to return a float32, even if x is float64?

@coquelin77 (Member):

It should not; I guess I missed a couple of dtype calls here. Can you add them for me? It should be just in this spot.

@ClaudiaComito (Contributor, Author):

That's what I thought, so is this the possible dtype problem you were talking about? I guess I misunderstood your first comment.

@coquelin77 (Member):

I also just realized that the dtype of n_tot doesn't matter, because it is only used internally.
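The dtype concern in the diff above can be illustrated with a small NumPy sketch (a hypothetical stand-in for heat's factories.zeros; the array values are made up): an accumulation buffer allocated without matching the input's dtype can silently truncate higher-precision values.

```python
import numpy as np

# Input in double precision, with a tiny offset that float32 cannot represent.
x = np.array([1.0 + 1e-12, 1.0, 1.0], dtype=np.float64)

buf_default = np.zeros(3, dtype=np.float32)  # like zeros() without dtype=x.dtype
buf_matched = np.zeros(3, dtype=x.dtype)     # like the fixed line in the diff

buf_default[:] = x  # float64 values cast down to float32 on assignment
buf_matched[:] = x  # precision preserved

print(buf_default.dtype, buf_matched.dtype)  # float32 float64
print(buf_default[0] == np.float32(1.0))     # True: the 1e-12 offset is lost
print(buf_matched[0] == x[0])                # True: exact value kept
```

The same logic applies to var_tot in reduce_vars_elementwise: without dtype=x.dtype, variances computed from float64 input would be stored at reduced precision.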

heat/naive_bayes/gaussianNB.py (review threads resolved)
@coquelin77 (Member):

Good work!

@coquelin77 coquelin77 merged commit 8b438d2 into master Mar 3, 2020
@coquelin77 coquelin77 deleted the features/340-GaussianNB branch March 3, 2020 13:02
Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:

Implement Gaussian Naive Bayes algorithm (#340)

3 participants