Add Multinomial and Bernoulli Naive Bayes variants #4053

Merged on Jul 22, 2021 (26 commits)

@lowener lowener commented Jul 13, 2021

This is a continuation of PR #1763, to add Multinomial and Bernoulli NB variants.
The Gaussian and Categorical variants will be added in a following PR.

Also linking issue #1666
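
For readers unfamiliar with the two variants, here is a minimal pure-Python sketch of how they differ. It is not the cuML implementation from this PR: the function names `binarize`, `fit`, and `predict` are hypothetical, and the smoothing formulas are the standard textbook ones (Laplace smoothing with `alpha`), assumed here for illustration.

```python
import math
from collections import defaultdict


def binarize(row, threshold=0.0):
    """Map each feature to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in row]


def fit(X, y, alpha=1.0, bernoulli=False):
    """Estimate log priors and smoothed per-class feature probabilities."""
    n_features = len(X[0])
    if bernoulli:
        # Bernoulli NB only looks at presence/absence of each feature.
        X = [binarize(row) for row in X]
    counts = defaultdict(lambda: [0.0] * n_features)  # per-class feature sums
    n_samples = defaultdict(int)                      # per-class sample counts
    for row, label in zip(X, y):
        n_samples[label] += 1
        for j, v in enumerate(row):
            counts[label][j] += v
    model = {}
    for c in sorted(n_samples):
        if bernoulli:
            # P(feature present | class), smoothed over two outcomes.
            denom = n_samples[c] + 2.0 * alpha
        else:
            # Share of the class's total count mass held by each feature.
            denom = sum(counts[c]) + alpha * n_features
        theta = [(counts[c][j] + alpha) / denom for j in range(n_features)]
        model[c] = (math.log(n_samples[c] / len(y)), theta)
    return model


def predict(model, row, bernoulli=False):
    """Return the class with the highest joint log-likelihood."""
    if bernoulli:
        row = binarize(row)
    best, best_score = None, -math.inf
    for c, (log_prior, theta) in model.items():
        if bernoulli:
            # Absent features contribute log(1 - theta) as evidence.
            score = log_prior + sum(
                math.log(t) if v else math.log(1.0 - t)
                for v, t in zip(row, theta)
            )
        else:
            # Each feature contributes in proportion to its count.
            score = log_prior + sum(v * math.log(t) for v, t in zip(row, theta))
        if score > best_score:
            best, best_score = c, score
    return best
```

The practical difference is that the Bernoulli variant thresholds the input first and penalizes absent features, while the Multinomial variant weights each feature by its count.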

@lowener lowener requested a review from a team as a code owner July 13, 2021 21:38
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jul 13, 2021
@lowener lowener added the non-breaking Non-breaking change label Jul 14, 2021
@dantegd dantegd added the feature request New feature or request label Jul 14, 2021
@lowener lowener requested a review from cjnolet July 14, 2021 20:51
@cjnolet cjnolet left a comment

Overall it looks very nice. Found only minor things as usual.

from cuml.common.kernel_utils import cuda_kernel_factory


def _binarize_kernel(x_dtype):

Not something that has to be fixed in this PR since it's being used everywhere already, but CuPy now supports writing kernels with template arguments so we should be able to remove the use of the cuda_kernel_factory everywhere in the codebase. It should also make our kernel invocations look much more clean.
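
A rough sketch of what that could look like, assuming CuPy's `RawModule` with `name_expressions` for template instantiation. The kernel body, the `CTYPES` mapping, and the helper names are hypothetical and are not the `_binarize_kernel` from this PR; the GPU path needs CuPy and a CUDA device, so `cupy` is imported lazily.

```python
# Templated CUDA source: one definition serves every element type.
BINARIZE_SRC = r"""
template <typename T>
__global__ void binarize(T* x, const T threshold, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] = x[i] > threshold ? T(1) : T(0);
    }
}
"""

# Hypothetical mapping from NumPy dtype names to C++ type names.
CTYPES = {"float32": "float", "float64": "double", "int32": "int"}


def name_expression(dtype_name):
    """Build the template instantiation name for a given dtype."""
    return "binarize<{}>".format(CTYPES[dtype_name])


def binarize_inplace(x, threshold=0):
    """Binarize a contiguous CuPy array in place (requires CuPy and a GPU)."""
    import cupy as cp  # lazy import so the helpers above work without a GPU

    module = cp.RawModule(
        code=BINARIZE_SRC,
        options=("-std=c++11",),
        # Every instantiation we may request must be listed up front.
        name_expressions=[name_expression(name) for name in CTYPES],
    )
    kernel = module.get_function(name_expression(x.dtype.name))
    n = x.size
    threads = 256
    blocks = (n + threads - 1) // threads
    # Scalars are passed as NumPy types so they match the C++ signature.
    kernel((blocks,), (threads,), (x, x.dtype.type(threshold), cp.int32(n)))
    return x
```

In real code the module would likely be built once and reused rather than constructed on every call.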

@lowener lowener requested a review from cjnolet July 19, 2021 17:32
lowener commented Jul 20, 2021

Here is a comparison of cuML and scikit-learn performance on Multinomial and Bernoulli NB.
The benchmark uses a synthetic dataset generated by make_regression.
The GPU is an RTX 8000, and the CPU is an i9-10920X @ 3.50GHz.

Multinomial
[Figure: multinomial_speedup]

Bernoulli
[Figure: BernoulliNB]

@cjnolet cjnolet left a comment

LGTM pending successful CI. It looks like there are a couple failures in the pickling tests. Thanks Mickael!

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@c9abba1).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.08    #4053   +/-   ##
===============================================
  Coverage                ?   85.77%           
===============================================
  Files                   ?      231           
  Lines                   ?    18261           
  Branches                ?        0           
===============================================
  Hits                    ?    15664           
  Misses                  ?     2597           
  Partials                ?        0           
Flag Coverage Δ
dask 48.19% <0.00%> (?)
non-dask 78.24% <0.00%> (?)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c9abba1...e54b783.

dantegd commented Jul 22, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2e063cb into rapidsai:branch-21.08 Jul 22, 2021
@lowener lowener deleted the 21.08-multinomial-nb branch July 23, 2021 00:23
rapids-bot bot pushed a commit that referenced this pull request Aug 9, 2021
This is a continuation of PR #1763 and #4053, to add Gaussian Naive Bayes.
This is supposed to be merged after #4053 

Here is a comparison of cuML and SKLearn performance on Gaussian NB.
This is done using a synthetic dataset generated by make_regression.
The GPU used is a RTX 8000, and the CPU is i9-10920X @ 3.50GHz
![gaussian](https://user-images.githubusercontent.com/9810050/126572439-8982faa8-5ad1-4bca-91ab-76704050bf33.png)

Linking issue #1666

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4079
rapids-bot bot pushed a commit that referenced this pull request Sep 8, 2021
This is a continuation of PR #1763, #4053, and #4079, to add Categorical Naive Bayes.
This is supposed to be merged after #4079.
Linking issue #1666.

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4150
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
This is a continuation of PR rapidsai#1763, to add Multinomial and Bernoulli NB variants.
The Gaussian and Categorical variants will be added in a following PR.

Also linking issue rapidsai#1666

Authors:
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4053
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023