Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement vector leaf prediction for fil. #3917

Merged
merged 17 commits into from
Jun 17, 2021

Conversation

RAMitchell
Copy link
Contributor

@RAMitchell RAMitchell commented May 31, 2021

Implement vector leaf prediction for fil.

  • Adds leaf_algo_t::VECTOR_LEAF to internal C++ code.
  • Adds an optionally used vector to fil tree representations, pass this into prediction kernels and accumulators
  • Add accumulator for vector leaf using atomics
  • Unit tests
  • Enable python tests with sklearn random forest for multiclass classification (k>2) and multi-output regression.

@levsnv @canonizer

@RAMitchell RAMitchell requested review from a team as code owners May 31, 2021 22:59
@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels May 31, 2021
@dantegd dantegd added the 3 - Ready for Review Ready for review by team label Jun 1, 2021
Copy link
Contributor

@levsnv levsnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor

@canonizer canonizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

The only piece that needs significant changes is making sure that summation of vector leaves is deterministic given a GPU and thread block size, which means no floating-point atomics.

Otherwise, mostly small technical comments.

@@ -63,16 +63,19 @@ struct dense_tree {
/** dense_storage stores the forest as a collection of dense nodes */
struct dense_storage {
__host__ __device__ dense_storage(dense_node* nodes, int num_trees,
int tree_stride, int node_pitch)
int tree_stride, int node_pitch,
float* vector_leaf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how much fields dense_storage and sparse_storage share, consider adding base_storage as the base type for both containing the common fields and implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it would also help categorical features in the future


The multi-class classification / regression (VECTOR_LEAF) predict() works as follows
(always 1 output):
RAW (no values set): output the label of the class with highest probability,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@levsnv Isn't it supposed to have CLASS set to output class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@canonizer no, we decided that predict() vs predict_proba() is the defining semantic. CLASS really matters in FLOAT_UNARY_BINARY, otherwise it's ignored both ways.

@RAMitchell
Copy link
Contributor Author

RAMitchell commented Jun 9, 2021

The implementation is deterministic now. In order to transpose items in shared memory inside the accumulate function, I need to call syncthreads, and so the implementation is modified allowing all threads in the block enter the accumulate function. Each implementation of accumulate should take care of any necessary synchronisation and deal with dummy rows appropriately.


The multi-class classification / regression (VECTOR_LEAF) predict() works as follows
(always 1 output):
RAW (no values set): output the label of the class with highest probability,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@canonizer no, we decided that predict() vs predict_proba() is the defining semantic. CLASS really matters in FLOAT_UNARY_BINARY, otherwise it's ignored both ways.

i += blockDim.x) {
int c = i % num_classes;
for (int j = threadIdx.x / num_classes; j < blockDim.x;
j += num_threads_per_class) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused, what does j represent here? Wouldn't it be clearer if we had two cases separately, when num_threads_per_class > 1 and when not, so that there are fewer nested loops which may or may not have 1 iteration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separated with an if-else, that is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to resolve this

__syncthreads();
for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
#pragma unroll
for (int row = 0; row < num_rows; ++row) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider having a specialization for full blocks (num_rows == NITEMS) and partial blocks (num_rows < NITEMS).

Of course, it's fine to do in the optimization pull request.

@RAMitchell
Copy link
Contributor Author

I still need to update the finalise method
as suggested by Levs. Will get to it next week.

@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Jun 14, 2021
Copy link
Contributor

@canonizer canonizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, provided that the comments (especially those about code deduplication and more tests) are addressed.

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@c35680f). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #3917   +/-   ##
===============================================
  Coverage                ?   85.32%           
===============================================
  Files                   ?      230           
  Lines                   ?    18095           
  Branches                ?        0           
===============================================
  Hits                    ?    15439           
  Misses                  ?     2656           
  Partials                ?        0           
Flag Coverage Δ
dask 47.90% <0.00%> (?)
non-dask 77.67% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c35680f...be6a283. Read the comment docs.

i += blockDim.x) {
int c = i % num_classes;
for (int j = threadIdx.x / num_classes; j < blockDim.x;
j += num_threads_per_class) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to resolve this

@levsnv levsnv added 3 - Ready for Review Ready for review by team and removed 4 - Waiting on Author Waiting for author to respond to review labels Jun 17, 2021
@dantegd dantegd changed the title Fil vector leaf Implement vector leaf prediction for fil. Jun 17, 2021
Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeowner review

@dantegd dantegd added feature request New feature or request non-breaking Non-breaking change labels Jun 17, 2021
@dantegd
Copy link
Member

dantegd commented Jun 17, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7c62beb into rapidsai:branch-21.08 Jun 17, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
Implement vector leaf prediction for fil. 

- Adds `leaf_algo_t::VECTOR_LEAF` to internal C++ code. 
- Adds an optionally used vector to fil tree representations, pass this into prediction kernels and accumulators
- Add accumulator for vector leaf using atomics
- Unit tests
- Enable python tests with sklearn random forest for multiclass classification (k>2) and multi-output regression.

@levsnv @canonizer

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Andy Adinets (https://github.com/canonizer)
  - https://github.com/levsnv
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3917
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CUDA/C++ Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants