Implement vector leaf prediction for fil. #3917
Conversation
Thank you!
Nice work!
The only piece that needs significant changes is making sure that summation of vector leaves is deterministic given a GPU and thread block size, which means no floating-point atomics.
Otherwise, mostly small technical comments.
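For background on the determinism requirement: float addition is not associative, and atomicAdd() imposes no ordering across threads, so atomically summing leaf vectors can give slightly different results from run to run. A tiny host-side illustration (not FIL code):

```cpp
// Minimal demonstration that float addition is order-dependent, which is why
// summing leaf vectors with floating-point atomics is not reproducible.
#include <cstdio>

int main() {
  float a = 1e8f, b = -1e8f, c = 1.0f;
  // Two groupings of the same three addends give different results:
  // (a + b) + c == 1.0, but a + (b + c) == 0.0 because 1.0 is lost at 1e8.
  printf("%.1f vs %.1f\n", (a + b) + c, a + (b + c));
  return 0;
}
```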
```diff
@@ -63,16 +63,19 @@ struct dense_tree {
 /** dense_storage stores the forest as a collection of dense nodes */
 struct dense_storage {
   __host__ __device__ dense_storage(dense_node* nodes, int num_trees,
-                                    int tree_stride, int node_pitch)
+                                    int tree_stride, int node_pitch,
+                                    float* vector_leaf)
```
Given how many fields `dense_storage` and `sparse_storage` share, consider adding `base_storage` as a base type for both, containing the common fields and implementation.
Yes, it would also help with categorical features in the future.
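A rough sketch of what such a `base_storage` refactor could look like (type and field names here are guesses inferred from the constructor above, not the actual cuML definitions):

```cuda
// Hypothetical sketch only; the real dense/sparse storage types in cuML differ.
struct dense_node { /* split condition, leaf value, ... */ };
struct sparse_node { /* same payload plus an explicit child index */ };

// Fields and helpers shared by both layouts live in one base type.
struct base_storage {
  int num_trees_ = 0;             // number of trees in the forest
  float* vector_leaf_ = nullptr;  // flat array of per-leaf class vectors (VECTOR_LEAF)

  __host__ __device__ int num_trees() const { return num_trees_; }
};

// Dense layout adds its node array and layout parameters.
struct dense_storage : base_storage {
  dense_node* nodes_ = nullptr;
  int tree_stride_ = 0;
  int node_pitch_ = 0;
};

// Sparse layout adds its own node representation and per-tree offsets.
struct sparse_storage : base_storage {
  sparse_node* nodes_ = nullptr;
  int* trees_ = nullptr;  // root offset of each tree into nodes_ (assumed)
};
```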
cpp/src/fil/fil.cu (outdated)

```
The multi-class classification / regression (VECTOR_LEAF) predict() works as follows
(always 1 output):
RAW (no values set): output the label of the class with highest probability,
```
@levsnv Isn't it supposed to have `CLASS` set to output the class?
@canonizer no, we decided that predict() vs predict_proba() is the defining semantic. CLASS really matters in FLOAT_UNARY_BINARY, otherwise it's ignored both ways.
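As a sketch of that semantic (illustrative host-side code, not the actual FIL finalizer): with VECTOR_LEAF the forest accumulates one value per class, predict() reduces that vector to the winning label, and predict_proba() returns the normalized vector.

```cpp
// Illustrative only: deriving predict() and predict_proba() outputs from the
// same per-row class sums. Not the actual cuML FIL code.
#include <vector>

float predict_label(const std::vector<float>& per_class) {
  // RAW / predict(): single output, the label of the highest-scoring class.
  int best = 0;
  for (int c = 1; c < static_cast<int>(per_class.size()); ++c)
    if (per_class[c] > per_class[best]) best = c;
  return static_cast<float>(best);
}

std::vector<float> predict_proba(std::vector<float> per_class) {
  // predict_proba(): normalize the accumulated per-class sums to probabilities.
  float total = 0.0f;
  for (float v : per_class) total += v;
  for (float& v : per_class) v /= total;
  return per_class;
}
```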
The implementation is deterministic now. In order to transpose items in shared memory inside the accumulate function, I need to call `__syncthreads()`, so the implementation is modified to allow all threads in the block to enter the accumulate function. Each implementation of accumulate should take care of any necessary synchronisation and deal with dummy rows appropriately.
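A minimal sketch of that control-flow pattern (illustrative; the real infer.cu accumulators are more involved): no thread returns early before the accumulator, so the `__syncthreads()` inside it is reached by the whole block, and dummy rows simply contribute zero.

```cuda
// Sketch only -- not the actual FIL kernel. Shows why every thread must reach
// accumulate() once accumulate() calls __syncthreads().
__device__ void accumulate(float leaf_value, bool valid_row, float* sdata) {
  // All threads in the block execute this, including those holding dummy rows;
  // a dummy row contributes 0, so the sums are unchanged.
  sdata[threadIdx.x] = valid_row ? leaf_value : 0.0f;
  __syncthreads();  // safe: no thread skipped the call
  // ... transpose / reduce in shared memory here ...
}

__global__ void infer_sketch(const float* input, int num_rows, float* out) {
  extern __shared__ float sdata[];
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  bool valid_row = row < num_rows;
  // Note: no "if (!valid_row) return;" early exit -- that would make the
  // __syncthreads() inside accumulate() undefined behaviour.
  float leaf_value = valid_row ? input[row] : 0.0f;  // stand-in for tree traversal
  accumulate(leaf_value, valid_row, sdata);
  if (threadIdx.x == 0) out[blockIdx.x] = sdata[0];  // illustrative write-out
}
```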
```
     i += blockDim.x) {
  int c = i % num_classes;
  for (int j = threadIdx.x / num_classes; j < blockDim.x;
       j += num_threads_per_class) {
```
I am confused, what does `j` represent here? Wouldn't it be clearer if we handled the two cases separately, when `num_threads_per_class > 1` and when not, so that there are fewer nested loops that may or may not have a single iteration?
separated with an if-else, that is
feel free to resolve this
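A rough sketch of the suggested if/else split (the shared-memory layout, `num_partials`, and the output handling are assumptions for illustration, not the actual infer.cu code):

```cuda
// Hypothetical restructuring: handle "one thread per class or fewer" and
// "several threads per class" as two explicit branches, so neither path
// carries a nested loop that may run only one iteration.
__device__ void combine_per_class(const float* shmem, float* out,
                                  int num_classes, int num_partials,
                                  int num_threads_per_class) {
  if (num_threads_per_class <= 1) {
    // Each thread owns whole classes and sums their partials sequentially.
    for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
      float sum = 0.0f;
      for (int p = 0; p < num_partials; ++p) sum += shmem[c * num_partials + p];
      out[c] = sum;
    }
  } else {
    // Several threads cooperate on one class: each covers a strided subset of
    // that class's partial sums.
    int c = threadIdx.x % num_classes;
    float sum = 0.0f;
    for (int p = threadIdx.x / num_classes; p < num_partials;
         p += num_threads_per_class) {
      sum += shmem[c * num_partials + p];
    }
    // 'sum' now holds this thread's share of class c; the final combine across
    // the num_threads_per_class cooperating threads is omitted in this sketch.
    (void)sum;
  }
}
```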
cpp/src/fil/infer.cu (outdated)

```
  __syncthreads();
  for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
#pragma unroll
    for (int row = 0; row < num_rows; ++row) {
```
Consider having a specialization for full blocks (`num_rows == NITEMS`) and partial blocks (`num_rows < NITEMS`).
Of course, it's fine to do this in the optimization pull request.
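One way such a specialization could look (a sketch with assumed names; a compile-time flag lets the full-block path drop the per-row bounds check and unroll cleanly):

```cuda
// Hypothetical sketch: FULL_BLOCK is a compile-time flag, NITEMS the
// rows-per-block constant; not the actual infer.cu implementation.
template <int NITEMS, bool FULL_BLOCK>
__device__ void write_class_sums(float* out, const float* per_row_class_sums,
                                 int num_classes, int num_rows) {
  for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
#pragma unroll
    for (int row = 0; row < NITEMS; ++row) {
      // For full blocks the bounds check folds away at compile time.
      if (FULL_BLOCK || row < num_rows) {
        out[row * num_classes + c] = per_row_class_sums[row * num_classes + c];
      }
    }
  }
}

// The caller picks the specialization once per block:
//   if (num_rows == NITEMS) write_class_sums<NITEMS, true>(out, sums, num_classes, num_rows);
//   else                    write_class_sums<NITEMS, false>(out, sums, num_classes, num_rows);
```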
I still need to update the finalise method.
Approved, provided that the comments (especially those about code deduplication and more tests) are addressed.
Codecov Report
```
@@            Coverage Diff            @@
##           branch-21.08    #3917   +/-   ##
===============================================
  Coverage              ?   85.32%
===============================================
  Files                 ?      230
  Lines                 ?    18095
  Branches              ?        0
===============================================
  Hits                  ?    15439
  Misses                ?     2656
  Partials              ?        0
```
Flags with carried forward coverage won't be shown.
codeowner review
@gpucibot merge
Implement vector leaf prediction for fil.

- Adds `leaf_algo_t::VECTOR_LEAF` to internal C++ code.
- Adds an optionally used vector to fil tree representations and passes it into prediction kernels and accumulators.
- Adds an accumulator for vector leaves using atomics.
- Adds unit tests.
- Enables python tests with sklearn random forest for multiclass classification (k > 2) and multi-output regression.

@levsnv @canonizer

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Andy Adinets (https://github.com/canonizer)
- https://github.com/levsnv
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3917