Implement vector leaf prediction for fil. #3917
Conversation
Thank you!
Nice work!
The only piece that needs significant changes is making sure that summation of vector leaves is deterministic given a GPU and thread block size, which means no floating-point atomics.
Otherwise, mostly small technical comments.
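For background on the determinism requirement: float addition is not associative, and atomicAdd() imposes no ordering across threads, so atomically summing leaf vectors can give slightly different results from run to run. A tiny host-side illustration (not FIL code):

```cpp
// Minimal demonstration that float addition is order-dependent, which is why
// summing leaf vectors with floating-point atomics is not reproducible.
#include <cstdio>

int main() {
  float a = 1e8f, b = -1e8f, c = 1.0f;
  // Two groupings of the same three addends give different results:
  // (a + b) + c == 1.0, but a + (b + c) == 0.0 because 1.0 is lost at 1e8.
  printf("%.1f vs %.1f\n", (a + b) + c, a + (b + c));
  return 0;
}
```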
```diff
@@ -63,16 +63,19 @@ struct dense_tree {
 /** dense_storage stores the forest as a collection of dense nodes */
 struct dense_storage {
   __host__ __device__ dense_storage(dense_node* nodes, int num_trees,
-                                    int tree_stride, int node_pitch)
+                                    int tree_stride, int node_pitch,
+                                    float* vector_leaf)
```
Given how many fields `dense_storage` and `sparse_storage` share, consider adding `base_storage` as a base type for both, containing the common fields and implementation.
Yes, it would also help with categorical features in the future.
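A rough sketch of what such a `base_storage` refactor could look like (type and field names here are guesses inferred from the constructor above, not the actual cuML definitions):

```cuda
// Hypothetical sketch only; the real dense/sparse storage types in cuML differ.
struct dense_node { /* split condition, leaf value, ... */ };
struct sparse_node { /* same payload plus an explicit child index */ };

// Fields and helpers shared by both layouts live in one base type.
struct base_storage {
  int num_trees_ = 0;             // number of trees in the forest
  float* vector_leaf_ = nullptr;  // flat array of per-leaf class vectors (VECTOR_LEAF)

  __host__ __device__ int num_trees() const { return num_trees_; }
};

// Dense layout adds its node array and layout parameters.
struct dense_storage : base_storage {
  dense_node* nodes_ = nullptr;
  int tree_stride_ = 0;
  int node_pitch_ = 0;
};

// Sparse layout adds its own node representation and per-tree offsets.
struct sparse_storage : base_storage {
  sparse_node* nodes_ = nullptr;
  int* trees_ = nullptr;  // root offset of each tree into nodes_ (assumed)
};
```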
cpp/src/fil/fil.cu (outdated)

```
The multi-class classification / regression (VECTOR_LEAF) predict() works as follows
(always 1 output):
RAW (no values set): output the label of the class with highest probability,
```
@levsnv Isn't it supposed to have `CLASS` set to output the class?
@canonizer no, we decided that predict() vs predict_proba() is the defining semantic. CLASS really matters in FLOAT_UNARY_BINARY, otherwise it's ignored both ways.
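As a sketch of that semantic (illustrative host-side code, not the actual FIL finalizer): with VECTOR_LEAF the forest accumulates one value per class, predict() reduces that vector to the winning label, and predict_proba() returns the normalized vector.

```cpp
// Illustrative only: deriving predict() and predict_proba() outputs from the
// same per-row class sums. Not the actual cuML FIL code.
#include <vector>

float predict_label(const std::vector<float>& per_class) {
  // RAW / predict(): single output, the label of the highest-scoring class.
  int best = 0;
  for (int c = 1; c < static_cast<int>(per_class.size()); ++c)
    if (per_class[c] > per_class[best]) best = c;
  return static_cast<float>(best);
}

std::vector<float> predict_proba(std::vector<float> per_class) {
  // predict_proba(): normalize the accumulated per-class sums to probabilities.
  float total = 0.0f;
  for (float v : per_class) total += v;
  for (float& v : per_class) v /= total;
  return per_class;
}
```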
The implementation is deterministic now. In order to transpose items in shared memory inside the accumulate function, I need to call `__syncthreads()`, so the implementation is modified to allow all threads in the block to enter the accumulate function. Each implementation of accumulate should take care of any necessary synchronisation and deal with dummy rows appropriately.
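A minimal sketch of that control-flow pattern (illustrative; the real infer.cu accumulators are more involved): no thread returns early before the accumulator, so the `__syncthreads()` inside it is reached by the whole block, and dummy rows simply contribute zero.

```cuda
// Sketch only -- not the actual FIL kernel. Shows why every thread must reach
// accumulate() once accumulate() calls __syncthreads().
__device__ void accumulate(float leaf_value, bool valid_row, float* sdata) {
  // All threads in the block execute this, including those holding dummy rows;
  // a dummy row contributes 0, so the sums are unchanged.
  sdata[threadIdx.x] = valid_row ? leaf_value : 0.0f;
  __syncthreads();  // safe: no thread skipped the call
  // ... transpose / reduce in shared memory here ...
}

__global__ void infer_sketch(const float* input, int num_rows, float* out) {
  extern __shared__ float sdata[];
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  bool valid_row = row < num_rows;
  // Note: no "if (!valid_row) return;" early exit -- that would make the
  // __syncthreads() inside accumulate() undefined behaviour.
  float leaf_value = valid_row ? input[row] : 0.0f;  // stand-in for tree traversal
  accumulate(leaf_value, valid_row, sdata);
  if (threadIdx.x == 0) out[blockIdx.x] = sdata[0];  // illustrative write-out
}
```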
```
     i += blockDim.x) {
  int c = i % num_classes;
  for (int j = threadIdx.x / num_classes; j < blockDim.x;
       j += num_threads_per_class) {
```
I am confused, what does `j` represent here? Wouldn't it be clearer if we handled the two cases separately, when `num_threads_per_class > 1` and when not, so that there are fewer nested loops that may or may not have a single iteration?
separated with an if-else, that is
feel free to resolve this
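A rough sketch of the suggested if/else split (the shared-memory layout, `num_partials`, and the output handling are assumptions for illustration, not the actual infer.cu code):

```cuda
// Hypothetical restructuring: handle "one thread per class or fewer" and
// "several threads per class" as two explicit branches, so neither path
// carries a nested loop that may run only one iteration.
__device__ void combine_per_class(const float* shmem, float* out,
                                  int num_classes, int num_partials,
                                  int num_threads_per_class) {
  if (num_threads_per_class <= 1) {
    // Each thread owns whole classes and sums their partials sequentially.
    for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
      float sum = 0.0f;
      for (int p = 0; p < num_partials; ++p) sum += shmem[c * num_partials + p];
      out[c] = sum;
    }
  } else {
    // Several threads cooperate on one class: each covers a strided subset of
    // that class's partial sums.
    int c = threadIdx.x % num_classes;
    float sum = 0.0f;
    for (int p = threadIdx.x / num_classes; p < num_partials;
         p += num_threads_per_class) {
      sum += shmem[c * num_partials + p];
    }
    // 'sum' now holds this thread's share of class c; the final combine across
    // the num_threads_per_class cooperating threads is omitted in this sketch.
    (void)sum;
  }
}
```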
cpp/src/fil/infer.cu (outdated)

```
  __syncthreads();
  for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
#pragma unroll
    for (int row = 0; row < num_rows; ++row) {
```
Consider having a specialization for full blocks (`num_rows == NITEMS`) and partial blocks (`num_rows < NITEMS`).
Of course, it's fine to do this in the optimization pull request.
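One way such a specialization could look (a sketch with assumed names; a compile-time flag lets the full-block path drop the per-row bounds check and unroll cleanly):

```cuda
// Hypothetical sketch: FULL_BLOCK is a compile-time flag, NITEMS the
// rows-per-block constant; not the actual infer.cu implementation.
template <int NITEMS, bool FULL_BLOCK>
__device__ void write_class_sums(float* out, const float* per_row_class_sums,
                                 int num_classes, int num_rows) {
  for (int c = threadIdx.x; c < num_classes; c += blockDim.x) {
#pragma unroll
    for (int row = 0; row < NITEMS; ++row) {
      // For full blocks the bounds check folds away at compile time.
      if (FULL_BLOCK || row < num_rows) {
        out[row * num_classes + c] = per_row_class_sums[row * num_classes + c];
      }
    }
  }
}

// The caller picks the specialization once per block:
//   if (num_rows == NITEMS) write_class_sums<NITEMS, true>(out, sums, num_classes, num_rows);
//   else                    write_class_sums<NITEMS, false>(out, sums, num_classes, num_rows);
```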
I still need to update the finalise method.
Approved, provided that the comments (especially those about code deduplication and more tests) are addressed.
Codecov Report
```
@@            Coverage Diff            @@
##           branch-21.08    #3917   +/-   ##
===============================================
  Coverage              ?   85.32%
===============================================
  Files                 ?      230
  Lines                 ?    18095
  Branches              ?        0
===============================================
  Hits                  ?    15439
  Misses                ?     2656
  Partials              ?        0
```
Flags with carried forward coverage won't be shown.
codeowner review
@gpucibot merge
Implement vector leaf prediction for fil.

- Adds `leaf_algo_t::VECTOR_LEAF` to internal C++ code.
- Adds an optionally used vector to fil tree representations and passes it into prediction kernels and accumulators.
- Adds an accumulator for vector leaves using atomics.
- Adds unit tests.
- Enables python tests with sklearn random forest for multiclass classification (k > 2) and multi-output regression.

@levsnv @canonizer

Authors:
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Andy Adinets (https://github.com/canonizer)
- https://github.com/levsnv
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3917