New RF backend can be considerably slower depending on `max_depth` and `n_bins`
Initial profiling shows that the `computeSplit` kernels are by far the biggest bottleneck.
### Low-hanging fruit
The new backend exposes the following parameters, which should be tuned according to the depth and the number of samples available in the node currently being split:

- `n_blks_for_cols` - the number of columns processed simultaneously in a single `computeSplit` kernel call. This is a trade-off between memory usage and runtime.
- `n_blks_for_rows` - determines the `gridDim.x` of the `computeSplit` kernels.
Unfortunately, neither of these parameters is currently tuned; both are simply hard-coded to fixed values. We need to tune them to achieve optimal performance (a possible shape for such a heuristic is sketched below).
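As a rough illustration of what depth-aware tuning could look like, here is a minimal sketch. The struct, function name, and thresholds below are hypothetical and not part of the current backend; real values would have to come from benchmarking across depths, node sizes, and GPUs.

```cpp
#include <algorithm>

// Hypothetical launch-parameter heuristic for the computeSplit kernels.
// Everything here (names, thresholds) is illustrative; the backend currently
// hard-codes n_blks_for_cols and n_blks_for_rows instead.
struct SplitKernelBlocks {
  int n_blks_for_cols;  // columns processed per computeSplit kernel call
  int n_blks_for_rows;  // gridDim.x of the computeSplit kernels
};

inline SplitKernelBlocks pickSplitKernelBlocks(int n_rows_in_node,
                                               int n_sampled_cols,
                                               int depth)
{
  SplitKernelBlocks b;
  // Shallow nodes hold many rows: favor more row-blocks and keep the column
  // footprint (and thus memory usage) small. Deeper, smaller nodes can afford
  // to process more columns per call.
  b.n_blks_for_cols = (depth < 4) ? 1 : std::min(8, n_sampled_cols);
  // Roughly one block per 4K rows, clamped to keep gridDim.x reasonable.
  b.n_blks_for_rows = std::max(1, std::min(128, n_rows_in_node / 4096));
  return b;
}
```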
### Near/short-term tasks
- Today we compute the histogram CDFs (in both the classification and regression `computeSplit` kernels) using shared-memory atomics (here and here). Computing CDFs this way for the purpose of computing the split metrics incurs a lot of atomic bank conflicts. One way to improve performance would be to compute PDFs instead and, while computing the metrics, run a prefix scan to obtain the CDFs (see the first sketch after this list).
- Currently, the temporary workspace is allocated for every tree built in the new RF backend. We should move this logic out of the `decisiontree` folder and into `randomforest`, and reuse the workspace across the different trees being built in the same CUDA stream (see the second sketch after this list).
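A minimal sketch of the PDF-then-prefix-scan idea follows. The kernel name, bin layout, and writing the CDF out to global memory are assumptions made for illustration; the real `computeSplit` kernels would consume the CDF in place while evaluating split metrics. The scan here uses CUB's `BlockScan`, but any block-wide scan would do.

```cuda
#include <cub/cub.cuh>

// Illustrative only. Launch with dynamic shared memory of n_bins * sizeof(int)
// and n_bins <= TPB.
template <int TPB>
__global__ void pdfThenCdfSketch(const int* bin_ids, int n_samples, int n_bins,
                                 int* block_cdfs)
{
  extern __shared__ int s_pdf[];  // one counter per bin

  // Phase 1: build the PDF with shared-memory atomics. Each sample touches
  // exactly one bin, so the atomics are spread across bins instead of piling
  // onto running CDF counters.
  for (int b = threadIdx.x; b < n_bins; b += TPB) s_pdf[b] = 0;
  __syncthreads();
  for (int i = blockIdx.x * TPB + threadIdx.x; i < n_samples;
       i += gridDim.x * TPB) {
    atomicAdd(&s_pdf[bin_ids[i]], 1);
  }
  __syncthreads();

  // Phase 2: convert the PDF into a CDF with one block-wide inclusive scan,
  // done once right before the split metric is evaluated.
  typedef cub::BlockScan<int, TPB> BlockScan;
  __shared__ typename BlockScan::TempStorage scan_storage;
  int count = (threadIdx.x < n_bins) ? s_pdf[threadIdx.x] : 0;
  int cdf   = 0;
  BlockScan(scan_storage).InclusiveSum(count, cdf);
  if (threadIdx.x < n_bins) block_cdfs[blockIdx.x * n_bins + threadIdx.x] = cdf;
}
```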
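For the workspace reuse, a rough sketch of the direction, assuming the per-tree workspace size can be bounded up front; `TreeBuildWorkspace` and `buildTree` are hypothetical names, not existing cuML APIs:

```cpp
#include <cstddef>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// Hypothetical sketch: allocate the scratch buffer once at the randomforest
// level and hand it to each per-tree build, instead of letting decisiontree
// allocate (and free) it for every tree.
struct TreeBuildWorkspace {
  rmm::device_uvector<char> buf;
  TreeBuildWorkspace(std::size_t max_bytes, rmm::cuda_stream_view stream)
    : buf(max_bytes, stream)
  {
  }
};

void buildForest(int n_trees, std::size_t per_tree_workspace_bytes,
                 rmm::cuda_stream_view stream)
{
  // One allocation per stream, reused by every tree built on that stream.
  TreeBuildWorkspace ws(per_tree_workspace_bytes, stream);
  for (int t = 0; t < n_trees; ++t) {
    // buildTree() stands in for the existing per-tree entry point, now taking
    // the preallocated workspace instead of allocating its own:
    // buildTree(t, ws.buf.data(), ws.buf.size(), stream);
  }
}
```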