**Is your feature request related to a problem? Please describe.**
The current MNMG RF is essentially a model-parallel approach: we distribute the data among the workers and also distribute the work of building separate trees across them. Each worker then builds its trees using only the data that is available to it.
While this is an embarrassingly parallel way to build the trees of an RF, it has some limitations:
- it does not work well if the dataset is wide (i.e., has many features).
- a tree built on a particular worker never sees the samples held by other workers, which can introduce bias.
**Describe the solution you'd like**
In addition to the current approach, we should provide an option for users to choose a data-parallel approach, which works as follows:
1. If the rows of the dataset are distributed across the workers, perform a sum-allReduce of the intermediate histograms among those workers before computing the best split.
2. If the columns of the dataset are distributed across the workers, perform a max-allReduce of the individual best splits among those workers to get the "global" best split.
3. If both rows and columns are distributed (aka 2D-partitioning of the dataset), do both 1 and 2.
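To make the proposal concrete, here is a minimal NumPy sketch of the 2D-partitioned case, simulating the collectives locally on lists of per-worker values. The helper names (`sum_allreduce`, `max_allreduce`), the worker layout, and the toy "gain" function are all illustrative, not part of any existing cuML API.

```python
import numpy as np

def sum_allreduce(arrays):
    """Sum-allReduce: every participant ends up with the elementwise sum."""
    total = np.sum(arrays, axis=0)
    return [total.copy() for _ in arrays]

def max_allreduce(candidates):
    """Max-allReduce over (gain, split) pairs: everyone gets the best one."""
    best = max(candidates, key=lambda c: c[0])
    return [best for _ in candidates]

# Toy 2D partitioning: 2 row-shards x 2 column-shards, one worker each.
# Each worker holds a partial histogram for its (row-shard, column-shard).
hists = {
    ("r0", "c0"): np.array([1, 2, 0]),
    ("r1", "c0"): np.array([0, 1, 3]),
    ("r0", "c1"): np.array([2, 0, 1]),
    ("r1", "c1"): np.array([1, 1, 1]),
}
col_groups = {"c0": [("r0", "c0"), ("r1", "c0")],
              "c1": [("r0", "c1"), ("r1", "c1")]}

# Step 1 (row partitioning): workers holding the same columns
# sum-allReduce their histograms over all row-shards.
full_hists = {}
for col, members in col_groups.items():
    reduced = sum_allreduce([hists[m] for m in members])
    for m, h in zip(members, reduced):
        full_hists[m] = h

# Step 2 (column partitioning): each column group computes its local best
# split from the full histogram (toy gain = largest bin count), then a
# max-allReduce across column groups yields the global best split.
local_best = []
for col, members in col_groups.items():
    h = full_hists[members[0]]
    local_best.append((int(h.max()), col))
global_best = max_allreduce(local_best)[0]
print(global_best)  # → (3, 'c0')
```

In a real implementation the two reductions would be NCCL/comms collectives over worker subgroups rather than local loops, but the data flow is the same: sum over row-shards first, then max over column-shards.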