
Distributed sklearn kmeans based on ray and modin #429

Closed

Conversation

PivovarA
Contributor

  • Added a oneCCL transceiver.
  • Rewrote dist_kmeans.h. The Spark sample and daal4py used map_tree, but that is based on send and receive operations, which oneCCL does not support yet. The oneDAL KMeans MPI sample does not use map_tree, so I used it as a basis.
  • Added ray_partition_actor and a Ray context. At the moment ray_context is basic and serves mostly to extract some information from the Ray cluster. This functionality is planned to be expanded in the future.
  • Added to build.sh the ability to work with oneCCL. At the moment, correct operation requires a oneCCL build from https://github.com/oneapi-src/oneCCL/tree/2021.1-beta07-1
    with the CCLROOT variable set. In addition, the oneCCL transceiver must be selected:
    export D4P_TRANSCEIVER=oneccl_transceiver
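The build prerequisites described in the last bullet can be sketched as a shell setup. This is a hedged sketch, not the PR's exact build script: the clone/install paths are hypothetical placeholders, and only the CCLROOT and D4P_TRANSCEIVER variables are taken from the description above.

```shell
# Build oneCCL from the tagged branch (paths here are illustrative; adjust to your checkout)
git clone --branch 2021.1-beta07-1 https://github.com/oneapi-src/oneCCL.git
cd oneCCL && mkdir build && cd build
cmake .. && make -j && make install

# Point daal4py's build.sh at the oneCCL install and select the oneCCL transceiver
export CCLROOT=/path/to/oneCCL/install   # hypothetical install prefix
export D4P_TRANSCEIVER=oneccl_transceiver
```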

import modin.pandas as pd


@ray.remote(num_cpus=1)
Contributor


What about modin on dask?


Unwrapping partitions of a DataFrame works for the Dask engine as well. Some changes would be needed here, but it looks feasible. Is that planned as part of this PR or a follow-up?

Comment on lines 25 to 28
using namespace std;
using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
Contributor


Please do not use multiple and nested namespaces. Spell them out where used.

// reduce all partial results
auto pres = map_reduce_tree::map_reduce_tree<Algo>::reduce(algo, s1_result);
// finalize and check convergence/end of iteration
auto res = tcvr->gather(s1_result);
Contributor


👎 Tree-reduce is more efficient.
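The reviewer's point can be illustrated with a minimal pure-Python sketch (this is not daal4py's actual `map_reduce_tree` implementation): a tree reduce combines partial results pairwise over log2(n) rounds, whereas gathering everything to one rank and folding sequentially puts all n-1 combines on a single node.

```python
# Sketch: tree-reduce vs. flat gather-and-reduce over partial results.
def tree_reduce(values, combine):
    """Pairwise-combine values in ceil(log2(n)) rounds, as a tree reduction would.

    Each round, adjacent pairs are combined; in a real distributed run the
    pairs in one round execute in parallel on different nodes.
    """
    vals = list(values)
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            nxt.append(combine(vals[i], vals[i + 1]))  # one pair per node, in parallel
        if len(vals) % 2:
            nxt.append(vals[-1])  # odd element carries over to the next round
        vals = nxt
    return vals[0]

def gather_reduce(values, combine):
    """Gather all partials to one place and fold sequentially (the gather pattern)."""
    result = values[0]
    for v in values[1:]:
        result = combine(result, v)  # n - 1 sequential combines on one node
    return result

partials = [1, 2, 3, 4, 5, 6, 7, 8]
add = lambda a, b: a + b
assert tree_reduce(partials, add) == gather_reduce(partials, add) == 36
```

With n partials, the tree needs only ceil(log2(n)) combine rounds on the critical path instead of n - 1 sequential combines, which is why the gather-based reduction above drew the reviewer's objection.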

Comment on lines +1 to +41
#
#*******************************************************************************
# Copyright 2014-2020 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#******************************************************************************/

import pandas

class RowPartitionsActor:
    def __init__(self, node):
        self.row_parts_ = []
        self.node_ = node

    def set_row_parts(self, *row_parts):
        self.row_parts_ = pandas.concat(list(row_parts), axis=0)

    def set_row_parts_list(self, *row_parts_list):
        self.row_parts_ = list(row_parts_list)

    def append_row_part(self, row_part):
        self.row_parts_.append(row_part)

    def concat_row_parts(self):
        self.row_parts_ = pandas.concat(self.row_parts_, axis=0)

    def get_row_parts(self):
        return self.row_parts_

    def get_actor_ip(self):
        return self.node_
Contributor

@fschlimb fschlimb Dec 18, 2020


Why is this here and in ray?
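For context, the actor class under discussion can be exercised locally with plain pandas, without any Dask or Ray wiring. This is a hedged sketch: the class body is reduced to the methods used here, and the node IP is purely illustrative.

```python
# Local sketch of the RowPartitionsActor pattern: collect row partitions,
# then concatenate them into a single DataFrame.
import pandas

class RowPartitionsActor:
    def __init__(self, node):
        self.row_parts_ = []
        self.node_ = node

    def append_row_part(self, row_part):
        self.row_parts_.append(row_part)

    def concat_row_parts(self):
        self.row_parts_ = pandas.concat(self.row_parts_, axis=0)

    def get_row_parts(self):
        return self.row_parts_

    def get_actor_ip(self):
        return self.node_

actor = RowPartitionsActor(node="10.0.0.1")          # node IP is illustrative
actor.append_row_part(pandas.DataFrame({"x": [1, 2]}))
actor.append_row_part(pandas.DataFrame({"x": [3]}))
actor.concat_row_parts()
assert len(actor.get_row_parts()) == 3               # three rows after concat
```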


Contributor


Please add a brief description of what this file contains and how it works. 2-3 lines suffice.


Contributor


Please add a brief description of what this file contains and how it works. 2-3 lines suffice.


Contributor


Please add a brief description of what this file contains and how it works. 2-3 lines suffice.
Please also add docstrings.

Contributor

@fschlimb fschlimb left a comment


All the actor and modin code should be pushed down into daal4py itself. I see no reason why this should be available only for sklearn; once it is in daal4py itself, sklearn is supported automatically.
Also, a merge requires a complete CCL transceiver.
With the above we would have support for modin in all distributed algorithms.

Comment on lines +66 to +70
row_parts_last_idx = (
    len(row_partitions) // num_nodes
    if len(row_partitions) % num_nodes == 0
    else len(row_partitions) // num_nodes + 1
)


Suggested change
row_parts_last_idx = (
    len(row_partitions) // num_nodes
    if len(row_partitions) % num_nodes == 0
    else len(row_partitions) // num_nodes + 1
)
row_parts_last_idx = (
    len(row_partitions) // num_nodes
    if len(row_partitions) % num_nodes < num_nodes // 2 + 1
    else len(row_partitions) // num_nodes + 1
)

With the `== 0` check, a situation is possible where some actors end up empty (11 partitions and 5 actors, for example).
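The empty-actor effect this comment describes can be checked with a small sketch. This is plain Python with a hypothetical `chunk_sizes` helper that mimics the slicing loop in the PR; only the chunk-size formula is taken from the diff above.

```python
# Demonstrate how the original chunk-size formula leaves actors empty.
def chunk_sizes(n_partitions, n_actors, chunk):
    """Partition counts each actor receives when slicing in steps of `chunk`."""
    sizes, i = [], 0
    for _ in range(n_actors):
        sizes.append(max(0, min(n_partitions, i + chunk) - i))
        i += chunk
    return sizes

n_partitions, n_actors = 11, 5

# Original formula: ceiling division whenever there is any remainder.
chunk = (n_partitions // n_actors
         if n_partitions % n_actors == 0
         else n_partitions // n_actors + 1)

print(chunk_sizes(n_partitions, n_actors, chunk))  # [3, 3, 3, 2, 0] - the last actor gets nothing
```

With 11 partitions and 5 actors the formula yields a chunk size of 3, so the first four actors consume all 11 partitions and the fifth actor receives an empty slice, which is exactly the case the reviewer flags.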

Comment on lines +73 to +81
for actor in actors:
    actor.set_row_parts(
        [r.result() for r in row_partitions[
            slice(i, i + row_parts_last_idx)
            if i + row_parts_last_idx < len(row_partitions)
            else slice(i, len(row_partitions))
        ]]
    )
    i += row_parts_last_idx


Need to change slices according to comment above.

Comment on lines +67 to +83
row_parts_last_idx = (
    len(row_partitions) // num_nodes
    if len(row_partitions) % num_nodes == 0
    else len(row_partitions) // num_nodes + 1
)

i = 0
for actor in actors:
    actor.set_row_parts._remote(
        args=(
            row_partitions[
                slice(i, i + row_parts_last_idx)
                if i + row_parts_last_idx < len(row_partitions)
                else slice(i, len(row_partitions))
            ])
    )
    i += row_parts_last_idx


The same as for dask.

actors.append(dask_client.submit(RowPartitionsActor, None, workers=set([ip]), actor=True))
actors = [actor.result() for actor in actors]

row_partitions = unwrap_row_partitions(X)
Contributor


...(X, axis=0)

actors
), f"number of nodes {num_nodes} is not equal to number of actors {len(actors)}"

row_partitions = unwrap_row_partitions(X)
Contributor


...(X, axis=0)


@YarShev YarShev Feb 16, 2021


unwrap_row_partitions is an old function that is no longer current. It should be replaced with unwrap_partitions.

@ethanglaser
Contributor

Out of date with the repository. If continued work on this is desired, a new ticket and PR can be opened.

8 participants