[sml] add kmeans++ and support executing with multiple initial centers in Kmeans #546

Merged
merged 25 commits into from
Feb 21, 2024

@winnylyc winnylyc commented Feb 7, 2024

What problem does this PR solve?

Implements additional KMeans functionality:
Add k-means++ for center initialization.
Support running KMeans with multiple initial centers and keeping the best result.
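
For readers unfamiliar with the two features listed above, here is a minimal plaintext sketch in pure Python. It is not the code from this PR: the function names (`kmeans_pp_init`, `lloyd`, `kmeans`) and the use of the standard `random` module are assumptions made for illustration. It shows k-means++ seeding (each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far) and the `n_init` idea of running several initializations and keeping the run with the lowest inertia.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: first center uniform, the rest weighted by D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=10):
    """Plain Lloyd iterations; returns (centers, inertia)."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia


def kmeans(points, k, n_init=3, max_iter=10, seed=0):
    """Run n_init independent initializations and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [
        lloyd(points, kmeans_pp_init(points, k, rng), max_iter)
        for _ in range(n_init)
    ]
    return min(runs, key=lambda run: run[1])
```

Calling `kmeans(points, k=4)` on four distinct points should return those four points as centers, since a point that is already a center has zero weight in the next draw.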

winnylyc commented Feb 7, 2024

Running bazel run -c opt //sml/cluster/emulations:kmeans_emul produces the expected result. However, the error messages below also appear.

[2024-02-07 11:00:23,154] [Process-25] Starting grpc server at 127.0.0.1:61924
E0207 11:00:23.154733795   27992 chttp2_server.cc:1051]                UNKNOWN:No address added out of total 1 resolved for '127.0.0.1:61924' {created_time:"2024-02-07T11:00:23.154586799+00:00", children:[UNKNOWN:Unable to configure socket {created_time:"2024-02-07T11:00:23.15453706+00:00", fd:16, children:[UNKNOWN:Address already in use {syscall:"bind", os_error:"Address already in use", errno:98, created_time:"2024-02-07T11:00:23.15450766+00:00"}]}]}
Process Process-25:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/lib/python3.10/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.cache/bazel/_bazel_root/c7c2256833a99c4ceaf0534f480b1c44/execroot/spulib/bazel-out/k8-opt/bin/sml/cluster/emulations/kmeans_emul.runfiles/spulib/spu/utils/distributed.py", line 208, in serve
    server.add_insecure_port(nodes_def[node_id])
  File "/root/miniconda3/lib/python3.10/site-packages/grpc/_server.py", line 1329, in add_insecure_port
    return _common.validate_port_binding_result(
  File "/root/miniconda3/lib/python3.10/site-packages/grpc/_common.py", line 181, in validate_port_binding_result
    raise RuntimeError(_ERROR_MESSAGE_PORT_BINDING_FAILED % address)
RuntimeError: Failed to bind to address 127.0.0.1:61924; set GRPC_VERBOSITY=debug environment variable to see detailed error message.

@anakinxc anakinxc requested a review from deadlywing February 7, 2024 11:08
@deadlywing
Contributor

Hello, and first of all, thanks for your excellent contribution!

This error occurs because you initialize an emulator in every emul_xxx function, which leads to port conflicts. Instead, initialize a single emulator once in the global scope (and likewise call emulator.down() only once), and reuse it in all emul_xxx functions.

You can refer to this.
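
The pattern described above can be sketched as follows. This is only an illustration: `FakeEmulator` is a hypothetical stand-in that simulates a port registry, not the real SPU emulator API, and `emul_first`, `emul_second` and `run_all` are made-up names. The point is that one emulator is brought up once, shared by every emulation function, and brought down once at the end.

```python
class FakeEmulator:
    """Hypothetical stand-in for the SPU emulator; not the real API."""

    _ports_in_use = set()  # simulates ports already bound on this machine

    def __init__(self, port=61924):
        self.port = port
        self.is_up = False

    def up(self):
        # Mimic the bind failure from the traceback when the port is taken.
        if self.port in FakeEmulator._ports_in_use:
            raise RuntimeError(f"Failed to bind to address 127.0.0.1:{self.port}")
        FakeEmulator._ports_in_use.add(self.port)
        self.is_up = True

    def down(self):
        if self.is_up:
            FakeEmulator._ports_in_use.discard(self.port)
            self.is_up = False

    def run(self, fn):
        if not self.is_up:
            raise RuntimeError("emulator is not up")
        return fn


def emul_first(emulator):
    # Reuses the shared emulator instead of creating its own.
    return emulator.run(lambda x: x + 1)(1)


def emul_second(emulator):
    return emulator.run(lambda x: x * 2)(2)


def run_all():
    emulator = FakeEmulator()  # created once for all emulations
    emulator.up()
    try:
        return [emul_first(emulator), emul_second(emulator)]
    finally:
        emulator.down()  # torn down once, after every emulation has finished
```

In this sketch, bringing up a second `FakeEmulator` on the same port while the first is still up raises the simulated bind error, which mirrors the per-function initialization that caused the conflict.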

(BTW, I may give some detailed reviews after Spring Festival -.-)

winnylyc commented Feb 8, 2024

Thank you for your help. The problem has been solved!
Happy New Year to you and the SPU team!

@deadlywing
Contributor

The overall implementation is very nice; I see no real problems.

@winnylyc
Contributor Author

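Thank you for the suggestions! I ran into a small problem during implementation: I tried to generate init_params inside __init__, but the generated numbers fall completely outside the range [0, 1]. Taking the sample in the unit test as an example, the generated self.init_params is:
[[1.1166600e+11 4.1204851e+10 1.3540787e+11]
[6.2727987e+09 1.1596410e+11 1.3233761e+11]
[3.2618570e+10 2.1491098e+10 8.6244360e+10]]
What is going on here? jax.random.choice clearly works, so why does jax.random.uniform not run correctly?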

@deadlywing
Contributor

Sorry, I had not noticed this problem. In fact, choice cannot be used either (or rather, it gives incorrect results), for example:

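import jax
import numpy as np
import spu.utils.simulation as ppsim

# sim (a ppsim simulator) and copts (compiler options) are assumed to be defined already
x = np.array([1, 0, 2, 3, 1, 1, 1, 1, 1, 1])
fn = lambda x: jax.random.choice(jax.random.PRNGKey(1), x)
spu_fn = ppsim.sim_jax(sim, fn, copts=copts)
z = spu_fn(x)
print(f"spu out = {z}")      # -2147483648
print(f"cpu out = {fn(x)}")  # 1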

However, because an out-of-range index does not raise an error, I suspect the program still runs to completion.

The root cause is that SPU does not yet hook the APIs of JAX's random module. One way to mitigate this is to restrict how it is used:

def emul_kmeans_kmeans_plus_plus(mode: emulation.Mode = emulation.Mode.MULTIPROCESS):
    X = jnp.array([[-4, -3, -2, -1], [-4, -3, -2, -1]]).T

    # Define the model in the outer scope (outside the SPU runtime),
    # so that __init__ is computed in plaintext.
    model = KMEANS(
        n_clusters=4,
        n_samples=X.shape[0],
        init="k-means++",
        init_params=None,
        n_init=1,
        max_iter=10,
    )

    def proc(x):
        # only run fit in crypto
        model.fit(x)
        return model._centers.sort(axis=0)

    X = emulator.seal(X)
    result = emulator.run(proc)(X)
    print("result\n", result)

Also modify kmeans.py:
[image]

@deadlywing
Contributor

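In fact, your original code defined the KMEANS object inside the SPU runtime, so even with the random numbers generated in __init__, they could not be computed correctly.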

@winnylyc
Contributor Author

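Thank you very much for the guidance; it corrected my earlier misunderstanding (I had assumed SPU would automatically move the work in __init__ so that it runs before the SPU runtime 😓).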

@winnylyc
Contributor Author

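All random number generation now happens in the __init__ function, and the init_params parameter has been removed, consistent with sklearn. In the tests and emulations, the random and k-means++ cases now initialize the model before entering the SPU runtime.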

@winnylyc
Contributor Author

The issue reported by black does not seem to be caused by my part of the code?

@anakinxc
Collaborator

Please merge main :P

@winnylyc
Contributor Author

Thanks, the problem is solved.

Contributor

@deadlywing deadlywing left a comment

LGTM

@deadlywing deadlywing merged commit 249026e into secretflow:main Feb 21, 2024
8 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 21, 2024