Incorrect re-registration logic after a disconnect may cause services to be lost #736

Closed
ijustyce opened this issue Mar 29, 2024 · 0 comments

@ijustyce
Contributor

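Suppose the connection drops. The Go SDK re-establishes the connection and then calls `r.notifyConnectionChange(CONNECTED)` to trigger re-registration of the services.
If re-registration fails, the SDK calls `r.switchServerAsync(ServerInfo{}, true)`, which eventually reaches `reconnect(serverInfo ServerInfo, onRequestFail bool)`. At that point, if the health check passes, `r.notifyConnectionChange(CONNECTED)` is never called, so the services are lost.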
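To make the failure path concrete, here is a minimal Go model of the flow described above. It is not the SDK code: `rpcClient`, `register`, `simulate`, `connEvent`, and the struct fields are hypothetical; `switchServerAsync` is made synchronous so the model is deterministic; and the SDK's event-listener chain is collapsed into a direct call from `notifyConnectionChange` to a registration loop. Only the names `notifyConnectionChange`, `switchServerAsync`, `reconnect`, `ServerInfo`, and `CONNECTED` come from the issue.

```go
package main

import "fmt"

// connEvent is a hypothetical stand-in for the SDK's connection event type.
type connEvent int

const (
	DISCONNECTED connEvent = iota // hypothetical; not used below
	CONNECTED                     // the event named in the issue
)

// ServerInfo mirrors the type named in the issue; the field is hypothetical.
type ServerInfo struct{ Addr string }

// rpcClient is a hypothetical, heavily simplified model of the SDK client.
type rpcClient struct {
	services      []string        // services the application wants registered
	registered    map[string]bool // what the server currently holds
	failRegisters int             // number of upcoming register calls that fail
	healthy       bool            // what the health check will report
}

// register simulates a single register request that may fail.
func (r *rpcClient) register(svc string) error {
	if r.failRegisters > 0 {
		r.failRegisters--
		return fmt.Errorf("register %s failed", svc)
	}
	r.registered[svc] = true
	return nil
}

// notifyConnectionChange re-registers every service when CONNECTED fires.
func (r *rpcClient) notifyConnectionChange(event connEvent) {
	if event != CONNECTED {
		return
	}
	for _, svc := range r.services {
		if err := r.register(svc); err != nil {
			// Registration failed: fall back to switching servers.
			r.switchServerAsync(ServerInfo{}, true)
			return
		}
	}
}

// switchServerAsync is synchronous here so the model is deterministic.
func (r *rpcClient) switchServerAsync(serverInfo ServerInfo, onRequestFail bool) {
	r.reconnect(serverInfo, onRequestFail)
}

// reconnect reproduces the problematic branch described in the issue.
func (r *rpcClient) reconnect(serverInfo ServerInfo, onRequestFail bool) {
	if onRequestFail && r.healthy {
		// Bug: the connection looks fine, so we return without emitting
		// CONNECTED, and nothing retries the failed registration.
		return
	}
	// Otherwise, reconnect (the server switch itself is omitted here) and
	// announce the new connection.
	r.notifyConnectionChange(CONNECTED)
}

// simulate restores a connection to a freshly restarted server, injects
// `failures` registration failures, and returns how many services end up registered.
func simulate(failures int) int {
	r := &rpcClient{
		services:      []string{"order-service"},
		registered:    map[string]bool{},
		failRegisters: failures,
		healthy:       true,
	}
	r.notifyConnectionChange(CONNECTED) // connection re-established
	return len(r.registered)
}

func main() {
	fmt.Println("registered with no failures:", simulate(0))
	fmt.Println("registered after one failure:", simulate(1))
}
```

With no injected failures, `simulate(0)` returns 1. With one failed registration and a passing health check, `simulate(1)` returns 0: the early return in `reconnect` leaves the service unregistered and nothing retries it.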

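This actually happened in our production environment, and reading the code confirmed the risk. In a load test, after registering 50,000 clients and restarting the nacos server, the registered count never recovered to 50,000; a few dozen were always missing. The client is not the only culprit: the nacos server also has a bug where a failed service registration swallows the exception and still returns success.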
The simplest fix I would suggest is the following change:
[Image: screenshot of the proposed code change]
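The added line 313 triggers service re-registration again. If the health check fails instead, an event is fired, and if the health check is found to pass before the next reconnect, the services are re-registered as well. This can re-register more than once; a check could be added to avoid that, but registering again does no harm.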
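The screenshot of the change is not available, so the following is only one plausible reading of the fix, using a hypothetical, simplified model of the client (`rpcClient`, `register`, `simulate`, `connEvent`, and the struct fields are illustrative, not SDK code): in the branch of `reconnect` where the request failed but the health check passes, emit `CONNECTED` again so that registration is retried. The `redoRuns` counter is an illustrative addition that shows how often re-registration fires.

```go
package main

import "fmt"

// connEvent is a hypothetical stand-in for the SDK's connection event type.
type connEvent int

const (
	DISCONNECTED connEvent = iota // hypothetical; not used below
	CONNECTED                     // the event named in the issue
)

// ServerInfo mirrors the type named in the issue; the field is hypothetical.
type ServerInfo struct{ Addr string }

// rpcClient is a hypothetical, heavily simplified model of the SDK client.
type rpcClient struct {
	services      []string        // services the application wants registered
	registered    map[string]bool // what the server currently holds
	failRegisters int             // number of upcoming register calls that fail
	healthy       bool            // what the health check will report
	redoRuns      int             // how many times re-registration was triggered
}

// register simulates a single register request that may fail.
func (r *rpcClient) register(svc string) error {
	if r.failRegisters > 0 {
		r.failRegisters--
		return fmt.Errorf("register %s failed", svc)
	}
	r.registered[svc] = true
	return nil
}

// notifyConnectionChange re-registers every service when CONNECTED fires.
func (r *rpcClient) notifyConnectionChange(event connEvent) {
	if event != CONNECTED {
		return
	}
	r.redoRuns++
	for _, svc := range r.services {
		if err := r.register(svc); err != nil {
			r.switchServerAsync(ServerInfo{}, true)
			return
		}
	}
}

// switchServerAsync is synchronous here so the model is deterministic.
func (r *rpcClient) switchServerAsync(serverInfo ServerInfo, onRequestFail bool) {
	r.reconnect(serverInfo, onRequestFail)
}

func (r *rpcClient) reconnect(serverInfo ServerInfo, onRequestFail bool) {
	if onRequestFail && r.healthy {
		// Proposed fix (one reading of the added line): the connection is
		// healthy but a request failed, so emit CONNECTED to retry registration.
		r.notifyConnectionChange(CONNECTED)
		return
	}
	// Otherwise, reconnect (the server switch itself is omitted here) and
	// announce the new connection.
	r.notifyConnectionChange(CONNECTED)
}

// simulate injects `failures` registration failures after a reconnect and
// returns the number of registered services and re-registration runs.
func simulate(failures int) (int, int) {
	r := &rpcClient{
		services:      []string{"order-service"},
		registered:    map[string]bool{},
		failRegisters: failures,
		healthy:       true,
	}
	r.notifyConnectionChange(CONNECTED) // connection re-established
	return len(r.registered), r.redoRuns
}

func main() {
	n, runs := simulate(1)
	fmt.Printf("registered: %d, re-registration runs: %d\n", n, runs)
}
```

`simulate(1)` now returns `1, 2`: the first registration fails, `reconnect` re-emits `CONNECTED`, and the second attempt succeeds. In the real SDK these calls go through asynchronous event handling rather than direct recursion; the model recurses only to stay short. The optional guard against redundant re-registration is left out because the issue does not specify its condition.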
