lain ps reports an inaccurate healthy status #64

Sometimes, when a container crashes on its own while running, `lain ps` does not immediately report the container's correct state.

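Because deployd's internal logic is not rigorous enough, it does not learn about a container's state right away. The sequence is as follows:
1. deployd starts the container successfully and considers it healthy.
2. The container crashes, but `lain ps` still shows it as healthy.
3. After 90s, deployd's periodic inspection finds the container dead and tries to restart it. The restart succeeds, so the status remains healthy.
4. The container crashes again, and `lain ps` still shows it as healthy.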

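Temporary improvements:
1. Improve deployd's inspection logic: shorten the inspection interval, and after 3 restart attempts mark the container as unhealthy and stop managing it.
2. Whenever deployd handles an HTTP request, sync the latest state from swarm (rate-limited, e.g. at most once per second). This only syncs status information and never restarts containers.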
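
Improvement 1 could look roughly like the Go sketch below. It is not deployd's actual code: `podState`, `inspect`, the `Health` values and the `restart` callback are hypothetical names, and only the limit of 3 attempts comes from this issue. Each inspection pass restarts a dead container and counts the attempt; once the limit is reached the container is marked unhealthy and left alone.

```go
package main

import "fmt"

// Health is a hypothetical status value; deployd's real types may differ.
type Health string

const (
	Healthy   Health = "healthy"
	Unhealthy Health = "unhealthy"
)

// maxRestarts is the "3 attempts" threshold proposed in this issue.
const maxRestarts = 3

// podState is a hypothetical per-container record kept by the inspector.
type podState struct {
	Restarts int
	Health   Health
}

// inspect runs one inspection pass. running reports whether the container
// is alive (e.g. as seen by swarm); restart tries to bring it back up.
func inspect(s *podState, running bool, restart func() error) {
	if s.Health == Unhealthy {
		return // already given up: do not touch the container any more
	}
	if running {
		return
	}
	if s.Restarts >= maxRestarts {
		// Found dead again after the last allowed attempt.
		s.Health = Unhealthy
		return
	}
	s.Restarts++
	// Success or failure, each call counts as one attempt; a container that
	// keeps dying is caught by the limit check on a later pass.
	_ = restart()
}

func main() {
	s := &podState{Health: Healthy}
	noop := func() error { return nil }
	for pass := 1; pass <= 5; pass++ {
		inspect(s, false, noop) // the container is found dead on every pass
		fmt.Printf("pass %d: restarts=%d health=%s\n", pass, s.Restarts, s.Health)
	}
}
```

In this sketch the restart counter never resets; whether it should reset after a container has stayed up for a while is not specified in the issue.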

