kdoctor agent schedule #137
Codecov Report
@@ Coverage Diff @@
## main #137 +/- ##
==========================================
- Coverage 40.63% 38.94% -1.69%
==========================================
Files 8 8
Lines 507 529 +22
==========================================
Hits 206 206
- Misses 296 318 +22
Partials 5 5
1240af1 to 2a64ca0
Signed-off-by: Icarus9913 <icaruswu66@qq.com>
2a64ca0 to 0ef7d92
40f446f to 4e8ebb7
4e8ebb7 to 87becb5
87becb5 to 498f1f1
test/docs/Runtime.md
Outdated
| Case ID | Title | Priority | Smoke | Status | Other |
|---------|-------------------------------------------------------------------|----------|-------|--------|-------------|
| E00001 | Successfully testing Task Runtime creation | p1 | | | |
This description could be more specific; the whole point is that a reader can tell from the description which cases exist.
For example, does it create one particular CRD or all of them? Does it create a Deployment or a DaemonSet?
More cases are being added for this; they are still being written and have not been pushed yet.
test/docs/Runtime.md
Outdated
|---------|-------------------------------------------------------------------|----------|-------|--------|-------------|
| E00001 | Successfully testing Task Runtime creation | p1 | | | |
| E00002 | Successfully testing Task Runtime Service creation | p1 | | | |
| E00003 | Successfully testing cascading deletion with Task Runtime Service | p1 | | | |
BTW, are there cases that verify that resource creation, deletion time, intermediate status transitions, and so on match what the spec expects?
More cases are being added for this; they are still being written and have not been pushed yet.
| Field | Description | Type | Validation | Values | Default |
|---|---|---|---|---|---|
| annotation | Annotations of the agent workload, used together with Multus for multiple NICs | map[string]string | Optional | | |
"used together with Multus for multiple NICs"
Please drop this restriction; it is misleading.
done
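@@ -4,7 +4,7 @@

## Introduction

For this kind of task, each kdoctor agent sends HTTP requests to the specified target. The default concurrency is 50, which can cover multi-replica cases, and the concurrency can be set in kdoctor's configmap. The task obtains the success rate and mean latency, and judges whether the result is successful according to the success condition. Detailed reports can be obtained through the aggregated API.
For this kind of task, kdoctor-controller generates the corresponding agent according to agentSpec, and each agent pod sends HTTP requests to the specified target. The default concurrency is 50, which can cover multi-replica cases, and the concurrency can be set in kdoctor's configmap. The task obtains the success rate and mean latency, and judges whether the result is successful according to the success condition. Detailed reports can be obtained through the aggregated API.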
Which resources are created? A Deployment and a Service?
What is the deletion logic for those resources?
How are reports collected? Does deletion affect the reports? How do Deployment deletion and CR deletion relate to report retention?
I suggest describing all of this in a separate md file and linking to it from each CRD's document.
done
docs/usage/apphttphealthy-zh_CN.md
Outdated
@@ -67,6 +67,10 @@ kind: AppHttpHealthy
metadata:
  name: http1
spec:
  agentSpec:
The get-started guide should stay minimal and simply work with the defaults; do not include this here.
done
docs/reference/netdns-zh_CN.md
Outdated
| env | Environment variables of the agent workload | [env](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/api/core/v1/types.go#L2012) | Optional | | |
| hostNetwork | Whether the agent workload uses the host network | bool | Optional | true, false | false |
| resources | Resource configuration of the agent workload | [resources](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/api/core/v1/types.go#L2333) | Optional | | |
| terminationGracePeriodMinutes | Number of minutes after the task finishes before the agent workload is terminated | int | Optional | greater than or equal to 0 | 60 |
It would be better to make this default of 60 customizable in the chart values.
done
docs/reference/netdns-zh_CN.md
Outdated
| affinity | Affinity of the agent workload | [labelSelector](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1195) | Optional | | |
| env | Environment variables of the agent workload | [env](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/api/core/v1/types.go#L2012) | Optional | | |
| hostNetwork | Whether the agent workload uses the host network | bool | Optional | true, false | false |
| resources | Resource configuration of the agent workload | [resources](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/api/core/v1/types.go#L2333) | Optional | | |
What is the default? No resource limits?
The default is cpu: 100m, mem: 128Mi.
It can be set in the chart values.
I mean the documentation needs to state this, not that you should tell me in a comment.
done
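To make the defaults discussed above concrete, here is a minimal Go sketch of how unset agent fields could be filled from chart-provided defaults. The `AgentSpec` and `Defaults` types and the `applyDefaults` function are hypothetical names used only for illustration; the real kdoctor types and defaulting code may look different. Only the values (cpu 100m, mem 128Mi, 60 minutes, and the "greater than or equal to 0" rule) come from this thread.

```go
package main

import "fmt"

// AgentSpec is a hypothetical, trimmed-down stand-in for the agentSpec
// discussed in this thread; the real kdoctor type may differ.
type AgentSpec struct {
	CPU                           string // e.g. "100m"; empty means "not set"
	Memory                        string // e.g. "128Mi"; empty means "not set"
	TerminationGracePeriodMinutes *int64 // nil means "not set by the user"
}

// Defaults holds values that are assumed to come from the chart values.
type Defaults struct {
	CPU                           string
	Memory                        string
	TerminationGracePeriodMinutes int64
}

// applyDefaults fills unset fields from d and rejects a negative
// terminationGracePeriodMinutes.
func applyDefaults(spec AgentSpec, d Defaults) (AgentSpec, error) {
	if spec.CPU == "" {
		spec.CPU = d.CPU
	}
	if spec.Memory == "" {
		spec.Memory = d.Memory
	}
	if spec.TerminationGracePeriodMinutes == nil {
		v := d.TerminationGracePeriodMinutes
		spec.TerminationGracePeriodMinutes = &v
	} else if *spec.TerminationGracePeriodMinutes < 0 {
		return spec, fmt.Errorf("terminationGracePeriodMinutes must be >= 0, got %d",
			*spec.TerminationGracePeriodMinutes)
	}
	return spec, nil
}

func main() {
	d := Defaults{CPU: "100m", Memory: "128Mi", TerminationGracePeriodMinutes: 60}
	spec, err := applyDefaults(AgentSpec{}, d)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.CPU, spec.Memory, *spec.TerminationGracePeriodMinutes)
}
```

Using a pointer for the grace period lets the sketch distinguish "not set" from an explicit 0, which the table lists as a valid value.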
test/docs/Runtime.md
Outdated
| E00001 | Successfully testing Task Runtime creation | p1 | | | |
| E00002 | Successfully testing Task Runtime Service creation | p1 | | | |
| E00003 | Successfully testing cascading deletion with Task Runtime Service | p1 | | | |
| E00004 | Successfully testing cascading deletion with Task Runtime | p1 | | | |
(1) What happens if someone deletes these resources midway? Will the business code crash, what will the CRD status show, and should a finalizer or webhook be added to guard against it?
(2) What is expected when the CRD is deleted midway, and is there a case for it?
There are cases for this.
test/docs/Runtime.md
Outdated
| Case ID | Title | Priority | Smoke | Status | Other |
|---------|-------------------------------------------------------------------|----------|-------|--------|-------------|
| E00001 | Successfully testing Task Runtime creation | p1 | | | |
| E00002 | Successfully testing Task Runtime Service creation | p1 | | | |
When the project is uninstalled, is the uninstall flow affected, and are any CRs or Deployments left behind?
This is not implemented yet; let's do it later.
This is simple, see spidernet-io/spiderpool@b5b8919
Use it as a reference. Either implement it, or at least reflect the case in the documentation.
This PR already contains too much; let's do it in a separate PR.
7f0f50e to 925f202
Signed-off-by: ii2day <ji.li@daocloud.io>
7935ea4 to 9002712
cb927e6 to 0738421
@@ -73,6 +73,11 @@ spec:
- {{ .Values.kdoctorController.cmdBinName }}
args:
- --config-path=/tmp/config-map/conf.yml
- --configmap-deployment-template=/tmp/configmap-app-template/deployment.yml
Isn't this a bit costly? Every new config item will need another command-line flag.
Could we pass a configmap name here and let the code fetch and read it itself?
Or just pass a path.
The amount of code is the same. If everything goes into the configmap, the backend code still has to read it, check whether a value is present, and then json.Unmarshal it into a struct instance; adding a config item later requires changing the backend code either way.
Also, the backend code here is already templated, so validation and reading both follow a fixed flow.
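The "read, check for a value, then json.Unmarshal into a struct" flow mentioned in the reply can be sketched roughly as follows. This is not the code from this PR: the `AgentTemplate` type, the `loadFromConfigMap` function, and the `deployment` key are hypothetical, and a plain `map[string]string` stands in for a ConfigMap's data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentTemplate is a hypothetical struct used only for illustration.
type AgentTemplate struct {
	Kind     string `json:"kind"`
	Replicas int    `json:"replicas"`
}

// loadFromConfigMap looks up key in data (standing in for a ConfigMap's
// data field), checks that a value is present, and decodes it as JSON
// into out.
func loadFromConfigMap(data map[string]string, key string, out interface{}) error {
	raw, ok := data[key]
	if !ok || raw == "" {
		return fmt.Errorf("configmap key %q is missing or empty", key)
	}
	if err := json.Unmarshal([]byte(raw), out); err != nil {
		return fmt.Errorf("decode %q: %w", key, err)
	}
	return nil
}

func main() {
	data := map[string]string{"deployment": `{"kind":"Deployment","replicas":2}`}
	var tpl AgentTemplate
	if err := loadFromConfigMap(data, "deployment", &tpl); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", tpl)
}
```

Either way, a new config item means a new key or flag plus a new struct field, which is the point the reply makes about the amount of code being the same.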
| kind | Type of the agent workload | string | Optional | Deployment, DaemonSet | DaemonSet |
| deploymentReplicas | Desired number of replicas when the agent workload type is Deployment | int | Optional | greater than or equal to 0 | 0 |
| affinity | Affinity of the agent workload | [labelSelector](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1195) | Optional | | |
| env | Environment variables of the agent workload | [env](https://github.com/kubernetes/kubernetes/blob/v1.27.0/staging/src/k8s.io/api/core/v1/types.go#L2012) | Optional | | |
Don't link to code; code changes and the line numbers shift.
If users need to know about this, there should be a referent/agent.md describing the startup command, environment variables, and so on.
That is how spiderpool does it, as far as I can see.
That has the same versioning problem.
We are a small project. As the code evolves the line numbers change, and we don't have the capacity to keep updating these links long term, nor is there a CI check that the linked line numbers are still correct.
Besides, the code has no textual description of the environment variables, so nobody can tell what they mean.
0738421 to ed7cb4e
Signed-off-by: ii2day <ji.li@daocloud.io>
7a2ba07 to daab7b3
As discussed before, be sure to set resource limits for the agent so that it does not affect the production environment.
docs/reference/runtime-zh_CN.md
Outdated
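### Workload

The workload is a DaemonSet or a Deployment, DaemonSet by default. Each Pod in the workload issues requests according to the task configuration and writes the execution results to disk in the Pod; this can be set via AgentSpec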
The task starts according to the schedule defined in the spec only after all pods are ready.
done
docs/reference/runtime-zh_CN.md
Outdated
has the same destruction logic.

### Ingress
(1) How do deletion of the task CR, graceful deletion of the resources, and deletion of the report relate to one another? Could there be a sequence diagram or similar to express this, so that operators know the impact of their actions?
When is it safe to delete the resources, and when is it safe to delete the CR?
(2) How is graceful deletion of a task's resources designed, and why is it needed?
done
daab7b3 to 276d452
done
4967054 to 5a453d0
docs/reference/runtime-zh_CN.md
Outdated
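@@ -0,0 +1,59 @@
## runtime

After a task CR is submitted, kdoctor-controller generates the corresponding task carrier (DaemonSet or Deployment) according to the AgentSpec in the CR. Once all Pods are ready, the task is executed as defined in the Spec. Each task uses its own dedicated carrier.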
This file is not linked from doc/mkdoc.
done
docs/reference/runtime-zh_CN.md
Outdated
workload ->>ingress: runtime destruction time reached, destroy ingress
CR task ->>kdoctor_controller: CR task deleted
kdoctor_controller ->> workload: CR task deleted, delete workload
workload ->> pod: workload deleted, delete pod
(1) This sequence diagram still does not seem to make the report's lifecycle clear.
(2) After the diagram, could you give a few simple conclusions,
such as what the report's lifecycle is (does deleting the CR mean its report is deleted too)?
done
5a453d0 to 18de65d
docs/mkdocs.yml
Outdated
@@ -50,6 +50,7 @@ nav:
- AppHttpHealthy: reference/apphttphealthy.md
- NetReach: reference/netreach.md
- NetDns: reference/netdns.md
- Runtime: reference/runtime.md
Wouldn't this fit better in the concept section?
done, changed
18de65d to 2753559
Please fix the CI.
2753559 to b8e45e8
Signed-off-by: ii2day <ji.li@daocloud.io>
b8e45e8 to c5368a7