Replies: 6 comments
-
Does a custom backend such as my_backend need to implement both a local part and a remote part? And do the two parts need to communicate with each other?
-
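My understanding is that the Backend focuses on compiling, running, and producing metrics for a single execution, while the Runner is responsible for the execution environment and the scheduling mode (local, remote, etc.).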
-
Thanks to @fangfangssj and @roll-away for the suggestions and additions. I have revised the proposal description. Please review the new proposal and share your thoughts.
-
Following this idea, some refactoring is needed. The specific steps are as follows:
@Dayuxiaoshui @ywh555hhh, please help outline the value this refactoring delivers, for example: where:
-
ADT formal definition
# Main flow
LocalRunBenchmark :=
  BenchmarkResult
  <- $model_path str
  <- BackendSpec
  <- $make_backend (Backend <- BackendSpec)
  <- $make_model ((torch.nn.Module * Inputs) <- $model_path str)
  <- $warmup (() <- Backend <- Int)
  <- $benchmark (BenchmarkResult <- torch.nn.Module <- Inputs)
# Configuration parameters
BackendSpec :=
  Object
  * $runner_type ("local" | "remote")
  * $backend_path str
  * $backend_class str
  * $remote_machine ([str] | ())
  * $remote_port ([int] | ())
  * $warmup int
  * $trials int
  * $backend_config Dict
# Runner
RunnerType := "local" | "remote"
# Backend protocol
Backend :=
  Interface
  * $init (() <- str <- Config)
  * $execute (ExecuteResult <- Model)
  * $warmup (() <- Model <- int)
  * $cleanup (() <- Model)
# Execution result
ExecuteResult :=
  Object
  * $outputs Any
  * $metrics Metrics
# Performance metrics
Metrics :=
  Object
  * $execute_time_ms Float
  * $extra ([Map<String * Scalar>] | ())
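For readers who find the ADT notation unfamiliar, here is one possible Python rendering of it. This is only a sketch, not the project's actual code: the use of dataclasses and `typing.Protocol`, the parameter names `name` and `n` (the ADT gives only their types), and reading `[T] | ()` as "optional" are all my assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Literal, Optional, Protocol, Union

Scalar = Union[int, float]  # assumed reading of "Scalar" in the ADT


@dataclass
class Metrics:
    execute_time_ms: float
    extra: Optional[Dict[str, Scalar]] = None  # $extra ([Map<String * Scalar>] | ())


@dataclass
class ExecuteResult:
    outputs: Any
    metrics: Metrics


@dataclass
class BackendSpec:
    runner_type: Literal["local", "remote"]
    backend_path: str
    backend_class: str
    warmup: int
    trials: int
    backend_config: Dict[str, Any] = field(default_factory=dict)
    remote_machine: Optional[str] = None  # only meaningful when runner_type == "remote"
    remote_port: Optional[int] = None


class Backend(Protocol):
    # Parameter names below are placeholders; the ADT specifies only the types.
    def init(self, name: str, config: Dict[str, Any]) -> None: ...
    def execute(self, model: Any) -> ExecuteResult: ...
    def warmup(self, model: Any, n: int) -> None: ...
    def cleanup(self, model: Any) -> None: ...


# Build a spec from a dict shaped like the local-mode config in the proposal.
local_config = {
    "runner_type": "local",
    "backend_path": "graph_net_bench.torch.backends.inductor",
    "backend_class": "InductorBackend",
    "warmup": 3,
    "trials": 5,
    "backend_config": {"seed": 123, "device": "cuda"},
}
spec = BackendSpec(**local_config)
```

Because the field names match the JSON keys, the same `BackendSpec(**config)` call would also accept a remote-mode dict that carries `remote_machine` and `remote_port`.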
-
Since warmup and trials need to be controlled in the runner, the backend has to provide warmup and execute methods that can be called repeatedly. The proposal has been updated.
-
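Goal
execute() method: no need to care about the execution environment, loop control, or statistics aggregation
backend_path + backend_class dynamic loading: no hard-coded registration required
Config format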
python3 -m graph_net_bench.torch.eval_backend_diff \
    --model-path <str> \
    --model-path-list <str> \
    --reference-config <base64_json> \
    --target-config <base64_json>
Local mode config:
{
  "runner_type": "local",
  "backend_path": "graph_net_bench.torch.backends.inductor",
  "backend_class": "InductorBackend",
  "warmup": 3,
  "trials": 5,
  "backend_config": {
    "seed": 123,
    "device": "cuda",
    "log_prompt": "graph-net-bench-log",
    "model_path_prefix": "/path/to/models"
  }
}
Remote mode config (only change runner_type and add the remote address):
{
  "runner_type": "remote",
  "remote_machine": "10.0.0.1",
  "remote_port": 50052,
  "backend_path": "graph_net_bench.torch.backends.inductor",
  "backend_class": "InductorBackend",
  "warmup": 3,
  "trials": 5,
  "backend_config": { ... }
}
Backend
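To make the `<base64_json>` arguments and the `backend_path` + `backend_class` loading concrete, here is a rough sketch of how they could work. The helper names (`encode_config`, `decode_config`, `load_backend_class`) are hypothetical, and I am assuming standard base64 over UTF-8 JSON; the actual tool may encode or load things differently.

```python
import base64
import importlib
import json


def encode_config(config: dict) -> str:
    """Serialize a config dict to JSON, then base64-encode it (assumed encoding)."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> dict:
    """Inverse of encode_config."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def load_backend_class(backend_path: str, backend_class: str) -> type:
    """Import the module at backend_path and return the class named backend_class."""
    module = importlib.import_module(backend_path)
    return getattr(module, backend_class)


local_config = {
    "runner_type": "local",
    "backend_path": "graph_net_bench.torch.backends.inductor",
    "backend_class": "InductorBackend",
    "warmup": 3,
    "trials": 5,
    "backend_config": {"seed": 123, "device": "cuda"},
}

# The encoded string is what would be passed to --reference-config / --target-config.
encoded = encode_config(local_config)
roundtrip = decode_config(encoded)

# Demonstrated with a standard-library class so the snippet runs without
# graph_net_bench installed; a real call would pass the two config fields.
cls = load_backend_class("collections", "OrderedDict")
```

With this approach a new backend needs no registry entry: pointing `backend_path` and `backend_class` at any importable class is enough.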
Runner
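As an illustration of the split discussed in this thread (the runner owns warmup, the trial loop, and aggregation, while the backend only runs once per call), here is a minimal local-runner sketch. `CallableBackend`, `run_local`, and the choice of mean/min/max as the aggregated statistics are my own assumptions, not part of the proposal.

```python
import statistics
import time
from typing import Any, Callable, Dict


class CallableBackend:
    """Hypothetical stand-in backend; a real one would compile and run a model."""

    def warmup(self, model: Callable[[], Any], n: int) -> None:
        for _ in range(n):
            model()

    def execute(self, model: Callable[[], Any]) -> Dict[str, Any]:
        # One timed execution; the backend knows nothing about trials.
        start = time.perf_counter()
        outputs = model()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return {"outputs": outputs, "metrics": {"execute_time_ms": elapsed_ms}}


def run_local(backend: Any, model: Any, warmup: int, trials: int) -> Dict[str, float]:
    """Runner side: controls warmup, repeats execute(), and aggregates timings."""
    backend.warmup(model, warmup)
    times = [
        backend.execute(model)["metrics"]["execute_time_ms"] for _ in range(trials)
    ]
    return {
        "trials": len(times),
        "mean_ms": statistics.mean(times),
        "min_ms": min(times),
        "max_ms": max(times),
    }


summary = run_local(CallableBackend(), lambda: sum(range(1000)), warmup=3, trials=5)
print(summary)
```

A remote runner could keep the same `run_local`-style loop on the remote machine and only ship the config and results over the wire, which is why the backend itself would not need separate local and remote parts.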
Structure