graph_net_bench Refactoring Proposal #607

JewelRoam · 2026-01-26T09:13:04Z · Collaborator

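Concept	Provided by	Responsibility
Backend	User-implemented	Loads the model, compiles it, runs a single inference
Runner	Framework	Decides where to execute (Local: direct call / Remote: send to a remote machine)
eval_backend_diff	Framework	warmup/trials loop, timing, synchronization, statistics, comparison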

Goals

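One Backend implementation: a Backend focuses only on the compile/run logic of a single execution; there is no need to write separate code for Local and Remote
Runners are provided by the framework: LocalRunner and RemoteRunner automatically wrap any Backend, so users never implement them
Low adaptation cost: a new Backend only needs to implement the execute() method, without caring about the execution environment, loop control, or statistics aggregation
Separation of concerns: Backend = how to compile and execute / Runner = where to execute / framework = loops + statistics + comparison
Open-closed principle: adding a new Backend requires no changes to the Runner or framework code
Automatic Remote wrapping: any Backend can run remotely by changing only the configuration, with no extra adaptation
Dynamic loading: Backends are loaded dynamically via backend_path + backend_class, with no hard-coded registration
Metric autonomy: each Backend records its own fine-grained metrics (load/compile/execute time, etc.) and the framework only collects and aggregates them; metrics are restricted to scalars so that Remote transfers stay lightweight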
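A minimal sketch of the dynamic loading goal, assuming backend_path is an importable module path and backend_class names a class in that module whose constructor takes (model_path, config) like BaseBackend. The helper names `load_backend_class` and `make_backend` are hypothetical, not part of the existing codebase.

```python
import importlib


def load_backend_class(backend_path: str, backend_class: str) -> type:
    """Resolve a Backend class from a module path and a class name.

    Hypothetical helper: imports the module named by backend_path and looks up
    backend_class on it, so no hard-coded registry is needed.
    """
    module = importlib.import_module(backend_path)
    try:
        cls = getattr(module, backend_class)
    except AttributeError as e:
        raise ImportError(
            f"{backend_class!r} not found in module {backend_path!r}"
        ) from e
    if not isinstance(cls, type):
        raise TypeError(f"{backend_path}.{backend_class} is not a class")
    return cls


def make_backend(spec: dict, model_path: str):
    """Instantiate the Backend described by a runner config dict."""
    cls = load_backend_class(spec["backend_path"], spec["backend_class"])
    # backend_config is passed through untouched; only the Backend interprets it
    return cls(model_path, spec.get("backend_config", {}))
```

With this shape, adding a Backend means adding a module and pointing the config at it; neither the loader nor the Runners change.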

Configuration Format

python3 -m graph_net_bench.torch.eval_backend_diff \
    --model-path <str> \
    --model-path-list <str> \
    --reference-config <base64_json> \
    --target-config <base64_json>
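The --reference-config and --target-config flags take base64-encoded JSON. The sketch below shows one way a caller could produce such a value and how an entry point could decode it with argparse, assuming standard (not URL-safe) base64 over UTF-8 JSON. `encode_config`, `decode_config` and `build_parser` are hypothetical names that only mirror the flags shown above; which of --model-path / --model-path-list is required is left open.

```python
import argparse
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dict to a base64 string usable as a CLI argument."""
    raw = json.dumps(config, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_config(arg: str) -> dict:
    """Inverse of encode_config: base64 -> UTF-8 JSON -> dict.

    Decoding errors (binascii.Error, json.JSONDecodeError) subclass ValueError,
    so argparse reports them as an invalid argument value.
    """
    return json.loads(base64.b64decode(arg).decode("utf-8"))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="eval_backend_diff")
    parser.add_argument("--model-path", type=str)
    parser.add_argument("--model-path-list", type=str)
    parser.add_argument("--reference-config", type=decode_config, required=True)
    parser.add_argument("--target-config", type=decode_config, required=True)
    return parser
```

A caller would then pass `--target-config "$(python3 -c ...)"` or build the argument list in Python with `encode_config(cfg)`.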

Local mode configuration:

{
    "runner_type": "local",
    "backend_path": "graph_net_bench.torch.backends.inductor",
    "backend_class": "InductorBackend",
    "warmup": 3,
    "trials": 5,
    "backend_config": {
        "seed": 123,
        "device": "cuda",
        "log_prompt": "graph-net-bench-log",
        "model_path_prefix": "/path/to/models"
    }
}

Remote mode configuration (only change runner_type and add the remote address):

{
    "runner_type": "remote",
    "remote_machine": "10.0.0.1",
    "remote_port": 50052,
    "backend_path": "graph_net_bench.torch.backends.inductor",
    "backend_class": "InductorBackend",
    "warmup": 3,
    "trials": 5,
    "backend_config": { ... }
}

Backend

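from abc import ABC, abstractmethod


class BaseBackend(ABC):
    def __init__(self, model_path: str, config: dict):
        # Keep the model path and backend_config for use in execute()
        self.model_path = model_path
        self.config = config

    @abstractmethod
    def execute(self) -> dict:
        """Run a single inference (including device synchronization).

        Returns:
            {
                "outputs": Any,              # required, model outputs
                "metrics": {
                    "execute_time_ms": float # required, includes synchronize time
                }
            }
        """
        ...

    def warmup(self, num_warmup: int) -> None:
        # Default warmup: call execute() num_warmup times and discard the results
        for _ in range(num_warmup):
            self.execute()

    def cleanup(self) -> None:
        # Release resources (e.g. device memory); no-op by default
        pass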
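To illustrate how little a new Backend has to provide, here is a toy subclass that runs on CPU in pure Python. `ToyMatmulBackend` and its `size` config key are hypothetical; a real Backend would load and compile the model from model_path in `__init__` and synchronize the device (for example with `torch.cuda.synchronize()`) before stopping the timer. BaseBackend is repeated so the block runs on its own.

```python
import time
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    def __init__(self, model_path: str, config: dict):
        self.model_path = model_path
        self.config = config

    @abstractmethod
    def execute(self) -> dict: ...

    def warmup(self, num_warmup: int) -> None:
        for _ in range(num_warmup):
            self.execute()

    def cleanup(self) -> None:
        pass


class ToyMatmulBackend(BaseBackend):
    """Hypothetical backend: multiplies a constant matrix by the identity."""

    def __init__(self, model_path: str, config: dict):
        super().__init__(model_path, config)
        n = config.get("size", 8)
        # "Load" the model: here just build two constant n x n matrices
        self.a = [[float(i + j) for j in range(n)] for i in range(n)]
        self.b = [[float(i == j) for j in range(n)] for i in range(n)]

    def execute(self) -> dict:
        start = time.perf_counter()
        n = len(self.a)
        out = [
            [sum(self.a[i][k] * self.b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # A GPU backend would synchronize the device here, before reading the clock
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return {"outputs": out, "metrics": {"execute_time_ms": elapsed_ms}}
```

The subclass never loops, never aggregates and never knows whether it runs locally or remotely; those concerns stay with the Runner and the framework.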

Runner

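LocalRunner: instantiates and executes the Backend directly in the current process
RemoteRunner: sends the complete benchmark to a remote server for execution and returns the aggregated results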
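A sketch of the two Runners, assuming (as in the later update to this proposal) that the runner drives the warmup/trials loop and that Backends follow the BaseBackend interface. The transport is abstracted as a hypothetical `send(host, port, payload) -> reply` callable, since the proposal does not fix the wire protocol; the aggregated statistics (mean/median/min/max) are likewise an illustrative choice.

```python
import json
import statistics
from typing import Callable, Optional


class LocalRunner:
    """Instantiate the Backend in this process and drive warmup + trials."""

    def __init__(self, make_backend: Callable[[dict, str], object]):
        # make_backend(spec, model_path) -> Backend instance (e.g. via dynamic loading)
        self.make_backend = make_backend

    def run(self, spec: dict, model_path: str) -> dict:
        if spec["trials"] < 1:
            raise ValueError("trials must be >= 1")
        backend = self.make_backend(spec, model_path)
        times, outputs = [], None
        try:
            backend.warmup(spec["warmup"])
            for _ in range(spec["trials"]):
                result = backend.execute()
                outputs = result["outputs"]  # last trial's outputs, used for comparison
                times.append(float(result["metrics"]["execute_time_ms"]))
        finally:
            backend.cleanup()
        return {
            "outputs": outputs,
            "execute_time_ms": {
                "mean": statistics.fmean(times),
                "median": statistics.median(times),
                "min": min(times),
                "max": max(times),
            },
        }


class RemoteRunner:
    """Ship the whole benchmark spec to a remote worker and return its aggregated result."""

    def __init__(self, send: Callable[[str, int, bytes], bytes]):
        self.send = send  # hypothetical transport: (host, port, payload) -> reply bytes

    def run(self, spec: dict, model_path: str) -> dict:
        payload = json.dumps({"spec": spec, "model_path": model_path}).encode("utf-8")
        reply = self.send(spec["remote_machine"], spec["remote_port"], payload)
        return json.loads(reply.decode("utf-8"))


def make_runner(spec: dict, make_backend, send: Optional[Callable] = None):
    if spec["runner_type"] == "local":
        return LocalRunner(make_backend)
    if spec["runner_type"] == "remote":
        if send is None:
            raise ValueError("runner_type 'remote' requires a transport")
        return RemoteRunner(send)
    raise ValueError(f"unknown runner_type: {spec['runner_type']!r}")
```

On the server side, a worker can decode the payload and run a LocalRunner with the same spec, which is what lets any Backend run remotely without extra adaptation.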

Structure

graph_net_bench/torch/
├── eval_backend_diff.py          # entry point
├── runners/
│   ├── base_runner.py
│   ├── local_runner.py
│   └── remote_runner.py
├── backends/
│   ├── base_backend.py
│   ├── torch_eager.py
│   ├── inductor.py
│   ├── custom.py
│   └── ...
└── utils/
    ├── timing.py
    ├── comparison.py
    └── ...

fangfangssj · 2026-01-27T02:39:39Z


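Does a custom backend such as my_backend need to implement both a local and a remote part, and do those two parts need to communicate with each other?
If eval_backend_diff is meant to compare two different backends, we could simply use a single runner to run each backend once and compare the results. This would reduce the complexity of integrating a backend, since a backend would only need to focus on a single execution. In that case the runner and the backend would effectively be equivalent, the same thing.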


roll-away · 2026-01-27T03:47:52Z


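My understanding is that the Backend mainly focuses on compiling, running, and producing metrics for a single execution, while the Runner is responsible for the execution environment and scheduling mode (local/remote, etc.).
With this split, does every backend need both a local and a remote implementation?
Could the Runner provide a default remote wrapper, so that a backend only implements the core execution logic?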


JewelRoam · 2026-01-27T06:05:38Z · Collaborator Author

Thanks @fangfangssj and @roll-away for the suggestions and additions. I have revised the proposal description; please review the new version and share your thoughts!


JewelRoam · 2026-01-27T06:18:13Z · Collaborator Author

Following this idea, some refactoring is needed, in the following steps:

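1. Extend eval_backend_diff with runners: local and remote @JewelRoam
2. Refactor eval_backend_diff into its new form, and write unit tests for both eval_backend_diff and the runners @JewelRoam
3. Based on eval_backend_perf (also referring to test_compiler), split out the utils timing and comparison, and roughly consolidate them into a TorchBackend @roll-away (@JewelRoam will clean up the non-standard code inherited from the original test_compiler)
4. Debug the runners and TorchBackend end to end against the unit tests to make sure the protocol works (this requires testing on a small set of models to track down possible issues) @fangfangssj
5. Finalize the BaseBackend base class and split TorchBackend into multiple Backends, named by their semantics @fangfangssj (@JewelRoam will write the base class)
6. Clean up the obsolete eval_backend_perf and the evaluation- and torch-related utils, etc.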

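@Dayuxiaoshui @ywh555hhh will help articulate the value of the refactoring, for example:
Which user needs does it newly cover? How can it efficiently produce software-stack observation reports?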

Dependencies:
Steps 1 and 3 have no dependencies;
Steps 4 and 5 depend on 1, 2, and 3, and can proceed in parallel;
Step 6 comes last.


JewelRoam · 2026-01-27T07:09:31Z · Collaborator Author

Formal ADT Definition

# Main flow
LocalRunBenchmark :=
  BenchmarkResult
  <- $model_path str
  <- BackendSpec
  <- $make_backend (Backend <- BackendSpec)
  <- $make_model ((torch.nn.Module * Inputs) <- $model_path str)
  <- $warmup (() <- Backend <- Int)
  <- $benchmark (BenchmarkResult <- torch.nn.Module <- Inputs)

# Configuration parameters
BackendSpec :=
  Object
  * $runner_type    ("local" | "remote")
  * $backend_path   str
  * $backend_class  str
  * $remote_machine ([str] | ())
  * $remote_port    ([int] | ())
  * $warmup         int
  * $trials         int
  * $backend_config Dict
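One possible Python rendering of BackendSpec as a dataclass, so that a malformed config fails fast before any Backend is loaded. It assumes `([str] | ())` and `([int] | ())` mean "optional str/int", and that remote_machine/remote_port are required exactly when runner_type is "remote"; the `trials >= 1` rule and the rejection of unknown keys are illustrative choices, not part of the ADT.

```python
from dataclasses import dataclass, field, fields
from typing import Literal, Optional


@dataclass
class BackendSpec:
    runner_type: Literal["local", "remote"]
    backend_path: str
    backend_class: str
    warmup: int
    trials: int
    backend_config: dict = field(default_factory=dict)
    remote_machine: Optional[str] = None  # only meaningful for runner_type == "remote"
    remote_port: Optional[int] = None

    def __post_init__(self):
        if self.runner_type not in ("local", "remote"):
            raise ValueError(f"runner_type must be 'local' or 'remote', got {self.runner_type!r}")
        if self.runner_type == "remote" and (self.remote_machine is None or self.remote_port is None):
            raise ValueError("runner_type 'remote' requires remote_machine and remote_port")
        if self.warmup < 0 or self.trials < 1:
            raise ValueError("warmup must be >= 0 and trials must be >= 1")

    @classmethod
    def from_dict(cls, d: dict) -> "BackendSpec":
        # Reject unknown keys so that typos in a config surface as errors
        unknown = set(d) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**d)
```

Because the Local and Remote configs differ only in runner_type and the two remote fields, a single spec type covers both modes.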

# Runner
RunnerType := "local" | "remote"

# Backend protocol
Backend :=
  Interface
  * $init     (() <- str <- Config)
  * $execute  (ExecuteResult <- Model)
  * $warmup   (() <- Model <- int)
  * $cleanup  (() <- Model)

# Execution result
ExecuteResult :=
  Object
  * $outputs Any
  * $metrics Metrics

# Performance metrics
Metrics :=
  Object
  * $execute_time_ms Float
  * $extra           ([Map<String * Scalar>] | ())


JewelRoam · 2026-01-28T10:55:16Z · Collaborator Author

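Since warmup and trials need to be controlled in the runner, the backend must provide warmup and execute methods that the runner can call repeatedly. Updated accordingly.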
