any engine for inference subgraph acceleration naive design #10028

Closed
@Superjomn

Description

architecture

  • frontend to mark the subgraphs that should be optimized by an x engine
  • inference preparation to extract the subgraph and change the program desc by replacing the subgraph with an x engine op
  • engine op to build an x engine and execute it like a normal operator

phases

frontend

  • manually partition the graph
  • TODO LATER: automatically partition the graph

Some initial ideas: just add a special with-block for inference

```python
a = op0(b, c)

with infer.accelerate_by_tensorrt:
    a1 = op1(b, c)
    a2 = op1(b, c)

c = op2(b, c)

with infer.accelerate_by_some_other_engine:
    a3 = op3(c)
    ...
```
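The with-block idea above can be sketched as a small context manager that tags each appended op with the currently active engine. The `Program`, `accelerate_by`, and `append_op` names here are hypothetical stand-ins, not the real fluid API:

```python
from contextlib import contextmanager

class Program:
    """Toy program desc: records (op_name, engine_tag) pairs."""

    def __init__(self):
        self.ops = []          # ops in issue order
        self._engine = None    # engine active in the current with-block

    @contextmanager
    def accelerate_by(self, engine):
        # mark every op appended inside this block with the engine tag
        prev, self._engine = self._engine, engine
        try:
            yield
        finally:
            self._engine = prev

    def append_op(self, name):
        self.ops.append((name, self._engine))

prog = Program()
prog.append_op("op0")
with prog.accelerate_by("tensorrt"):
    prog.append_op("op1")
    prog.append_op("op1")
prog.append_op("op2")
# prog.ops now carries the manual partition: the two op1s are tagged
# "tensorrt", the surrounding ops are untagged.
```

The tags give the later prepare step enough information to cut the marked run of ops out as one subgraph.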

backend

  • inference prepare
    • partition the graph and transform the inference program desc
    • tensorrt_engine_op.build
      • convert: take the subgraph's block desc as input and add TensorRT layers to the TensorRT engine
        • transform the weight format from fluid to TensorRT
        • add the TensorRT layers
  • inference execute
    • for each op in the inference program desc
      • op.run
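A minimal sketch of the prepare step's desc transformation, using toy data structures rather than the real ProgramDesc: collapse each maximal run of ops tagged for the same engine into one engine op that carries the subgraph as an attribute.

```python
def transform(ops):
    """ops: list of (op_name, engine_tag); tag None means run natively.

    Returns a new op list where each tagged run becomes a single
    '<tag>_engine_op' holding the subgraph's op names as its attribute.
    """
    out, i = [], 0
    while i < len(ops):
        name, tag = ops[i]
        if tag is None:
            out.append((name, None))
            i += 1
        else:
            # gather the maximal contiguous run with the same engine tag
            sub = []
            while i < len(ops) and ops[i][1] == tag:
                sub.append(ops[i][0])
                i += 1
            # the engine op stores the subgraph (its block desc) as an attribute
            out.append(("%s_engine_op" % tag, sub))
    return out

ops = [("op0", None), ("op1", "tensorrt"), ("op1", "tensorrt"), ("op2", None)]
new_ops = transform(ops)
# new_ops: op0, one tensorrt_engine_op wrapping both op1s, then op2
```

After this rewrite, execution is just the unchanged loop above: each op, including the engine op, exposes the same `run` interface.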

x engine

  • get the subgraph's block desc (from an attribute)
  • build the x engine once
  • execute the x engine any number of times
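The build-once / execute-many contract can be sketched as follows; the `Engine` class and its lazy `build` are hypothetical, standing in for a real engine such as TensorRT:

```python
class Engine:
    """Toy x engine: built once from a subgraph's block desc,
    then executed any number of times."""

    def __init__(self, block_desc):
        self.block_desc = block_desc  # subgraph, taken from the op attribute
        self.built = False
        self.build_count = 0          # instrumentation for the sketch

    def build(self):
        if self.built:                # guarantee: build exactly once
            return
        self.build_count += 1
        self.built = True

    def execute(self, inputs):
        self.build()                  # lazy build on the first run
        # stand-in for running the compiled subgraph
        return ["%s(%s)" % (op, ",".join(inputs)) for op in self.block_desc]

e = Engine(["op1", "op1"])
e.execute(["b", "c"])
e.execute(["b", "c"])
# build() ran only once despite two executions
```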

x op

  • construct an x op from an x network

convert

construct an x network from a subgraph's block desc
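One common shape for this convert step is a per-op-type converter registry: walk the subgraph's block desc and let each registered converter append the matching engine layer to the network. All names below (`register`, the converter functions, the layer strings) are illustrative, not the real converter API:

```python
CONVERTERS = {}

def register(op_type):
    """Register a converter for one fluid op type."""
    def deco(fn):
        CONVERTERS[op_type] = fn
        return fn
    return deco

@register("mul")
def convert_mul(op, network):
    network.append("trt_fc")          # e.g. fluid mul -> a TensorRT FC layer

@register("relu")
def convert_relu(op, network):
    network.append("trt_activation")  # e.g. fluid relu -> a TensorRT activation

def convert_block(block_desc):
    """Construct the x network from a subgraph's block desc."""
    network = []
    for op in block_desc:
        CONVERTERS[op](op, network)   # KeyError = unsupported op, fail fast
    return network

net = convert_block(["mul", "relu"])
```

A registry like this also makes the frontend partitioning checkable: an op belongs in an accelerated subgraph only if a converter for its type exists.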

Labels

Inference (formerly 预测), including C-API inference issues, etc.
