Description
architecture
- frontend: mark the subgraphs that should be optimized by an x engine
- inference preparation: extract each marked subgraph and transform the program desc by replacing the subgraph with an x engine op
- engine op: build an x engine and execute it like a normal operator
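The middle step, replacing a marked subgraph with a single engine op, can be sketched as follows. This is a minimal illustration assuming the program desc is just an ordered list of op descs; `OpDesc` and `replace_subgraph` are illustrative names, not Paddle's API:

```python
# Hypothetical sketch: replace a marked run of ops in a program desc with a
# single engine op that carries the subgraph as an attribute.

class OpDesc:
    def __init__(self, op_type, attrs=None):
        self.op_type = op_type
        self.attrs = attrs or {}

def replace_subgraph(program, marked):
    """Return a new program where every maximal run of ops whose
    type is in `marked` becomes one `tensorrt_engine` op."""
    out, subgraph = [], []
    for op in program:
        if op.op_type in marked:
            subgraph.append(op)
        else:
            if subgraph:
                out.append(OpDesc("tensorrt_engine",
                                  {"subgraph_block": subgraph}))
                subgraph = []
            out.append(op)
    if subgraph:
        out.append(OpDesc("tensorrt_engine", {"subgraph_block": subgraph}))
    return out

program = [OpDesc("op0"), OpDesc("op1"), OpDesc("op2"), OpDesc("op4")]
new_program = replace_subgraph(program, marked={"op1", "op2"})
# new_program op types: ['op0', 'tensorrt_engine', 'op4']
```

The engine op keeps the original subgraph as an attribute so the backend can convert it later.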
phases
frontend
- manually partition graph
- TODO LATER: automatically partition the graph
Some initial ideas: add a special with-block, used only for inference:
a = op0(b, c)
with infer.accelerate_by_tensorrt:
a1 = op1(b, c)
a2 = op1(b, c)
c = op2(b, c)
with infer.accelerate_by_some_other_engine:
a3 = op3(c)
...
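The with-block idea above can be sketched with a context manager that records which engine, if any, is active while each op is created. All names here (`Marker`, `accelerate_by`) are illustrative, not Paddle APIs:

```python
from contextlib import contextmanager

# Hypothetical sketch: mark ops created inside a with-block so the
# inference-preparation pass can later cut them out as one subgraph.

class Marker:
    def __init__(self):
        self.engine = None          # engine name active inside a with-block
        self.marked_ops = []        # (op_name, engine) pairs recorded so far

    @contextmanager
    def accelerate_by(self, engine):
        prev, self.engine = self.engine, engine
        try:
            yield
        finally:
            self.engine = prev

    def record(self, op_name):
        self.marked_ops.append((op_name, self.engine))

marker = Marker()

def op(name):
    # stand-in for an operator-creating API call
    marker.record(name)

op("op0")
with marker.accelerate_by("tensorrt"):
    op("op1")
    op("op1")
    op("op2")
with marker.accelerate_by("some_other_engine"):
    op("op3")

# marker.marked_ops now tells the backend which ops belong to which engine:
# [('op0', None), ('op1', 'tensorrt'), ('op1', 'tensorrt'),
#  ('op2', 'tensorrt'), ('op3', 'some_other_engine')]
```

Nesting the engine name on a stack (`prev`) keeps the sketch correct even if with-blocks were nested.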
backend
- inference prepare
- partition graph and transform the infer program desc
- tensorrt_engine_op.build
- convert: take the subgraph's block desc as input and add TensorRT layers into the TensorRT engine
- transform the weight format from fluid to tensorrt
- add the tensorrt layers
- inference execute
- for each op in the infer program desc
- op.run
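The execute phase is then just the ordinary op loop; the engine op participates like any other op. A minimal sketch, with illustrative names:

```python
# Hypothetical sketch of the inference execute phase: once the program desc
# is transformed, execution is the plain op-by-op loop.

class Op:
    def __init__(self, name):
        self.name = name

    def run(self, scope):
        scope.append(self.name)   # stand-in for real kernel execution

def execute(program_desc, scope):
    # for each op in the infer program desc: op.run
    for op in program_desc:
        op.run(scope)

scope = []
execute([Op("op0"), Op("tensorrt_engine"), Op("op4")], scope)
# scope == ["op0", "tensorrt_engine", "op4"]
```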
x engine
- get a subgraph's block desc (from attribute)
- build x engine once
- execute the x engine any number of times
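The build-once / execute-many contract above can be sketched with a lazy build. `XEngine` and its methods are illustrative names under these assumptions, not a real API:

```python
# Hypothetical sketch of the "x engine" contract: get the subgraph's block
# desc from an op attribute, build once, execute any number of times.

class XEngine:
    def __init__(self, block_desc):
        self.block_desc = block_desc   # subgraph block desc from the op attribute
        self._built = False
        self.build_count = 0           # only to demonstrate build-once below

    def build(self):
        # expensive one-time conversion of the block desc into engine form
        if not self._built:
            self.build_count += 1
            self._built = True

    def execute(self, inputs):
        self.build()                   # lazy build on first execute
        # stub: a real engine would run its optimized network here
        return [x for x in inputs]

engine = XEngine(block_desc=["op1", "op2"])
out1 = engine.execute([1, 2, 3])
out2 = engine.execute([4, 5])
# engine.build_count == 1 even after two executions
```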
x op
- construct x op from x network
convert
- construct an x network from a subgraph's block desc
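One common way to structure this convert step is a registry of per-op-type converters that each add the matching layer to the x network. A minimal sketch; the registry and converter names are assumptions for illustration, not a real API:

```python
# Hypothetical sketch of convert: walk the subgraph's block desc and let a
# per-op-type converter add the corresponding layer to the x network.

CONVERTERS = {}

def register(op_type):
    def deco(fn):
        CONVERTERS[op_type] = fn
        return fn
    return deco

@register("mul")
def convert_mul(op, network):
    network.append("fc_layer")          # e.g. maps to an engine FC layer

@register("relu")
def convert_relu(op, network):
    network.append("activation_layer")

def convert_block(block_desc):
    """Construct an x network (here, a plain list) from a block desc."""
    network = []
    for op in block_desc:
        CONVERTERS[op["type"]](op, network)
    return network

net = convert_block([{"type": "mul"}, {"type": "relu"}])
# net == ["fc_layer", "activation_layer"]
```

A registry keeps each converter small and makes unsupported op types easy to detect (a missing key) during inference preparation.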