Closed
Background
problem
In our Python executor, every call to executor.run clones the program and then adds the feed and fetch ops to the cloned program. The profile below shows that Program.clone is very time-consuming. I added a simple cache to avoid the program clone, and run time drops by about 25% (roughly a 1.33x speedup) in the experiment below.
solution
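Avoid the clone: cache the cloned program, with its feed and fetch ops already added, and reuse it on later executor.run calls.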
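The caching idea can be sketched roughly as follows. This is only an illustration, not the actual patch: `Program` and `CachingExecutor` here are hypothetical stand-ins rather than the framework's real classes, and the cache key (program identity plus feed and fetch names) is an assumption about what makes two runs reusable.

```python
import copy


class Program:
    """Hypothetical stand-in for the framework's program object."""

    def __init__(self, ops=None):
        self.ops = list(ops or [])
        self.clone_calls = 0  # counts clones, for illustration only

    def clone(self):
        self.clone_calls += 1
        return Program(copy.deepcopy(self.ops))


class CachingExecutor:
    """Hypothetical executor that reuses the prepared clone across run() calls."""

    def __init__(self):
        # key -> (original program, prepared clone); holding the original
        # keeps it alive so its id() cannot be reused by another object.
        self._program_cache = {}

    def _prepare(self, program, feed_names, fetch_names):
        key = (id(program), tuple(feed_names), tuple(fetch_names))
        entry = self._program_cache.get(key)
        if entry is None:
            # Slow path: clone once and add the feed and fetch ops.
            prepared = program.clone()
            prepared.ops = (
                [("feed", name) for name in feed_names]
                + prepared.ops
                + [("fetch", name) for name in fetch_names]
            )
            entry = (program, prepared)
            self._program_cache[key] = entry
        return entry[1]

    def run(self, program, feed, fetch_list):
        prepared = self._prepare(program, sorted(feed), fetch_list)
        # A real executor would execute the prepared ops here.
        return prepared


prog = Program([("mul", "x")])
exe = CachingExecutor()
for _ in range(3):
    prepared = exe.run(prog, {"x": 1}, ["y"])
print(prog.clone_calls)
```

One caveat of this sketch: if the original program is modified after the first run, the cached clone becomes stale, so a real implementation would need some way to invalidate the cache.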
Experiment
profile script
condition
- one card without parallel_do
batch_num | before(s) | after(s) | after/before | before/after |
---|---|---|---|---|
10 | 9.91367912292 | 7.47897624969 | 0.7544 | 1.3255395915090542 |
20 | 19.6153604984 | 14.6058752537 | 0.744 | 1.3429774085898059 |
30 | 28.9721696377 | 21.7348105907 | 0.7501 | 1.332984684490269 |
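For reference, the two ratio columns can be recomputed from the raw timings with a few lines of Python. The `timings` dict and `ratios` helper are names introduced here for illustration; the numbers are copied from the table.

```python
# batch_num -> (before seconds, after seconds), copied from the table above
timings = {
    10: (9.91367912292, 7.47897624969),
    20: (19.6153604984, 14.6058752537),
    30: (28.9721696377, 21.7348105907),
}


def ratios(before, after):
    """Return (after/before, before/after) for one row."""
    return after / before, before / after


for batch_num, (before, after) in timings.items():
    time_ratio, speedup = ratios(before, after)
    print(f"{batch_num}: after/before={time_ratio:.4f}, before/after={speedup:.4f}")
```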
timeline
before optimization
after optimization