Memory optimization on Dynamic RNN #7599
Conversation
        defs_can_optimize)
defs_can_optimize = filter(
    lambda x: self._find_var(block_desc, x, is_forward).type() == core.VarDesc.VarType.LOD_TENSOR,
    defs_can_optimize)
Why not combine these three filters into one?
Because I find that yapf cannot format the lambda style nicely. It would become:

defs_can_optimize = filter(
    lambda x: str(x) != "@EMPTY@" and self._has_var(block_desc, x, is_forward) and not self._find_var(block_desc, x, is_forward).persistable() and self._find_var(block_desc, x, is_forward).type() == core.VarDesc.VarType.LOD_TENSOR,
    self._defs[i])

It's too looooong!
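One way to keep a single filter readable despite yapf is to move the predicate into a named method. This is only a sketch against the code quoted above; the helper name `_is_reusable_var` is invented here and is not part of this PR.

```python
# Hypothetical helper method on the same optimizer class; the name
# _is_reusable_var is invented for illustration.
def _is_reusable_var(self, block_desc, x, is_forward):
    """A variable can be reused only if it exists in the block,
    is not persistable, and is a plain LoDTensor."""
    if str(x) == "@EMPTY@":
        return False
    if not self._has_var(block_desc, x, is_forward):
        return False
    var = self._find_var(block_desc, x, is_forward)
    return (not var.persistable()
            and var.type() == core.VarDesc.VarType.LOD_TENSOR)


# The three chained filter calls then collapse into one short call:
defs_can_optimize = filter(
    lambda x: self._is_reusable_var(block_desc, x, is_forward),
    self._defs[i])
```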
        can_optimize)
can_optimize = filter(
    lambda x: self._find_var(block_desc, x, is_forward).type() == core.VarDesc.VarType.LOD_TENSOR,
    can_optimize)
These three filter calls can also be combined, and they can share the same filter function with the one in lines 132-134.
Done
for index, cache_pair in enumerate(self.pool):
    cache_var = cache_pair[0]
    cache_shape = cache_pair[1]
    if x_shape == cache_shape:
I think we can divide the optimization into three levels:

Level 1: Only reuse variables with the same prod(shape). This is perfect reuse: no memory is wasted and no reallocation happens.

Level 2: Reuse variables whose prod(shape) is greater than the required prod(shape). There is no reallocation, but some memory may be wasted. To minimize the waste, the reused variable's prod(shape) should be as close to the required one as possible.

Levels 1 and 2 are harmless. Enabling them is definitely better than doing nothing, so they should always be applied.

Level 3 (optional): Reuse variables even if their prod(shape) is less than the required one. Obviously, each reuse at this level results in a reallocation, which may slow training down, so this level is optional. To maximize reuse efficiency, the reused variable's prod(shape) should be as close to the required one as possible.
The whole optimization logic may be a bit complex, so I think it's better to wrap the pool in a class and implement the reuse-candidate picking logic as one of its member functions. However, it's not necessary to complete all of this in the current PR. We can merge it first and keep refining in the future. I'm also glad to take part in this work.
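As a rough sketch of what such a class might look like (the class name ReusePool and its methods are invented here, not taken from this PR), the level-1/level-2 picking logic could be:

```python
import numpy as np


class ReusePool(object):
    """Hypothetical wrapper around the list of (var_name, shape) pairs."""

    def __init__(self):
        self.pool = []  # each entry is a (cache_var, cache_shape) pair

    def insert(self, var_name, shape):
        self.pool.append((var_name, shape))

    def pick(self, x_shape):
        """Pick a cached variable to reuse for a tensor of shape x_shape.

        Level 1: an entry with exactly the required element count is
        returned immediately (perfect reuse, no waste, no reallocation).
        Level 2: otherwise, the smallest entry that is still large enough
        is returned (no reallocation, minimal waste).
        Level 3 (reusing smaller entries, which forces a reallocation) is
        left out of this sketch.
        """
        required = int(np.prod(x_shape))
        best_index, best_size = None, None
        for index, (cache_var, cache_shape) in enumerate(self.pool):
            size = int(np.prod(cache_shape))
            if size == required:  # level 1: perfect fit
                return self.pool.pop(index)
            if size > required and (best_size is None or size < best_size):
                best_index, best_size = index, size  # level 2 candidate
        if best_index is not None:  # level 2: closest larger entry
            return self.pool.pop(best_index)
        return None  # nothing reusable; caller allocates a new variable
```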
Yes. Actually, we could reuse a variable if its shape is smaller than the variable in the cache pool. But the first dimension is batch_size, which is -1 at compile time, so we cannot get the real size at compile time.
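A tiny sketch of that constraint (the helper below is hypothetical, not from the PR): once the batch dimension is -1, the element count cannot be computed at compile time, so only exact shape matching is safe.

```python
def compile_time_size(shape):
    """Return the element count if every dim is known, otherwise None.

    With batch_size stored as -1 at compile time, the real size is
    unknown, so size-based (level 2/3) reuse cannot be decided and the
    pass falls back to exact shape matching.
    """
    size = 1
    for dim in shape:
        if dim < 0:  # e.g. the batch dimension is -1
            return None
        size *= dim
    return size
```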
@JiayiFeng Thanks for the detailed optimization policy. Sure, we can merge this PR first and you can work on it later.
I have tested this on the machine translation demo, which has two hidden layers inside the RNN block. The benchmark result for the first batch is as follows (in bytes):