
hbwx24 commented Dec 29, 2020

PR types

Bug fixes

PR changes

APIs

Describe

Saving a model with paddle.save or paddle.static.save raises an error when a single parameter exceeds 4 GiB.
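For scale, the examples below use W = 2**15, so a single W x W float32 weight already reaches 4 GiB. Here is a quick check of that arithmetic in Python. It counts raw element bytes only and assumes 4-byte float32 elements:

```python
import numpy as np

W = 2**15                                    # hidden size used in both examples
num_elements = W * W                         # elements in one W x W weight: 2**30
itemsize = np.dtype("float32").itemsize      # 4 bytes per float32 element
size_bytes = num_elements * itemsize

print(num_elements == 2**30, size_bytes / 2**30)  # True 4.0
```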

I. Main changes in this PR
1. Any parameter with more than 2^22 elements is split into sub-vectors of at most 2^22 elements each.

2. The dictionary saved with pickle.dump gains the sub-vectors produced by the split, plus metadata describing the split:

{'UnpackBigParamInfor@@': {'_l.weight': {'OriginShape': (10, 67108864),
                                         'slices': ['_l.weight@@.0', '_l.weight@@.1', ...]}},
 '_l.weight@@.0': numpy data,
 '_l.weight@@.1': numpy data,
 ...}

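'UnpackBigParamInfor@@': the key under which the split metadata is stored;
'_l.weight': the name of the parameter that was split;
OriginShape: the original shape of '_l.weight';
slices: the names of the sub-vectors produced by the split, e.g. '_l.weight' is split into '_l.weight@@.0', '_l.weight@@.1', ...;
'_l.weight@@.0': the name of one sub-vector.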

3. On load, this metadata is used to reassemble the split parameters.
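The three steps above can be sketched with plain NumPy and pickle. This is a minimal illustration, not Paddle's implementation. The helper names split_big_params and restore_big_params are hypothetical. Only the 'UnpackBigParamInfor@@' key, the '@@.<i>' slice naming, the OriginShape/slices fields and the 2^22 cap come from this PR. The sketch also assumes that parameters arrive as NumPy arrays in a flat dict and that slices are cut from the row-major flattened array:

```python
import pickle
import numpy as np

INFO_KEY = "UnpackBigParamInfor@@"  # metadata key named in this PR


def split_big_params(state, max_elements=2**22):
    """Hypothetical save-side helper: replace every array with more than
    max_elements elements by flat slices named '<key>@@.<i>' and record
    the original shape and slice names under INFO_KEY."""
    out, info = {}, {}
    for key, value in state.items():
        if isinstance(value, np.ndarray) and value.size > max_elements:
            flat = value.reshape(-1)
            num_slices = (flat.size + max_elements - 1) // max_elements  # ceil
            names = [f"{key}@@.{i}" for i in range(num_slices)]
            for i, name in enumerate(names):
                out[name] = flat[i * max_elements:(i + 1) * max_elements]
            info[key] = {"OriginShape": value.shape, "slices": names}
        else:
            out[key] = value  # small arrays (and non-arrays) are kept as-is
    if info:
        out[INFO_KEY] = info
    return out


def restore_big_params(saved):
    """Hypothetical load-side helper: concatenate each parameter's slices
    in order and reshape them back to OriginShape."""
    out = {k: v for k, v in saved.items() if k != INFO_KEY}
    for key, meta in saved.get(INFO_KEY, {}).items():
        parts = [out.pop(name) for name in meta["slices"]]
        out[key] = np.concatenate(parts).reshape(meta["OriginShape"])
    return out


# Tiny cap so the demo stays small: the 70-element weight becomes 3 slices.
state = {"_l.weight": np.arange(70, dtype="float32").reshape(10, 7),
         "_l.bias": np.zeros(7, dtype="float32")}
saved = split_big_params(state, max_elements=32)
print(saved[INFO_KEY])
# {'_l.weight': {'OriginShape': (10, 7), 'slices': ['_l.weight@@.0', '_l.weight@@.1', '_l.weight@@.2']}}
loaded = restore_big_params(pickle.loads(pickle.dumps(saved)))
print(all(np.array_equal(loaded[k], state[k]) for k in state))  # True
```

Flattening before slicing keeps each slice a simple 1-D array. Restoring it then only needs np.concatenate followed by a reshape to OriginShape. The demo lowers max_elements to 32 only to keep the arrays small.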

II. Examples:
1. paddle.static.save/load:

import paddle
import paddle.static as static
import numpy as np

paddle.enable_static()
W = 2**15  # the first fc weight is W x W = 2**30 float32 elements (4 GiB)
x = static.data(name="x", shape=[None, W], dtype='float32')
y = static.nn.fc(x, W)
z = static.nn.fc(y, 10)

place = paddle.CPUPlace()
exe = static.Executor(place)
exe.run(static.default_startup_program())
prog = static.default_main_program()

inputs = np.random.randn(2, W).astype("float32")
result_z = exe.run(program=prog, feed={"x": inputs}, fetch_list=[z.name])

static.save(prog, "./big/temp")
static.load(prog, "./big/temp")
result_load = exe.run(program=prog, feed={"x": inputs}, fetch_list=[z.name])

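# Total absolute difference between outputs before and after save/load;
# it should be 0 if every parameter was restored exactly.
print(np.sum(np.abs(result_z[0] - result_load[0])))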

2. paddle.save/load:

import paddle
import numpy as np

W = 2**15  # each Linear(W, W) weight has 2**30 float32 elements (4 GiB)
N = 2**1   # number of stacked Linear layers


class Layer(paddle.nn.Layer):
    def __init__(self):
        super(Layer, self).__init__()
        for i in range(N):
            setattr(self, "l_" + str(i), paddle.nn.Linear(W, W))

    def forward(self, x):
        for i in range(N):
            x = getattr(self, "l_" + str(i))(x)
        return x


layer = Layer()

save_dict = layer.state_dict()
print(len(save_dict))
path = "big/linear.pdparams"
paddle.save(layer.state_dict(), path)
dict_load = paddle.load(path)

for key, value in save_dict.items():
    print("{} dev:{}".format(key,
                             np.sum(np.abs(dict_load[key] - value.numpy()))))


Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot-old bot commented Dec 29, 2020

✅ This PR's description meets the template requirements!
Please wait for other CI results.

chenwhql previously approved these changes Dec 31, 2020

chenwhql left a comment


LGTM


chenwhql left a comment


LGTM

hbwx24 requested a review from chenwhql January 5, 2021 13:24
chenwhql merged commit f43e1d8 into PaddlePaddle:develop Jan 5, 2021
chenwhql pushed a commit to chenwhql/Paddle that referenced this pull request Jan 5, 2021
* Support storage of large parameters

* Reduce the complexity of the unittest

* Reduce the complexity of the unittest,commented out unittest for

* add unittest for static.save/load

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
hbwx24 added a commit to hbwx24/Paddle that referenced this pull request Jan 6, 2021
lanxianghit pushed a commit that referenced this pull request Jan 7, 2021
…0170)

* Support storage of large parameters (#29988)


* Extend the timeout for the (#30151)
hbwx24 added a commit to hbwx24/Paddle that referenced this pull request Jan 17, 2021
chenwhql pushed a commit that referenced this pull request Jan 18, 2021
hbwx24 added a commit to hbwx24/Paddle that referenced this pull request Jan 18, 2021