paddle.save/load, paddle.static.save/load: bug when saving large files #30170

hbwx24 · 2021-01-06T13:01:11Z

PR types

Bug fixes

PR changes

APIs

Describe

Bug in paddle.save/load and paddle.static.save/load when saving large files.
Original PR: #29988
#30151

* Support storage of large parameters
* Reduce the complexity of the unittest
* Reduce the complexity of the unittest,commented out unittest for
* add unittest for static.save/load
* Increase the timeout threshold of 'test_static_save_load'
* Increase the timeout threshold of 'test_static_save_load'
* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'
* Increase the timeout threshold of 'test_static_save_load' and 'test_paddle_save_load'

paddle-bot-old · 2021-01-06T13:01:18Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

chenwhql

LGTM

ZHUI · 2021-01-07T06:36:59Z

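My test results:

The model's state_dict saved with paddle.save is about 3.3 GB, while the actual parameter size works out to about 2.2 GB. Saving the same parameters as an npz file with numpy gives 2.2 GB. In addition, paddle.load is much slower than numpy when loading the saved parameters.

I suspect this is caused by the pickle serialization (dump). Can it be optimized?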
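
The size gap described above can be explored outside Paddle with a minimal sketch. It is not a measurement of paddle.save itself: the `serialized_sizes` helper and the sample `state` dict are hypothetical, and the choice of pickle protocol 2 is an assumption, since this thread does not say which protocol is used. One known effect it illustrates is that, on Python 3, pickle protocols below 3 have no opcode for `bytes` and write them as latin-1-decoded text encoded in UTF-8, so every byte with value 128 or above takes two bytes on disk.

```python
import io
import pickle

import numpy as np


def serialized_sizes(state):
    """Return the serialized size in bytes of a dict of arrays in several formats."""
    raw = sum(arr.nbytes for arr in state.values())

    # pickle with an old protocol (2) and a newer one (4).
    pickle_p2 = len(pickle.dumps(state, protocol=2))
    pickle_p4 = len(pickle.dumps(state, protocol=4))

    # Uncompressed npz archive written to an in-memory buffer.
    buf = io.BytesIO()
    np.savez(buf, **state)
    npz = len(buf.getvalue())

    return {"raw": raw, "pickle_p2": pickle_p2, "pickle_p4": pickle_p4, "npz": npz}


rng = np.random.default_rng(0)
# Hypothetical stand-in for a model state_dict converted to numpy arrays.
state = {
    "linear.weight": rng.standard_normal((1000, 100)).astype(np.float32),
    "linear.bias": rng.standard_normal(100).astype(np.float32),
}

for name, size in serialized_sizes(state).items():
    print(f"{name:>10}: {size} bytes")
```

If the protocol-2 figure is noticeably larger than the raw and npz figures for your own parameters, the serialization protocol is a plausible contributor to the overhead; whether it explains the 3.3 GB versus 2.2 GB observation here is not confirmed by this thread.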

chenwhql · 2021-01-07T07:20:00Z

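We are not in a position to do that at this stage:

First, the storage limit of dump was only exposed recently. This PR works around that blind spot by splitting a large parameter into several smaller parameters before saving. The splitting itself has overhead, so the slowness may also be related to it.

npz is the format numpy.savez uses to store numpy arrays. How would it be adapted to our need to store a dict? At first glance, npz does not fit our scenario.

We cannot make incompatible changes at this stage. The priority is to ensure that save in 2.0 has no functional gaps. Very large parameters are not a high-frequency scenario, so the impact is expected to be manageable.

This shows that our original decision to use dump for dynamic graph save was not well thought out. Leaving aside how we got here, it now looks like we will inevitably need an incompatible upgrade of the save format later, but as I understand it, it is too late for 2.0.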
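
The splitting workaround mentioned in the first point can be sketched roughly as follows. This is not the code from this PR: the helper names `split_array` and `merge_chunks`, the `.part{i}` key scheme, and the `max_bytes` threshold are all hypothetical, chosen only to illustrate the idea of storing one large parameter as several smaller ones and reassembling it on load.

```python
import numpy as np


def split_array(name, array, max_bytes):
    """Split one array into flat chunks of at most max_bytes each."""
    flat = np.ascontiguousarray(array).reshape(-1)
    per_chunk = max(1, max_bytes // flat.itemsize)  # elements per chunk
    chunks = {
        f"{name}.part{i}": flat[start:start + per_chunk]
        for i, start in enumerate(range(0, flat.size, per_chunk))
    }
    # Metadata needed to rebuild the original array.
    meta = {"shape": array.shape, "dtype": array.dtype, "num_chunks": len(chunks)}
    return chunks, meta


def merge_chunks(name, chunks, meta):
    """Rebuild the original array from its chunks and metadata."""
    if meta["num_chunks"] == 0:
        return np.empty(meta["shape"], dtype=meta["dtype"])
    parts = [chunks[f"{name}.part{i}"] for i in range(meta["num_chunks"])]
    return np.concatenate(parts).reshape(meta["shape"])


weight = np.arange(10, dtype=np.float32).reshape(2, 5)
chunks, meta = split_array("weight", weight, max_bytes=12)  # 3 float32 values per chunk
restored = merge_chunks("weight", chunks, meta)
```

Both the split and the final `np.concatenate` copy data, which is consistent with the remark that splitting adds overhead of its own.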

ZHUI · 2021-01-07T12:47:40Z

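OK, thanks, that is clear.
The parameters we save are currently converted to numpy arrays, so when the save format is redesigned, numpy's save and savez could serve as a reference.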
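
For reference, a flat dict of numpy arrays can be passed to numpy.savez through keyword unpacking, as in the minimal sketch below. The helper names `save_flat_state` and `load_flat_state` and the sample keys are hypothetical. It only covers a flat mapping from string keys to arrays; nested dicts or non-array values, which is the concern raised earlier about npz not fitting the dict use case, would need an extra flattening scheme that is not shown here.

```python
import io

import numpy as np


def save_flat_state(file, state):
    """Write a flat {str: ndarray} dict as an uncompressed npz archive."""
    # Keys must not collide with savez's own parameter names (for example "file").
    np.savez(file, **state)


def load_flat_state(file):
    """Read an npz archive back into a plain dict of arrays."""
    with np.load(file) as archive:
        return {key: archive[key] for key in archive.files}


state = {
    "linear.weight": np.ones((2, 3), dtype=np.float32),
    "linear.bias": np.zeros(3, dtype=np.float32),
}

buf = io.BytesIO()
save_flat_state(buf, state)
buf.seek(0)
loaded = load_flat_state(buf)
```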

chenwhql · 2021-01-07T12:54:26Z

OK, thanks for the suggestion.

hbwx24 added 2 commits January 6, 2021 12:57

Extend the timeout for the (PaddlePaddle#30151)

8c6133b

XieYunshen approved these changes Jan 7, 2021

View reviewed changes

hbwx24 closed this Jan 7, 2021

hbwx24 reopened this Jan 7, 2021

chenwhql approved these changes Jan 7, 2021

View reviewed changes

lanxianghit merged commit bfb6f61 into PaddlePaddle:release/2.0 Jan 7, 2021
