[Memory]More memory optimization policy #8690
```diff
@@ -118,7 +118,7 @@ def _find_var(self, block_desc, var_name, is_forward):
         else:
             return block_desc.find_var_recursive(str(var_name))

-    def memory_optimize(self):
+    def memory_optimize(self, level=0):
```
The code style guide says a function name should be a verb-object phrase, e.g. optimize_memory instead of memory_optimize.
Also, strictly speaking we cannot optimize the memory itself; what we can optimize is the usage of the memory. In this case, does the function really mean reuse_memory? And should we rename level to reuse_tensor_with_the_same_size?
Yes, you are right, it is mainly memory reuse. Level 0 means a tensor may only reuse a cached tensor of exactly the same size. Level 1 means a tensor may reuse a cached tensor if the current tensor's size is the same as or smaller than the cached one.
I will refine the code accordingly. Thanks!
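To make the two levels concrete, here is a minimal sketch of the cache-matching rule described above. The function name and the list-of-sizes cache are illustrative stand-ins, not PaddlePaddle's actual implementation.

```python
def find_reusable(cache_pool, needed_size, level):
    """Return the index of a cached tensor slot that a tensor of
    `needed_size` may reuse, or None if no slot qualifies.

    level 0: only a slot of exactly the same size may be reused.
    level 1: any slot whose size is >= needed_size may be reused.
    """
    for idx, cached_size in enumerate(cache_pool):
        if level == 0 and cached_size == needed_size:
            return idx
        if level == 1 and cached_size >= needed_size:
            return idx
    return None
```

Level 1 trades some wasted slack inside oversized slots for a higher cache hit rate, which is why it can reuse more tensors than level 0.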
This PR gives an aggressive policy to reuse the memory: do a stream synchronize after each operator is launched, so that once the op has finished running we can delete all the variables it no longer needs.
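A rough sketch of that policy, under the assumption of a simple last-use analysis (the `Op` tuple and the liveness bookkeeping are illustrative, not the real framework API; the stream synchronize is marked as a comment where it would occur):

```python
from collections import namedtuple

# Illustrative operator: a name plus input/output variable names.
Op = namedtuple("Op", ["name", "inputs", "outputs"])

def run_with_eager_free(ops):
    """Run ops in order; after each op completes, free every input variable
    that no later op reads. Returns the set of variables still alive."""
    # Precompute, for each variable, the index of the last op that uses it.
    last_use = {}
    for i, op in enumerate(ops):
        for var in op.inputs + op.outputs:
            last_use[var] = i

    alive = set()
    for i, op in enumerate(ops):
        alive.update(op.outputs)
        # (real framework: launch the kernel, then stream-synchronize here
        #  so the deletions below cannot race with a still-running kernel)
        for var in op.inputs:
            if last_use[var] == i:
                alive.discard(var)  # safe: op finished, no later op reads it
    return alive
```

The stream synchronize is what makes the deletion safe, but it also serializes the device, which is why this policy trades speed for memory.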
@QiJune
We will merge this PR since the Image mission deadline is looming. Please give some experimental details on the effect on speed and complete the issue description. Thanks!
After adding a more aggressive optimization level, the image_classification demo's memory usage dropped from 93024256 to 92807168 bytes, which is only a small benefit.
There are still many dead variables that are not reused; most of them are gradient variables. After the sgd optimization step, these gradients can be released. Maybe we have to delete them with a DeleteOperator.
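One way to realize the DeleteOperator idea is a program pass that inserts a delete op right after the last op that reads each gradient. This is only a hedged sketch: the `(op_type, input_vars)` tuple encoding and the `"@GRAD"` suffix convention are assumptions for illustration, not the actual pass.

```python
def insert_delete_ops(ops):
    """Given ops as (op_type, input_var_names) tuples, return a new op list
    with a ("delete", [grad]) op placed right after the last op that reads
    each gradient variable (identified here by an "@GRAD" name suffix)."""
    # Find the last reader of every gradient variable.
    last_use = {}
    for i, (_, inputs) in enumerate(ops):
        for var in inputs:
            if var.endswith("@GRAD"):
                last_use[var] = i

    result = []
    for i, op in enumerate(ops):
        result.append(op)
        for var, idx in sorted(last_use.items()):
            if idx == i:
                result.append(("delete", [var]))  # gradient is dead past here
    return result
```

Since sgd is typically the last reader of a gradient, this places the delete immediately after the optimizer op, so gradient memory is returned instead of living until the program ends.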
I added another release-memory policy with DeleteOp and tested it on a ResNet model:
The release-memory policy has almost reached the upper limit (the forward-pass memory). If we want to reduce the memory occupation further, there are two ways: