-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][cgraph] Fix eager release if destruction out of order #49781
Conversation
Signed-off-by: dayshah <dhyey2019@gmail.com>
self._dag._execute_until( | ||
self._execution_index, self._channel_index, timeout | ||
) | ||
return_vals = self._dag._get_execution_results( | ||
self._execution_index, self._channel_index | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we prefer using an extra method call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because we want to use execute_until for release buffer and we don't want to get
there because we don't want to remove result from buffer
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline:
Calling _execute_until
to consume all DAG references with a smaller execution_index
compared to the caller of the destruction may cause the memory usage increases in the driver process.
Instead, we will try to remember the execution_index
that should be released and release it only when all DAG references with smaller execution_index
values have been consumed.
4b85ea6
to
33c38fa
Compare
Signed-off-by: dayshah <dhyey2019@gmail.com>
Why are these changes needed?
The current implementation of the CompiledDagRef destructor will release the buffer for everything up to the execution index of the dagref that was destructed. This means that calling get on a previous dagref will fail. Now we're assuring that the previous execution for previous dagrefs complete and the results get cached before we release.
Because Python doesn't guarantee destruction order, this has the possible pitfall of requiring finishing execution of other previous dagrefs that are also about to be destructed, but the dagref with the higher execution_index just got del called on it first. Opened issue #49782 for this.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.