This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Description
This memory error has been discussed on the dev mailing list and can be reproduced with the following commands.
This is a temporary fix. The error is caused by a race condition in which the MKLDNN memory in an output NDArray is removed while an MKLDNN operator is trying to read that memory from its input arrays.
However, this race condition should not be possible. The execution engine schedules computation based on data dependencies: while an operator is scheduled to write to an output NDArray, no operator that reads from that NDArray should be scheduled for execution. Yet we observe that an operator's input array is modified while the operator is running, which suggests that the race can corrupt data in an input NDArray even without MKLDNN.
So we need a fundamental fix for this bug.
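To make the expected invariant concrete, here is a minimal sketch of dependency-based scheduling. It is a toy model, not the actual execution engine: the function `build_dependencies`, the operator names, and the variable names are all hypothetical and chosen only for illustration. It shows that an operator that overwrites an array must wait for every operator still reading it.

```python
def build_dependencies(ops):
    """Toy model of read/write dependency tracking (not the real engine).

    ops: list of (op_name, reads, writes) in the order they were pushed.
    Returns a dict mapping each op name to the set of ops it must wait for.
    """
    last_writer = {}          # variable -> op that last wrote it
    readers_since_write = {}  # variable -> ops that read it since that write
    deps = {}
    for name, reads, writes in ops:
        wait = set()
        # A reader must wait for the most recent writer of each input.
        for var in reads:
            if var in last_writer:
                wait.add(last_writer[var])
        # A writer must wait for the previous writer AND for all readers
        # that still use the old contents of the output.
        for var in writes:
            if var in last_writer:
                wait.add(last_writer[var])
            wait.update(readers_since_write.get(var, ()))
        deps[name] = wait
        # Record this op's accesses for later ops.
        for var in reads:
            readers_since_write.setdefault(var, set()).add(name)
        for var in writes:
            last_writer[var] = name
            readers_since_write[var] = set()
    return deps


# Hypothetical three-operator sequence: "overwrite_y" rewrites the array
# that "relu" reads, so it must not start before "relu" has finished.
ops = [
    ("conv", ["x"], ["y"]),
    ("relu", ["y"], ["z"]),
    ("overwrite_y", [], ["y"]),
]
deps = build_dependencies(ops)
```

In this model `overwrite_y` depends on both `conv` and `relu`. The behaviour described above corresponds to that last dependency being violated: the write to `y` proceeds while `relu` is still reading it.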
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.