Optimize class ExceptionStateSet to improve both runtime and memory usage. #171
motivation && reason
Under the original ExceptionStateSet design, we observed that during exception propagation, STA created a new ExceptionStateSet for each Vertex and Tag. It did this even when the set had nothing to do with that point and could have been copied unchanged from the ExceptionStateSet of the previous stage, as in the following case:
As a result, many ExceptionStateSet instances with identical contents were created, which caused unnecessary memory overhead.
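A minimal C++ sketch of one general way to avoid such duplicates is to intern each distinct set in a pool. Every vertex and tag with the same contents then points at one shared, immutable instance. This assumes that exception states can be reduced to integer ids. StateSetPool, StateSet, ExceptionStateId and exampleUsage are hypothetical names for illustration. They are not STA's actual classes, and this is not necessarily the approach this PR takes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical stand-in: each exception state is identified by an integer id.
using ExceptionStateId = int;
// Ordered set, so sets with the same elements compare equal and hash the same.
using StateSet = std::set<ExceptionStateId>;

// Combines element hashes in sorted order.
struct StateSetHash {
  std::size_t operator()(const StateSet &s) const {
    std::size_t h = 0;
    for (ExceptionStateId id : s)
      h = h * 31 + std::hash<ExceptionStateId>{}(id);
    return h;
  }
};

// Hypothetical interning pool: stores exactly one copy of each distinct set
// and hands out a pointer to that shared, immutable copy.
class StateSetPool {
public:
  const StateSet *intern(const StateSet &s) {
    // insert() is a no-op if an equal set is already stored; either way
    // .first refers to the single stored copy.
    return &*pool_.insert(s).first;
  }
  std::size_t size() const { return pool_.size(); }

private:
  std::unordered_set<StateSet, StateSetHash> pool_;
};

// Usage: two points that see the same exception states share one instance.
void exampleUsage() {
  StateSetPool pool;
  const StateSet *a = pool.intern(StateSet{1, 2});
  const StateSet *b = pool.intern(StateSet{2, 1});
  assert(a == b);            // same shared instance, no second copy
  assert(pool.size() == 1);  // only one set stored
}
```

Because std::unordered_set is node-based, pointers to its elements stay valid when the table rehashes. The returned pointers can therefore be stored in tags for as long as the pool lives. A vertex that does not affect any exception can simply pass the incoming pointer along instead of building a new set.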
Furthermore, under this design, every tag match required STA to compare the elements of the two ExceptionStateSets one by one. In addition, every Tag hash computation had to iterate over the ExceptionStateSet again and recompute the hash of each exception in it. Together these caused significant extra runtime overhead.
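The following hypothetical C++ sketch targets the repeated comparisons and hashing. It computes the hash once, when the immutable set is built. Equality then rejects early on identity, cached hash, or size, and only falls back to an element-by-element compare for a likely match. HashedStateSet, ExceptionStateId and exampleUsage are illustrative names, not the PR's code, and the id-based representation is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <utility>

// Hypothetical stand-in: each exception state is identified by an integer id.
using ExceptionStateId = int;

// Hypothetical immutable set that caches its hash at construction, so
// hashing a tag no longer re-walks the exceptions every time.
class HashedStateSet {
public:
  explicit HashedStateSet(std::set<ExceptionStateId> states)
      : states_(std::move(states)), hash_(computeHash(states_)) {}

  // O(1): returns the value computed once in the constructor.
  std::size_t hash() const { return hash_; }

  bool operator==(const HashedStateSet &other) const {
    if (this == &other)
      return true;  // same shared instance
    if (hash_ != other.hash_ || states_.size() != other.states_.size())
      return false;  // cheap reject without touching the elements
    return states_ == other.states_;  // full compare only on a likely match
  }
  bool operator!=(const HashedStateSet &other) const {
    return !(*this == other);
  }

private:
  static std::size_t computeHash(const std::set<ExceptionStateId> &s) {
    std::size_t h = 0;
    for (ExceptionStateId id : s)
      h = h * 31 + std::hash<ExceptionStateId>{}(id);
    return h;
  }

  // Declared before hash_ so it is initialized first.
  std::set<ExceptionStateId> states_;
  std::size_t hash_;
};

// Usage: equal contents give equal cached hashes and compare equal.
void exampleUsage() {
  HashedStateSet a(std::set<ExceptionStateId>{1, 2, 3});
  HashedStateSet b(std::set<ExceptionStateId>{3, 2, 1});
  assert(a.hash() == b.hash());
  assert(a == b);
}
```

If identical sets are also shared as a single instance, the `this == &other` check alone decides most tag matches. The cached hash keeps mismatches cheap.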
action && improvement
We refactored the ExceptionStateSet structure. Profiling our test design with the pprof tool showed a runtime improvement of more than 5%, along with a minor reduction in memory usage. The changes we made include the following:
pprof data && evidence
Before this pull request:
before_this_pr.pdf
After this pull request:
after_this_pr.pdf
In these profiles, the share of sampled time spent in the function findTag dropped from 12.3% to 5.9%.