-
Couldn't load subscription status.
- Fork 2.6k
Improve SSA dominator computation memory usage #14965
Conversation
|
@dotnet/jit-contrib |
| // add 1. | ||
| m_pDomPreOrder = jitstd::utility::allocate<int>(m_allocator, bbArrSize); | ||
| m_pDomPostOrder = jitstd::utility::allocate<int>(m_allocator, bbArrSize); | ||
| m_pDomPreOrder = new (&m_allocator) int[bbArrSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we introduce Allocator* GetAllocator() in order to avoid using & everywhere and make code a bit more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe. Or new operator overloads that take CompAllocator& instead of CompAllocator*. But note that this comes from a commit in another PR - #14953, I only added here so I can work until that PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That other PR has been merged but I plan to clean up this and some other CompAllocator related issues in a future PR.
e290108 to
e25cb34
Compare
|
The initial version of this PR also optimized dominator trees (they too make unnecessary use of hashtables) but I left that change out. Changing hashtables to vectors changes the block visiting order and copy propagation is somehow affected by that and some differences appear in the generated code. Differences look correct (they should be, after all the original hastable order isn't somehow special) but I'd like to understand why the order matters in copy propagation. |
|
There may be still room for improvement:
However, for now I'd rather fix other obvious inefficiencies (e.g |
Makes the code more readable and avoids the duplicated IsShort test that a separate IsMember/AddElemD may generate.
DF(b) can be stored in a vector instead of a hashtable (set). Nothing needs a O(1) membership test, the duplicates that may be generated during DF construction can be easily avoided by observing that: * each block in the graph is processed exactly once * during processing the block is added to the relevant DF vectors, at the end of each vector. If the same DF vector is encountered multiple times then checking if the block is already a member is trivial.
Like DF(b), IDF(b) can be stored in a vector if duplicates are avoided. This can be done by using a BitVector like TopologicalSort and ComputeImmediateDom already do. It's cheaper than using a hashtable (even though it requires a "clear" for every block that has a frontier). Also, storing IDF(b) into a vector makes it easier to track newly added blocks - they're at the end of the vector. Using a separate "delta" set is no longer needed.
|
Tizen armel failed due to some docker repository connectivity issues: https://ci.dot.net/job/dotnet_coreclr/job/master/job/armel_cross_checked_tizen_prtest/771/consoleFull#-8536224876a086b3e-df04-41d2-bc4d-43e8f9406d07 @dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work!
This improves the memory usage (and speed) of the DF/IDF computation performed by
SsaBuilder. The existing implementation relies too much on expensive hashtables when cheaper data structures (vectors) could be used instead.SSA memory usage decreases by 29% and there's also a 0.24% drop in instructions retired:
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgrsEI2JcsoUdj3Bh8g
JitMemStats diff: https://gist.github.com/mikedn/a7fad8a509e414d6d9c9101e46818f62
No jit diffs.