This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@mikedn

@mikedn mikedn commented Nov 9, 2017

This improves the memory usage (and speed) of the DF/IDF computation performed by SsaBuilder. The existing implementation relies too much on expensive hashtables when cheaper data structures (vectors) could be used instead.

SSA memory usage decreases by 29% and there's also a 0.24% drop in instructions retired:

PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgrsEI2JcsoUdj3Bh8g

JitMemStats diff: https://gist.github.com/mikedn/a7fad8a509e414d6d9c9101e46818f62

No jit diffs.

@BruceForstall

@dotnet/jit-contrib

  // add 1.
- m_pDomPreOrder = jitstd::utility::allocate<int>(m_allocator, bbArrSize);
- m_pDomPostOrder = jitstd::utility::allocate<int>(m_allocator, bbArrSize);
+ m_pDomPreOrder = new (&m_allocator) int[bbArrSize];

@rartemev rartemev Nov 10, 2017


Shouldn't we introduce Allocator* GetAllocator() in order to avoid using & everywhere and make code a bit more readable?

Author


Hmm, maybe. Or new operator overloads that take CompAllocator& instead of CompAllocator*. But note that this comes from a commit in another PR - #14953; I only added it here so I can keep working until that PR is merged.

Author

@mikedn mikedn Nov 14, 2017


That other PR has been merged but I plan to clean up this and some other CompAllocator related issues in a future PR.

@mikedn mikedn force-pushed the ssa-mem-dom branch 2 times, most recently from e290108 to e25cb34 on November 12, 2017 11:28
@mikedn
Author

mikedn commented Nov 12, 2017

The initial version of this PR also optimized dominator trees (they too make unnecessary use of hashtables) but I left that change out.

Changing hashtables to vectors changes the block visiting order; copy propagation is affected by that order, and some differences appear in the generated code. The differences look correct (they should be; after all, the original hashtable order isn't special in any way), but I'd like to understand why the order matters to copy propagation.

@mikedn
Author

mikedn commented Nov 12, 2017

There may still be room for improvement:

  • It's not clear if using a hashtable to map blocks to their DF/IDF sets is the best solution. The premise is that not all blocks will have a non-empty DF/IDF and a hashtable is (sort of) a sparse data structure. But some quick instrumentation of this code indicates that half of the blocks end up having a non-empty DF/IDF, which may mean that a vector indexed by bbNum would be a better choice.
  • Using vectors for DF/IDF is easy but may also waste some memory due to vector resizing. A singly linked list might be better, especially if nodes are pooled so they can be reused.

However, for now I'd rather fix other obvious inefficiencies (e.g. SsaRenameState is another memory hog due to its use of jitstd::list for stacks) than put a lot of effort into measuring and possibly come back empty-handed.

Makes the code more readable and avoids the duplicated IsShort test that a separate IsMember/AddElemD may generate.
DF(b) can be stored in a vector instead of a hashtable (set). Nothing needs an O(1) membership test, and the duplicates that may be generated during DF construction can easily be avoided by observing that:
* each block in the graph is processed exactly once
* during processing, the block is appended at the end of the relevant DF vectors. If the same DF vector is encountered multiple times, the only possible duplicate is the vector's last element, so checking whether the block is already a member is trivial.
Like DF(b), IDF(b) can be stored in a vector if duplicates are avoided. This can be done by using a BitVector like TopologicalSort and ComputeImmediateDom already do. It's cheaper than using a hashtable (even though it requires a "clear" for every block that has a frontier).

Also, storing IDF(b) into a vector makes it easier to track newly added blocks - they're at the end of the vector. Using a separate "delta" set is no longer needed.
@mikedn mikedn changed the title [WIP] Improve SSA dominator computation memory usage Improve SSA dominator computation memory usage Nov 14, 2017
@mikedn
Author

mikedn commented Nov 14, 2017

Tizen armel failed due to some docker repository connectivity issues: https://ci.dot.net/job/dotnet_coreclr/job/master/job/armel_cross_checked_tizen_prtest/771/consoleFull#-8536224876a086b3e-df04-41d2-bc4d-43e8f9406d07

@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test


@BruceForstall BruceForstall left a comment


Thanks for the great work!

@BruceForstall BruceForstall merged commit a7cf54f into dotnet:master Nov 15, 2017
@mikedn mikedn deleted the ssa-mem-dom branch December 16, 2017 09:15
@mikedn mikedn mentioned this pull request Feb 1, 2018