Make DeadNodeElimination extensible #896
Conversation
@haved @caleridas @sjalander Could you provide me with some feedback so that I know whether this is going in the right direction, please?
That might be a necessity, but TBH the separate type of loop is really terrible to begin with. There is almost no point to it, as everything hls_loop needs, Theta already does -- hls_loop is a "constrained subset" of normal theta capabilities. I would really like it if users of hls_loop could look into formulating it that way.
@caleridas I don't fully understand hls_loop being a subset of theta. The loop explicitly models the backedge with its own result and argument. So each loop variable has 1 input, 2 arguments, 2 results, and 1 output. This is so that in hardware one can select between the input and the backedge with a multiplexer, and between the backedge and the output. Is this possible to model with a version of the existing theta?
Yes, in the following way: for each "logical" loop variable X, create two theta loop variables I and B.
So I effectively is a "loop invariant constant". I had already entertained the thought of whether we want to explicitly model loop invariant variables separately from loop-carried inputs, but maybe that's a story for another time.

The big primary "issue" is that the structure of loop input / loop carried / loop output variables conforming to the above must either be preserved by all transformation steps, or there must be some facility to restore this structure. Neither is conceptually really hard, but I admit that it is not terribly joyful work to iron out all corner cases. One conceptually good way to do it would be to treat such "special form" requirements as additional data on the ThetaNode; hooking this up correctly would in particular allow detecting any constraint violation instantly at each transformation step. On the plus side, going this route probably allows immediately leveraging all loop transformations across backends.

I am not opposed to parameterizing DNE in a way such that it can be consolidated today already, as the above is non-negligible work. This is not mutually exclusive with reworking the loop node longer-term, but I think it is worth keeping this in mind going forward, at least as an option.

One additional comment: one logical evolution path for Theta is also to generically distinguish between "loop carried back-edge" and "output" variables. That would, I think, also address another part of what hls_loop does differently right now. In particular, it would allow an entirely distinct set of "back edge" vs. "output" values. The reason this would be useful outside HLS is that this captures the exact signature of tail-recursive function calls.
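A minimal sketch of what carrying such "special form" requirements as extra data on the node could look like; everything below is hypothetical and not existing jlm code:

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical theta-like node that carries its "special form" requirements
// as data, so every transformation can re-validate them immediately.
class ThetaLikeNode
{
public:
  using Constraint = std::function<bool(const ThetaLikeNode &)>;

  void
  AddSpecialFormConstraint(std::string description, Constraint constraint)
  {
    Constraints_.emplace_back(std::move(description), std::move(constraint));
  }

  // Called by a transformation right after it rewires the node; a violated
  // loop-input / loop-carried / loop-output structure is reported instantly
  // instead of surfacing much later in the HLS backend.
  void
  ValidateSpecialForm() const
  {
    for (const auto & [description, constraint] : Constraints_)
    {
      if (!constraint(*this))
        throw std::logic_error("Special form constraint violated: " + description);
    }
  }

private:
  std::vector<std::pair<std::string, Constraint>> Constraints_;
};
```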
@caleridas The suggestions you bring up are things I have considered as well, but never quite came up with a nice way of representing. For example, it is typical for a Theta to have a bunch of results that are always … I do not know how one would go about changing that, however, without making large changes. If one were to introduce separate context variables, loop variables, and exit variables, it would break all the symmetry between indices, which I am not sure is worth it? One final question I have is regarding the special …
I think it would be useful to discuss this in detail because it is more than just an aesthetic issue.
One important point in introducing abstractions such as "LoopVar" or "ContextVar" is to also reformulate the API in these terms and stop making assumptions about indices. For operational nodes, indices identify the "meaning" of an operand (e.g. divisor vs. dividend), but for structural nodes that is much less of a thing.
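To make this concrete, here is a minimal sketch of what an index-free interface could look like; all type and method names below are invented for illustration and are not jlm's actual API:

```cpp
#include <cstddef>
#include <vector>

// Handles stand in for inputs/outputs and region arguments/results.
struct InputHandle { std::size_t index; };
struct OutputHandle { std::size_t index; };
struct ArgumentHandle { std::size_t index; };
struct ResultHandle { std::size_t index; };

// One logical loop variable bundles every port that belongs to it, so clients
// ask for "the variable" instead of keeping parallel index spaces aligned.
struct LoopVar
{
  InputHandle input;       // value entering the loop
  ArgumentHandle argument; // value seen inside the loop body
  ResultHandle result;     // value produced by the body / fed to the backedge
  OutputHandle output;     // value leaving the loop
};

class ThetaLikeNode
{
public:
  // Replaces index-based pairing such as node.input(i) with node.result(i).
  const std::vector<LoopVar> &
  GetLoopVars() const noexcept
  {
    return LoopVars_;
  }

private:
  std::vector<LoopVar> LoopVars_;
};
```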
I think I do not understand what you mean by "memory port routing different to traditional loop variables" -- my naive understanding would be that memory ports are states with ops capable of delivering different values each iteration. This is fine, and basically mirrors the behavior of "load". (That however implies another interesting question that I do not know the answer to -- does it mean that hls_loops also have an implicit relation to the clock? That would be a bigger issue than any state questions.)

I think that in any case there is value if the generic theta can represent what you need, and if it needs extending to do so, I think that is still a good idea. That said, the approach taken in this PR to just move forward is perfectly reasonable.
@haved @caleridas @sjalander This spiraled a little bit out of control. The discussion is less about the current approach and more about whether it should be taken in the first place due to the HLS loop. That is definitely worth a discussion, but the HLS loop won't go anywhere any time soon. Thus, unless any one of you still has objections to the approach, I will just move ahead and produce a proper PR. @caleridas already sanctioned the approach. Any objections from your side, @haved and @sjalander?
@phate The following code example from the PR: … Instead there would be a …
We currently have two DNE transformations in the code base: one for the LLVM dialect and one for the HLS dialect. The reason we duplicated them in the first place was that the HLS dialect requires handling an additional structural node (the loop node) that is not handled by the LLVM DNE.
This PR introduces a way to extend DNE for custom structural nodes so that the two transformations can be unified and the duplicated code can be removed. The PR is meant as a POC and only shows the idea for the gamma node. The other nodes could be refactored in a similar fashion. However, I would like to get some feedback to ensure that this is on the right track before I invest the time to convert the rest of the transformation.
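As a rough, hypothetical sketch of the kind of extension point described above (all class and method names are placeholders, not taken from the PR or from jlm), the idea could look like a registry mapping structural node types to sweep handlers:

```cpp
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Placeholder node classes; the real ones live in jlm's rvsdg/dialects.
class StructuralNode
{
public:
  virtual ~StructuralNode() = default;
};

class GammaNode : public StructuralNode {};
class HlsLoopNode : public StructuralNode {};

class DneContext {}; // would hold the liveness/mark state of the pass

class DeadNodeElimination
{
public:
  using Handler = std::function<void(StructuralNode &, DneContext &)>;

  // A dialect registers how the sweep phase should treat its structural node.
  template<typename TNode>
  void
  RegisterHandler(Handler handler)
  {
    Handlers_[std::type_index(typeid(TNode))] = std::move(handler);
  }

  // The generic sweep dispatches on the node's dynamic type instead of
  // hard-coding every dialect's nodes inside the transformation itself.
  void
  SweepStructuralNode(StructuralNode & node, DneContext & context)
  {
    auto it = Handlers_.find(std::type_index(typeid(node)));
    if (it != Handlers_.end())
      it->second(node, context);
    // else: fall back to default handling or report an unsupported node
  }

private:
  std::unordered_map<std::type_index, Handler> Handlers_;
};

// Usage idea: the HLS dialect plugs in handling for its loop node, e.g.
// dne.RegisterHandler<HlsLoopNode>(
//     [](StructuralNode & n, DneContext & c) { /* sweep the hls loop */ });
```

Registering a handler for the HLS loop node in this fashion is what would let the two existing DNE passes collapse into one.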