Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[scf-to-calyx] Mark cf operations illegal #5031

Open
asraa opened this issue Apr 13, 2023 · 19 comments
Open

[scf-to-calyx] Mark cf operations illegal #5031

asraa opened this issue Apr 13, 2023 · 19 comments
Labels
bug Something isn't working Calyx The Calyx dialect

Comments

@asraa
Copy link

asraa commented Apr 13, 2023

Hi! I just posted on the #circt channel on discord about a model I had that I tried to pass to CIRCT to lower to Calyx via -lower-scf-to-calyx. It seems like it takes a few hours at least, and sometime just crashes my machine - which is very strange, since the model is pretty simple with mostly arithmetic operations and 163 if statements that are represented with conditional branches (they were scf.if statements, but converted to cf with convert-scf-to-cf.

This is the model: https://gist.github.com/asraa/93f10e86abcc3fe386eec172cb918e5c

I'll be poking around the conversion code near the conditional branch handling, just in case, and if I see something suspicious I'll update this thread.

LogicalResult buildCFGControl(DenseSet<Block *> path,

Thank you!

@mikeurbach for CC, thank you for responding in the chat!

@mikeurbach mikeurbach added bug Something isn't working Calyx The Calyx dialect labels Apr 13, 2023
@rachitnigam
Copy link
Contributor

Hey @asraa, thanks for opening an issue! Could generate the Calyx program for this and export it as a file or a gist in the calyx native syntax? information on that: https://docs.calyxir.org/fud/circt.html#mlir-to-native-representation

My guess is that scf to calyx is lowering it naively and allocating a lot of resources for the design.

Also, are you using the native compiler to generate Verilog or CIRCT passes? The Circt flow for calyx does not have many optimizations.

@mikeurbach
Copy link
Contributor

mikeurbach commented Apr 13, 2023

I think the issue here is the CIRCT scf to calyx pass doesn't complete or takes until the heat death of the universe--the resources being exhausted are on the host running the CIRCT compiler 😬 .

@rachitnigam
Copy link
Contributor

Ahh, I thought the problem was the FPGA resource limit was reached when compiling the design from Verilog.

I think the issue here is the CIRCT scf to calyx pass doesn't complete or takes until the heat death of the universe

I guess it would be interesting to figure out what part of the compilation pipeline is keeling over.

@matth2k
Copy link
Contributor

matth2k commented Apr 14, 2023

What happens if you run the model with the SCF operations still intact (by not running -convert-scf-to-cf)?

@mikeurbach
Copy link
Contributor

I took a very quick look. A couple observations:

The suggestion from @matth2k is a good idea, and would probably work here since the control flow seems fairly straightforward (see below).

This recursion in handling cf.cond_br appears to be leading to exponential runtime. So if you stick to the scf dialect, which this pass should be able to consume, it can be avoided.

bool trueBrSchedSuccess =
schedulePath(rewriter, path, brOp.getLoc(), block, successors[0],
thenSeqOp.getBodyBlock())
.succeeded();
bool falseBrSchedSuccess = true;
if (trueBrSchedSuccess) {
falseBrSchedSuccess =
schedulePath(rewriter, path, brOp.getLoc(), block, successors[1],
elseSeqOp.getBodyBlock())
.succeeded();
}

We should fix the above regardless.

I also noticed if you run the -canonicalize pass on the example input, all of the cf.cond_br ops canonicalize to arith.select, which is neat. Unfortunately -lower-scf-to-calyx doesn't handle arith.select. This is may not be obvious from the error message, but if you try it, you will see this: failed to legalize operation 'arith.select' that was explicitly marked illegal. It might be good to add support arith.select, since it's more "mux like" and less "control flow like", which generally has made it easier to handle for HLS-y passes.

@asraa
Copy link
Author

asraa commented Apr 14, 2023

What happens if you run the model with the SCF operations still intact (by not running -convert-scf-to-cf)?

scf.if is unsupported in the pass, because scf.yield statements are expected to originate from scf.while loops - which is why I had to run that. See this issue: #4843

Unfortunately -lower-scf-to-calyx doesn't handle arith.select.

Neat! Actually this is what the operations were originally, before I rewrote them to use scf.if because of that error. So it seems like that's the best way. Let's see what I can do...

@asraa
Copy link
Author

asraa commented Apr 14, 2023

EDIT: Actually ignore me, I see that it's pretty obviously needed to iterate through the common return statements.

@mikeurbach is the reason it's causing exponential behavior because both the true branch and false branch are iterating through their successor blocks, which they may have many in common of (e.g. the statements after return)?

If so, how come tracking path doesn't detect this behavior? I'd expect if a successor was added to the path during the if branch, it would end up failing on this line during the else branch (although it doesn't seem to)

if (path.count(block) != 0)

It does seem to process the common successors, although their paths aren't shared.

@mikeurbach
Copy link
Contributor

I will try to take a deeper look and figure out what's going on, and if there's a better way to rewrite it. @mortbopet do you have any recollection on this? I see some comments about naively elaborating the CFG, which mention a simplification pass that doesn't seem to exist. Any previous thoughts you might be able to page back in on how to implement this more efficiently? If not, I might have some time to re-implement this.

@asraa I think adding scf.if might be the best way to support your use case with Calyx, since in general we like using more high-level control flow information, and you're not the first one to hit this limitation. We could also support arith.select. Are you specifically interested in Calyx? If you just wanted to try some HLS flow with CIRCT, we have the dynamically scheduled Handshake flow, which can consume this input. You'll need to remove the tensorflow attributes, which we don't use anyway, and then you can do: hlstool hello_world.std.mlir.

@asraa
Copy link
Author

asraa commented Apr 14, 2023

Are you specifically interested in Calyx? If you just wanted to try some HLS flow with CIRCT, we have the dynamically scheduled Handshake flow, which can consume this input. You'll need to remove the tensorflow attributes, which we don't use anyway, and then you can do: hlstool hello_world.std.mlir.

Thanks for the pointer to handshake! We are interested in exporting verilog so we can use this for an existing optimizer we have. I was able to lower it to handshake by removing the TF attributes like you say, and then lower to HW, but ended up with a crash when exporting to verilog. The crash here is a little opaque to me... https://gist.github.com/asraa/a9fb964e51c835f2c2904d3a007b29ff

In the meantime I think I'm confident enough-ish to add some support for scf.if - took me a little while to play around with Calyx!

@mikeurbach
Copy link
Contributor

I was able to lower it to handshake by removing the TF attributes like you say, and then lower to HW, but ended up with a crash when exporting to verilog.

There are quite a few passes to go through the Handshake flow successfully, and they are all wrapped up in the hlstool I mentioned. Can you try compiling the hlstool binary and trying the command I mentioned above? This will run lots of optimizations, as well as required transformations.

In the meantime I think I'm confident enough-ish to add some support for scf.if

That would be great! If you take a stab at it, let us know if you have any questions.

@mortbopet
Copy link
Contributor

@mortbopet do you have any recollection on this? I see some comments about naively elaborating the CFG, which mention a simplification pass that doesn't seem to exist.

Not much - but it does seem like the possible culprit (some profiling here would probably clear the fog a bit). If i'd rewrite this now, i'd probably do it in the style that CalyxToFSM does it - i.e. a very functional style. Wherein instead of the "pre" block always being passed around, we pass the (possibly predeclared/stand-in) "next" block around, and specialize lowering on branching, such that each branch is handled independently, and then control is transferred to the lowering of the "next" block afterward:

if (failed(dispatch(branchStateOp, innerBranchOp, nextState)))

@mikeurbach
Copy link
Contributor

Yeah I did a quick profile and there were stacks with hundreds of alternating buildCFGControl and schedulePath in this test case. Thanks for the pointer, that makes sense.

@asraa
Copy link
Author

asraa commented May 3, 2023

Hey! I'm just back from an extended vacation and have been looking at this again. On one hand I've built hlstool and while that's building I've been familiarizing and drafting support for simple scf.if's in the Calyx lowering.

If you just wanted to try some HLS flow with CIRCT, we have the dynamically scheduled Handshake flow, which can consume this input. You'll need to remove the tensorflow attributes, which we don't use anyway, and then you can do: hlstool hello_world.std.mlir.

I think some of the docs are outdated here, but I've tried (with a handshake model as well):

$ hlstool --mlir_kernel --kernel_name main --kernel_file hello_world.handshake.mlir // --hls_kernel from the docs is gone
$ hlstool --kernel_file  hello_world.std.mlir --mlir_kernel --kernel_name main

Both create the hls checkpoint file, but do not print out any useful error:

====================== Parsing and validating arguments ======================
INFO:     using kernel file: hello_world.handshake.mlir
INFO:     using kernel name: main
INFO:     using testbench file: None
INFO:     Stored checkpoint to .hlstool_checkpoint. You can rerun HLSTool in this directory and pass the --checkpoint flag instead of providing paths and kernel names.
usage: hlstool [-h] [--silent] [--kernel_file KERNEL_FILE] [--tb_file TB_FILE] [--tb_entry TB_ENTRY]
               [--kernel_name KERNEL_NAME] [--outdir OUTDIR] [--vcd VCD] [--no_trace]
               [--vlt_threads VLT_THREADS] [--rebuild] [--synth] [--cosim] [--checkpoint]
               [--mlir_kernel] [--extra_polygeist_tb_args EXTRA_POLYGEIST_TB_ARGS]
               [--extra_polygeist_kernel_args EXTRA_POLYGEIST_KERNEL_ARGS]
               {dynamic-polygeist,static-polygeist,dynamic-torch,static-torch} ...

HLSTool is a tool for driving an end-to-end MLIR based HLS flow.

Usage:
Arguments are provided at two levels. First, generic arguments are specified, which
applies to any mode. The second level are mode specific arguments.
Generic arguments are specified before the (positional) mode argument.
To see the arguments for a specific mode, use

   'hlstool {mode} --help'

Any pointers? It seems like hlstool can get these MLIR inputs down to Verilog which is the goal!

Regarding scf.if support, I think I'm sort of familiar with the registers, groups, and sequences I need to create - but a little stalled on code. I'll reach out for some specific questions too.

@mikeurbach
Copy link
Contributor

Hmm looking at your output, it seems like you're using hlstool from circt-hls. Sorry for not being clear, what I meant was the hlstool that is not included in CIRCT here 🤦 . That other hlstool from circt-hls hasn't been maintained in a while, and the one bundled in CIRCT accomplishes similar goals (at least for the simple case of going to Verilog). So within the circt directory, you should be able to build the hlstool I intended with something like ninja hlstool, and it will be in your build/bin directory.

With that tool, the example I showed before of just hlstool hello_world.std.mlir should output Verilog.

@rachitnigam rachitnigam changed the title [SCFToCalyx] calyx: resource exhaustion with fairly simple testcase [scf-to-calyx] Exponential behavior when compiling cf.cond_br May 9, 2023
@rachitnigam
Copy link
Contributor

Renamed the issue to more appropriately point to the problem. Hopefully that's okay!

@asraa
Copy link
Author

asraa commented May 9, 2023

Thank you everyone!

With that tool, the example I showed before of just hlstool hello_world.std.mlir should output Verilog.

Yes!! It was successful, thank you so much for the help - this brought us closer to a proof of concept that we're working on :)

@rachitnigam
Copy link
Contributor

@mikeurbach would it make sense to mark cf operators as illegal once we support support scf.yield? It's not clear to me that we can actually ever support true cf programs in the scf-to-calyx pass because Calyx itself uses a structured control flow representation. A more natural requirement might be converting cf into scf before running the rest of the pass pipeline.

Does this seem reasonable or am I missing something?

@mortbopet
Copy link
Contributor

mortbopet commented Aug 22, 2023

@rachitnigam that makes sense to me If structured control flow is the only sensible thing, then let's enforce that. Given that https://reviews.llvm.org/D156889 has now merged upstream (thank you @Dinistro for making me aware), there's no excuse to not have this restriction.

@rachitnigam
Copy link
Contributor

Awesome! I've renamed the issue to reflect this. We can add cf as a "high-level dialect" for the Calyx flow in hlstool (it is already used for handshake). Of course, a lot of programs will not work because of the absence of scf.yield but that is the next item for me to work on!

@rachitnigam rachitnigam changed the title [scf-to-calyx] Exponential behavior when compiling cf.cond_br [scf-to-calyx] Mark cf operations illegal Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Calyx The Calyx dialect
Projects
None yet
Development

No branches or pull requests

5 participants