[LLHD] Improve Deseq pass performance #8479

fabianschuiki · 2025-05-13T00:08:33Z

The current implementation of the LLHD Deseq pass scales very poorly as the number of SSA values in a process increases. For one, the pass uses a crufty data flow analysis to eagerly analyse how the triggers of a process influence drive conditions and drive values. Furthermore, the pass also eagerly tracks every computation done on i1 values in its canonical form.

While trying the Deseq pass on a few Verilog inputs, I noticed that even a fairly benign multi-port register file would cause the pass to grind to a halt. This commit adds that register file as a test case to deseq.mlir. The register file uses a long chain of control flow operations, which causes the DNFs in the Deseq pass to grow beyond 25 distinct input terms, requiring MBs of memory during brute-force DNF minimization steps.

This commit makes the Deseq pass a lot more conservative. It no longer tracks operations on the process triggers as a boolean expression in disjunctive normal form with an unbounded number of terms. Instead, the pass now uses a truth table to capture these expressions, and limits the truth table to tracking the process triggers plus an opaque "unknown" term that represents all other logic terms. As a result, all operations that do not affect the triggers are effectively absorbed into that "unknown" term. The resulting truth table has at most 5 input terms (the past and present value of each of up to two triggers, plus the unknown term). A truth table over 5 inputs has 2^5 = 32 rows, so it requires no more than 32 bits to represent as an APInt.
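A minimal sketch of such a truth table follows. It assumes five inputs laid out as input 0 = the opaque unknown term and inputs 1 to 4 = the past and present values of two triggers; this ordering and the `TruthTable` class are hypothetical. Each of the 32 rows gets one bit, so AND, OR, and NOT become single bitwise operations. The pass itself stores the table in an `llvm::APInt`; a plain `uint32_t` is used here only to keep the example free of LLVM dependencies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: a truth table over five Boolean inputs, one bit per
// row (2^5 = 32 rows). Assumed input layout: 0 = unknown term,
// 1/2 = past/present value of trigger 0, 3/4 = past/present of trigger 1.
class TruthTable {
public:
  static constexpr unsigned kNumInputs = 5;

  // Constant-false or constant-true table.
  static TruthTable constant(bool value) { return TruthTable(value ? ~0u : 0u); }

  // Table that is true exactly in the rows where input `index` is 1.
  static TruthTable input(unsigned index) {
    assert(index < kNumInputs && "input index out of range");
    uint32_t bits = 0;
    for (unsigned row = 0; row < 32; ++row)
      if ((row >> index) & 1)
        bits |= 1u << row;
    return TruthTable(bits);
  }

  // Boolean operations are plain bitwise operations on the row bits.
  TruthTable operator&(TruthTable other) const { return TruthTable(bits & other.bits); }
  TruthTable operator|(TruthTable other) const { return TruthTable(bits | other.bits); }
  TruthTable operator~() const { return TruthTable(~bits); }

  bool isFalse() const { return bits == 0; }
  bool isTrue() const { return bits == ~0u; }

  // True if flipping input `index` can change the result: compare every row
  // where the input is 0 against its partner row where the input is 1.
  bool dependsOn(unsigned index) const {
    uint32_t inputRows = input(index).bits;
    uint32_t zeroRows = bits & ~inputRows;
    uint32_t oneRows = (bits & inputRows) >> (1u << index);
    return zeroRows != oneRows;
  }

  uint32_t raw() const { return bits; }

private:
  explicit TruthTable(uint32_t bits) : bits(bits) {}
  uint32_t bits;
};
```

For example, a rising edge on the first trigger is `~TruthTable::input(1) & TruthTable::input(2)`. In this sketch, any i1 value that is not derived from a trigger would simply map to `TruthTable::input(0)`, which is how unrelated logic gets absorbed into the unknown term.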

The value tables are also made more efficient. These are used to track the different values and the conditions under which they may be driven to a signal. The conditions are now tracked as compact truth tables as well, and the value itself may be an "unknown" value in case two distinct values may be driven onto a signal without the choice depending on a process trigger.
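The value-table idea can be sketched in the same style. In this hypothetical version (`ValueTable`, `Entry`, and `abstractUnknown` are illustrative names), each entry pairs a 32-bit condition with either a concrete value ID or `std::nullopt` for the unknown value. Entries with equal values are merged, and two entries with different values are collapsed into one unknown entry when their conditions still overlap after the unknown term is abstracted away, that is, when some assignment of the triggers leaves the choice between them to non-trigger logic. This is one possible reading of the rule described above, not necessarily the exact criterion the pass uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Conditions are 32-bit truth tables over five inputs. Assumption: input 0
// is the opaque "unknown" term, inputs 1-4 are trigger past/present values.
using Cond = uint32_t;

// Rows in which the unknown term (input 0) is 1: every odd row.
constexpr Cond kUnknownRows = 0xAAAAAAAAu;

// Existentially abstract the unknown term: true in a row if the condition
// holds for either value of the unknown term.
Cond abstractUnknown(Cond c) {
  Cond merged = (c & ~kUnknownRows) | ((c & kUnknownRows) >> 1);
  return merged | (merged << 1);
}

// A driven value: a concrete value ID, or std::nullopt for "unknown".
struct Entry {
  Cond condition;
  std::optional<int> value;
};

struct ValueTable {
  std::vector<Entry> entries;

  // Add a (condition, value) pair, then merge entries until no pair has the
  // same value or a choice that the triggers alone cannot resolve.
  void add(Cond condition, std::optional<int> value) {
    if (condition == 0)
      return; // never driven
    entries.push_back({condition, value});
    bool changed = true;
    while (changed) {
      changed = false;
      for (std::size_t i = 0; i < entries.size() && !changed; ++i) {
        for (std::size_t j = i + 1; j < entries.size() && !changed; ++j) {
          Entry &a = entries[i], &b = entries[j];
          bool sameValue = a.value == b.value;
          bool triggerOverlap =
              (abstractUnknown(a.condition) & abstractUnknown(b.condition)) != 0;
          if (!sameValue && !triggerOverlap)
            continue; // the triggers distinguish these two drives
          a.condition |= b.condition;
          if (!sameValue)
            a.value = std::nullopt; // choice not controlled by a trigger
          entries.erase(entries.begin() + j);
          changed = true; // restart the scan with the updated table
        }
      }
    }
  }
};
```

A choice that depends only on the unknown term collapses into a single unknown entry, while a choice made by a trigger keeps both values apart, which is the information needed to recognize clock and reset behavior.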

Furthermore, this commit replaces the original iterative data flow analysis to compute the DNFs and value tables with a depth-first traversal through the IR. This DFS computes the truth tables and value tables as they are needed, which is significantly less work than gratuitously computing them upfront.
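To illustrate the difference between eager and on-demand evaluation, here is a minimal sketch of a memoized recursive DFS over a hypothetical toy IR. The `Node` struct, `LazyAnalysis`, and `inputMask` are illustrative names, not the pass's actual API, and the sketch only covers i1 truth tables, not value tables or control flow. Querying one value visits only the nodes it depends on; unrelated logic is never evaluated.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy IR: each node is an i1 value.
struct Node {
  enum Kind { Trigger, Opaque, And, Or, Not } kind;
  unsigned input = 0;        // truth-table input index for Trigger nodes
  std::vector<int> operands; // indices of operand nodes
};

// Row mask of a 5-input truth table for input `index` (input 0 = unknown).
uint32_t inputMask(unsigned index) {
  uint32_t bits = 0;
  for (unsigned row = 0; row < 32; ++row)
    if ((row >> index) & 1)
      bits |= 1u << row;
  return bits;
}

class LazyAnalysis {
public:
  explicit LazyAnalysis(const std::vector<Node> &nodes) : nodes(nodes) {}

  // Compute the truth table of node `id` on demand. Results are memoized, so
  // each node is computed at most once and unrelated nodes are never touched.
  uint32_t getTruthTable(int id) {
    auto it = cache.find(id);
    if (it != cache.end())
      return it->second;
    ++numComputed;
    const Node &node = nodes[id];
    uint32_t result = 0;
    switch (node.kind) {
    case Node::Trigger:
      result = inputMask(node.input);
      break;
    case Node::Opaque: // absorbed into the "unknown" term
      result = inputMask(0);
      break;
    case Node::And:
      result = getTruthTable(node.operands[0]) & getTruthTable(node.operands[1]);
      break;
    case Node::Or:
      result = getTruthTable(node.operands[0]) | getTruthTable(node.operands[1]);
      break;
    case Node::Not:
      result = ~getTruthTable(node.operands[0]);
      break;
    }
    cache[id] = result;
    return result;
  }

  unsigned numComputed = 0; // how many nodes were actually evaluated

private:
  const std::vector<Node> &nodes;
  std::unordered_map<int, uint32_t> cache;
};
```

In a graph with a rising-edge check plus some unrelated logic, asking for the edge condition evaluates only the four nodes that feed it, and a second query is answered from the cache.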

As a result, this commit removes a significant scaling hurdle from the Deseq pass and makes it work a lot quicker on larger inputs. It is more conservative now since the pass no longer tracks drive values and drive conditions precisely, but only to the degree necessary to detect the clocking and reset scheme. This is sufficient in practice, though.

Note that the pass currently uses recursion for its DFS. We may want to change this in the future. Converting the recursion to a worklist-based approach is doable but extremely annoying to implement.
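For comparison, the same on-demand evaluation can be written with an explicit stack instead of recursion, which keeps the native call stack shallow even on very long dependency chains. This is a minimal sketch on a hypothetical toy IR (`Node` and `evaluate` are illustrative names). It assumes the dependency graph is acyclic and does not attempt to model the control flow and value tables the real pass has to handle.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy IR: leaves carry a precomputed truth table, inner nodes
// combine the tables of their operands.
struct Node {
  enum Kind { Leaf, And, Or, Not } kind;
  uint32_t leafTable = 0;    // only used by Leaf nodes
  std::vector<int> operands; // indices of operand nodes
};

// Worklist-based post-order DFS: a node is evaluated only once all of its
// operands have a cached result; otherwise the missing operands are pushed
// and the node is revisited later. Assumes an acyclic graph.
uint32_t evaluate(const std::vector<Node> &nodes, int root,
                  std::unordered_map<int, uint32_t> &cache) {
  std::vector<int> stack = {root};
  while (!stack.empty()) {
    int id = stack.back();
    if (cache.count(id)) {
      stack.pop_back(); // already computed, possibly via another path
      continue;
    }
    const Node &node = nodes[id];
    bool ready = true;
    for (int operand : node.operands) {
      if (!cache.count(operand)) {
        stack.push_back(operand);
        ready = false;
      }
    }
    if (!ready)
      continue; // operands are now on top of the stack
    uint32_t result = 0;
    switch (node.kind) {
    case Node::Leaf: result = node.leafTable; break;
    case Node::And: result = cache.at(node.operands[0]) & cache.at(node.operands[1]); break;
    case Node::Or: result = cache.at(node.operands[0]) | cache.at(node.operands[1]); break;
    case Node::Not: result = ~cache.at(node.operands[0]); break;
    }
    cache[id] = result;
    stack.pop_back(); // `id` is still on top because nothing was pushed
  }
  return cache.at(root);
}
```

The bookkeeping is already noticeable in this toy version: every node needs an explicit "are my operands ready?" step that recursion gives for free.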

maerhart

LGTM

maerhart · 2025-05-13T09:45:35Z

lib/Dialect/LLHD/Transforms/Deseq.cpp

@@ -22,6 +22,7 @@
 #include "llvm/Support/GenericIteratedDominanceFrontier.h"

 #define DEBUG_TYPE "llhd-deseq"
+// #define DESEQ_DETAILED_TRACE


Maybe there is a way to define two debug strings "llhd-deseq" and "llhd-deseq-verbose" using DEBUG_WITH_TYPE directly and adding a helper macro that calls that macro for both if the more detailed level is specified. That way it's not necessary to edit and recompile to switch between the verbosity levels. Not blocking.

Fantastic idea. Added that 👍
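One way to realize the reviewer's suggestion is sketched below as a self-contained model. In LLVM, `DEBUG_WITH_TYPE(TYPE, X)` runs `X` only when `TYPE` was selected with `-debug-only=...` in a build with assertions enabled, and output normally goes to `llvm::dbgs()`. Here, a `std::set` of enabled types and a `std::ostringstream` stand in for those, and the macro and helper names (`DESEQ_DEBUG`, `DESEQ_VERBOSE_DEBUG`, `isDebugTypeEnabled`) are hypothetical. Regular messages appear for either debug type, while detailed traces appear only for `llhd-deseq-verbose`.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Stand-ins (hypothetical) for LLVM's -debug-only selection and dbgs().
std::set<std::string> enabledDebugTypes;
std::ostringstream debugStream;

bool isDebugTypeEnabled(const char *type) {
  return enabledDebugTypes.count(type) != 0;
}

// Minimal model of DEBUG_WITH_TYPE: run X only if TYPE is enabled.
#define MODEL_DEBUG_WITH_TYPE(TYPE, X)                                         \
  do {                                                                         \
    if (isDebugTypeEnabled(TYPE)) {                                            \
      X;                                                                       \
    }                                                                          \
  } while (false)

#define DESEQ_DEBUG_TYPE "llhd-deseq"
#define DESEQ_VERBOSE_DEBUG_TYPE "llhd-deseq-verbose"

// Regular trace: shown for either debug type, but printed only once.
#define DESEQ_DEBUG(X)                                                         \
  do {                                                                         \
    if (isDebugTypeEnabled(DESEQ_DEBUG_TYPE) ||                                \
        isDebugTypeEnabled(DESEQ_VERBOSE_DEBUG_TYPE)) {                        \
      X;                                                                       \
    }                                                                          \
  } while (false)

// Detailed trace: shown only when the verbose debug type is selected.
#define DESEQ_VERBOSE_DEBUG(X) MODEL_DEBUG_WITH_TYPE(DESEQ_VERBOSE_DEBUG_TYPE, X)
```

Checking both types inside one `if`, rather than invoking the macro once per type, avoids printing regular messages twice when both types are selected. Switching verbosity then only requires a different `-debug-only` argument, with no edit or recompile.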


Co-authored-by: Martin Erhart <martin.erhart@sifive.com>

fabianschuiki requested review from seldridge, prithayan and uenoku May 13, 2025 00:08

fabianschuiki added the LLHD label May 13, 2025

fabianschuiki requested a review from maerhart as a code owner May 13, 2025 00:08

maerhart approved these changes May 29, 2025

fabianschuiki force-pushed the fschuiki/deseq-perf branch from ecb4b23 to 9c49ef6 May 29, 2025 21:48

Apply suggestions from code review

375413e

Co-authored-by: Martin Erhart <martin.erhart@sifive.com>
