[LLHD] Improve Deseq pass performance #8479

Open
wants to merge 2 commits into
base: main
from

Conversation

fabianschuiki
Contributor

The current implementation of the LLHD Deseq pass scales very poorly as the number of SSA values in a process increases. For one, the pass uses a crufty data flow analysis to eagerly analyse how the triggers of a process influence drive conditions and drive values. Furthermore, the pass also eagerly tracks every computation done on i1 values in its canonical form.

While trying the Deseq pass on a few Verilog inputs, I noticed that even a fairly benign multi-port register file would cause the pass to grind to a halt. This commit adds that register file as a test case to deseq.mlir. The register file uses a long chain of control flow operations, which causes the DNFs in the Deseq pass to grow beyond 25 distinct input terms, requiring MBs of memory during brute-force DNF minimization steps.

This commit makes the Deseq pass a lot more conservative. It no longer tracks operations on the process triggers as a boolean expression in disjunctive normal form with an unbounded number of terms. Instead, the pass now captures these expressions in a truth table that tracks only the process triggers plus an opaque "unknown" term standing in for all other logic. All operations that do not affect the triggers are effectively absorbed into that "unknown" term. The resulting truth table has at most 5 input terms -- past and present value of each trigger (at most two triggers), plus the unknown term -- so its 2^5 = 32 rows fit into a 32-bit APInt.
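To make the size argument concrete, here is a minimal standalone sketch of such a 5-term truth table. It is not the code in this PR: `TruthTable`, `Term`, and `posedge` are hypothetical names, and a plain `uint32_t` stands in for the `APInt` the pass uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pass's truth table. Five input terms give
// 2^5 = 32 rows, and each row's output occupies one bit of `bits`.
struct TruthTable {
  uint32_t bits;

  // Table of a single input term. Term `t` is true in row `i` iff bit `t`
  // of the row index `i` is set.
  static TruthTable term(unsigned t) {
    uint32_t b = 0;
    for (unsigned row = 0; row < 32; ++row)
      if ((row >> t) & 1)
        b |= uint32_t(1) << row;
    return {b};
  }
  static TruthTable constant(bool v) { return {v ? 0xFFFFFFFFu : 0u}; }

  // Boolean operations are single bitwise operations on the 32 rows.
  TruthTable operator&(TruthTable o) const { return {bits & o.bits}; }
  TruthTable operator|(TruthTable o) const { return {bits | o.bits}; }
  TruthTable operator~() const { return {~bits}; }
  bool operator==(TruthTable o) const { return bits == o.bits; }
};

// Assumed term numbering: two triggers with past and present value each,
// plus the opaque "unknown" term.
enum Term { PastClk = 0, PresentClk = 1, PastRst = 2, PresentRst = 3, Unknown = 4 };

// A rising edge is "was low, is high".
TruthTable posedge(Term past, Term present) {
  return ~TruthTable::term(past) & TruthTable::term(present);
}
```

In this encoding, `posedge(PastClk, PresentClk)` is the constant `0x44444444`, and every AND, OR, or NOT costs one machine instruction regardless of how much control flow produced the expression.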

The value tables are also made more efficient. These are used to track the different values and the conditions under which they may be driven to a signal. The conditions are now tracked as compact truth tables as well, and the value itself may be an "unknown" value in case two distinct values may be driven onto a signal without the choice depending on a process trigger.
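As a rough illustration of the idea, and not the data structure in this PR, a value table can be sketched as a list of (condition, value) pairs. `ValueEntry`, `ValueTable`, `mergeWithoutTrigger`, and `kUnknownValue` are hypothetical names. Values are plain integer IDs here, and conditions are 32-bit truth tables that are assumed to be disjoint within one table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical marker for "some value, but we do not track which one".
constexpr int kUnknownValue = -1;

struct ValueEntry {
  uint32_t condition; // Truth table rows under which `value` is driven.
  int value;          // Stand-in for an SSA value; kUnknownValue if opaque.
};

struct ValueTable {
  std::vector<ValueEntry> entries;

  // Add a value under a condition, folding it into an existing entry for
  // the same value. Conditions that are never true are dropped.
  void add(uint32_t condition, int value) {
    if (condition == 0)
      return;
    for (ValueEntry &entry : entries) {
      if (entry.value == value) {
        entry.condition |= condition;
        return;
      }
    }
    entries.push_back({condition, value});
  }

  // All rows under which this table drives anything.
  uint32_t covered() const {
    uint32_t bits = 0;
    for (const ValueEntry &entry : entries)
      bits |= entry.condition;
    return bits;
  }
};

// Merge two tables when the choice between them does not depend on a
// trigger. Rows where both drive the same value keep it; rows where they
// drive distinct values collapse to the unknown value.
ValueTable mergeWithoutTrigger(const ValueTable &a, const ValueTable &b) {
  ValueTable result;
  for (const ValueEntry &ea : a.entries)
    for (const ValueEntry &eb : b.entries)
      result.add(ea.condition & eb.condition,
                 ea.value == eb.value ? ea.value : kUnknownValue);
  for (const ValueEntry &ea : a.entries)
    result.add(ea.condition & ~b.covered(), ea.value);
  for (const ValueEntry &eb : b.entries)
    result.add(eb.condition & ~a.covered(), eb.value);
  return result;
}
```

The point of the sketch is that the table never grows beyond what the trigger conditions can distinguish: data-dependent choices do not add entries, they only degrade a value to unknown.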

Furthermore, this commit replaces the original iterative data flow analysis to compute the DNFs and value tables with a depth-first traversal through the IR. This DFS computes the truth tables and value tables as they are needed, which is significantly less work than gratuitously computing them upfront.
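The difference between eager and on-demand computation can be sketched on a toy expression graph. This is a simplified model and not the pass itself: `Node`, `Kind`, `LazyTables`, and `termTable` are hypothetical, and the graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

enum class Kind { Trigger, Opaque, And, Or, Not };

// Toy IR node. `term` is only used by Trigger nodes; `lhs`/`rhs` index into
// the node list and are -1 when unused.
struct Node {
  Kind kind;
  unsigned term;
  int lhs;
  int rhs;
};

// 32-row truth table of input term `term` (bit `term` of the row index).
uint32_t termTable(unsigned term) {
  uint32_t bits = 0;
  for (unsigned row = 0; row < 32; ++row)
    if ((row >> term) & 1)
      bits |= uint32_t(1) << row;
  return bits;
}

class LazyTables {
public:
  explicit LazyTables(std::vector<Node> graph)
      : nodes(std::move(graph)), cache(nodes.size()) {}

  // Depth-first, memoized computation: a node's table is only computed when
  // something asks for it, and at most once.
  uint32_t get(int id) {
    if (cache[id])
      return *cache[id];
    const Node &node = nodes[id];
    uint32_t result = 0;
    switch (node.kind) {
    case Kind::Trigger:
      result = termTable(node.term);
      break;
    case Kind::Opaque:
      result = termTable(4); // Everything else maps to the "unknown" term.
      break;
    case Kind::And:
      result = get(node.lhs) & get(node.rhs);
      break;
    case Kind::Or:
      result = get(node.lhs) | get(node.rhs);
      break;
    case Kind::Not:
      result = ~get(node.lhs);
      break;
    }
    cache[id] = result;
    ++numComputed;
    return result;
  }

  unsigned numComputed = 0; // Number of nodes actually evaluated.

private:
  std::vector<Node> nodes;
  std::vector<std::optional<uint32_t>> cache;
};
```

Querying one node only touches the nodes it transitively depends on, so logic that never feeds a drive condition is never analyzed. An eager data flow analysis would have visited every node in the process.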

As a result, this commit removes a significant scaling hurdle from the Deseq pass and makes it work a lot quicker on larger inputs. It is more conservative now since the pass no longer tracks drive values and drive conditions precisely, but only to the degree necessary to detect the clocking and reset scheme. This is sufficient in practice, though.

Note that the pass currently uses recursion for its DFS. We may want to change this in the future. Converting the recursion to a worklist-based approach is doable but extremely annoying to implement.
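For reference, the general shape of such a conversion on a toy expression graph looks roughly like the sketch below. `Expr` and `evaluate` are hypothetical and the graph is assumed to be acyclic. The real pass carries considerably more state per visit, so this understates the effort involved.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy expression node: op is 'L' (leaf), '&', '|', or '~'. Operands index
// into the expression list; `rhs` is unused for '~' and leaves.
struct Expr {
  char op;
  int lhs;
  int rhs;
  uint32_t leaf;
};

// Post-order evaluation with an explicit worklist instead of recursion.
// A node stays on the worklist until all of its operands are computed.
uint32_t evaluate(const std::vector<Expr> &exprs, int root) {
  std::vector<std::optional<uint32_t>> done(exprs.size());
  std::vector<int> worklist{root};
  while (!worklist.empty()) {
    int id = worklist.back();
    if (done[id]) { // Already computed via another user.
      worklist.pop_back();
      continue;
    }
    const Expr &expr = exprs[id];
    if (expr.op == 'L') {
      done[id] = expr.leaf;
      worklist.pop_back();
      continue;
    }
    // Schedule missing operands first and revisit this node afterwards.
    bool ready = true;
    if (!done[expr.lhs]) {
      worklist.push_back(expr.lhs);
      ready = false;
    }
    if (expr.op != '~' && !done[expr.rhs]) {
      worklist.push_back(expr.rhs);
      ready = false;
    }
    if (!ready)
      continue;
    uint32_t lhs = *done[expr.lhs];
    if (expr.op == '~')
      done[id] = ~lhs;
    else if (expr.op == '&')
      done[id] = lhs & *done[expr.rhs];
    else
      done[id] = lhs | *done[expr.rhs];
    worklist.pop_back();
  }
  return *done[root];
}
```

The worklist lives on the heap, so a long dependency chain cannot overflow the call stack the way deep recursion can.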

Member

@maerhart maerhart left a comment

LGTM

@@ -22,6 +22,7 @@
#include "llvm/Support/GenericIteratedDominanceFrontier.h"

#define DEBUG_TYPE "llhd-deseq"
// #define DESEQ_DETAILED_TRACE
Member

Maybe there is a way to define two debug strings "llhd-deseq" and "llhd-deseq-verbose" using DEBUG_WITH_TYPE directly and adding a helper macro that calls that macro for both if the more detailed level is specified. That way it's not necessary to edit and recompile to switch between the verbosity levels. Not blocking.

Contributor Author

Fantastic idea. Added that 👍

@fabianschuiki fabianschuiki force-pushed the fschuiki/deseq-perf branch from ecb4b23 to 9c49ef6 Compare May 29, 2025 21:48
Co-authored-by: Martin Erhart <martin.erhart@sifive.com>