[FIRRTL] Dedup: hash modules back->front #7820

rwy7 · 2024-11-15T01:29:16Z

Hashing modules back->front lets us be a bit leaner with the state we
have to track, and hopefully will give us a nice speed improvement.

In the old algorithm, as we hashed each operation or block in the
module, we stored its position as an index in a side table. When a value
was used, we recorded the use by hashing in the result number and the
defining operation's index (or, for a block argument, the argument
number and the block's index).

This index table is a major performance bottleneck for dedup: in a large
module, it can be massive. The key observation is that values tend to be
used only near their definition. Once we have hashed the last use of an
operation or block, it should be safe to remove its entry from the index
table, which keeps the table as small as possible.
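
To illustrate why the side table grows with the module, here is a minimal sketch of the forward scheme on a toy IR. `ToyOp`, `hashForwards`, and the recorded `trace` (which stands in for the real hasher) are hypothetical names made up for this example; this is not the code in Dedup.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy op: an opcode plus the positions of the ops it uses.
struct ToyOp {
  uint32_t opcode;
  std::vector<size_t> operands;
};

struct ForwardResult {
  std::vector<uint64_t> trace; // words that would be fed to the hasher
  size_t tableSize;            // entries left in the side table at the end
};

// Forward walk: every op gets an index on definition, and the entry is
// never removed because we cannot know when the last use has been seen.
ForwardResult hashForwards(const std::vector<ToyOp> &ops) {
  ForwardResult result{{}, 0};
  std::unordered_map<const ToyOp *, uint64_t> indices;
  for (const ToyOp &op : ops) {
    uint64_t index = indices.size();
    indices.emplace(&op, index);
    result.trace.push_back(op.opcode);
    // Record each use by hashing in the defining op's index.
    for (size_t def : op.operands)
      result.trace.push_back(indices.at(&ops[def]));
  }
  result.tableSize = indices.size();
  return result;
}
```

In this sketch a chain of four ops leaves four entries in the table, even though each value is dead right after its single use.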

This PR modifies the hasher to walk the module backwards. When a value
is first encountered (while hashing a use/operand), we assign an ID to
the defining operation. We use that ID to hash all uses.

When the defining op is hashed, we hash its ID once more (recording the
fact that the ID is defined by the op), and remove the ID from the
table. Because a value can only be used after it is defined, the
backwards walk has already hashed every use by the time it reaches the
definition. This ensures that we only track the ID of an operation for
its live range in the IR.
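
The backwards walk can be sketched on a toy IR as follows. This is an illustration under stated assumptions, not the code in Dedup.cpp: `ToyOp`, `hashBackwards`, and the recorded `trace` (standing in for the real hasher) are hypothetical, and giving a fresh ID to an op that has no uses is a choice made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy op: an opcode plus the positions of the ops it uses.
struct ToyOp {
  uint32_t opcode;
  std::vector<size_t> operands;
};

struct BackwardResult {
  std::vector<uint64_t> trace; // words that would be fed to the hasher
  size_t peakLiveIDs;          // largest size the ID table ever reached
};

BackwardResult hashBackwards(const std::vector<ToyOp> &ops) {
  BackwardResult result{{}, 0};
  std::unordered_map<size_t, uint64_t> liveIDs; // defining op position -> ID
  uint64_t nextID = 0;
  for (size_t i = ops.size(); i-- > 0;) {
    const ToyOp &op = ops[i];
    result.trace.push_back(op.opcode);
    // Definition: hash the ID once more, then free it. All uses come
    // later in the IR, so the backwards walk has already hashed them.
    auto it = liveIDs.find(i);
    if (it != liveIDs.end()) {
      result.trace.push_back(it->second);
      liveIDs.erase(it);
    } else {
      // Assumption for this sketch: an unused op gets a fresh ID.
      result.trace.push_back(nextID++);
    }
    // Uses: assign an ID on first encounter, reuse it afterwards.
    for (size_t def : op.operands) {
      auto [pos, inserted] = liveIDs.try_emplace(def, nextID);
      if (inserted)
        ++nextID;
      result.trace.push_back(pos->second);
    }
    result.peakLiveIDs = std::max(result.peakLiveIDs, liveIDs.size());
  }
  return result;
}
```

For a chain in which each op uses only its predecessor, the table in this sketch never holds more than one ID at a time. A value that is used far from its definition stays in the table for that whole range.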

The IDs are assigned according to their "first occurrence" in the
backwards walk of the IR. Since the assignment of IDs is derived from
the structure of the IR, two equivalent modules should assign the same
IDs to the same ops.

This PR also updates the hashing of inner-symbols to be handled in a
similar way. When an inner symbol is referenced, we assign an ID and
record the reference by hashing the ID. When an inner symbol is defined,
we record the definition by, again, hashing the ID. Unlike values, a
symbol can be referenced before it is defined, so we can never free
inner symbol IDs. This corrects an old logical "bug" in dedup, which
never arises in practice because Chisel cannot generate it.
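
A minimal sketch of the inner-symbol scheme, assuming the walk can be flattened into a list of define/reference events. `SymKind`, `SymEvent`, and `hashSymbolEvents` are hypothetical names for this example, and the returned vector stands in for the data pushed through the real hasher.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

enum class SymKind : uint64_t { Define = 0, Reference = 1 };

// Hypothetical event seen during the walk: a symbol is defined or referenced.
struct SymEvent {
  SymKind kind;
  std::string name;
};

std::vector<uint64_t> hashSymbolEvents(const std::vector<SymEvent> &events) {
  std::vector<uint64_t> trace;
  std::unordered_map<std::string, uint64_t> ids;
  for (const SymEvent &event : events) {
    // Assign an ID on first occurrence, whether that is a definition or a
    // reference. Entries are never erased, because either may come first.
    uint64_t fresh = ids.size();
    auto it = ids.try_emplace(event.name, fresh).first;
    trace.push_back(static_cast<uint64_t>(event.kind));
    trace.push_back(it->second);
  }
  return trace;
}
```

Because only the IDs are recorded, two event lists that differ only in symbol names produce the same trace in this sketch, while swapping which symbol is defined where changes it.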

lib/Dialect/FIRRTL/Transforms/Dedup.cpp

youngar · 2024-11-15T19:51:20Z

This is not your problem, but I think that when hashing the attribute dictionary of an operation, we should also hash in the attribute's name for symbol and type attributes. To do that, move this code to right below the check for non-essential attributes:

      // Hash the interned pointer.
      update(name.getAsOpaquePointer());

rwy7 · 2024-11-15T20:35:02Z

This PR also adds a "position counter", which is hashed in with each IR object. I think this is unnecessary, but it gives me a bit of extra confidence that two different IRs will push different data through the hasher. We could look at removing it in a follow-up PR, but I think the overhead is negligible.

rwy7 force-pushed the fix-perf branch from 36280ef to 312f5c6 Compare November 15, 2024 01:37

rwy7 marked this pull request as ready for review November 15, 2024 13:53

rwy7 requested review from darthscsi and seldridge as code owners November 15, 2024 13:53

youngar changed the title from "Dedup: hash modules back->front" to "[FIRRTL] Dedup: hash modules back->front" Nov 15, 2024

youngar added the FIRRTL label (Involving the `firrtl` dialect) Nov 15, 2024

youngar reviewed Nov 15, 2024

lib/Dialect/FIRRTL/Transforms/Dedup.cpp (resolved)

youngar reviewed Nov 15, 2024

lib/Dialect/FIRRTL/Transforms/Dedup.cpp (resolved)

youngar approved these changes Nov 15, 2024

rwy7 force-pushed the fix-perf branch from 312f5c6 to a60aaee Compare November 15, 2024 20:16

rwy7 merged commit 2d93ce7 into llvm:main Nov 15, 2024
4 checks passed

rwy7 deleted the fix-perf branch November 15, 2024 20:35
