Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIRRTL] Dedup: hash modules back->front #7820

Merged
merged 1 commit into from
Nov 15, 2024
Merged

[FIRRTL] Dedup: hash modules back->front #7820

merged 1 commit into from
Nov 15, 2024

Conversation

rwy7
Copy link
Contributor

@rwy7 rwy7 commented Nov 15, 2024

Hashing modules back->front lets us be a bit leaner with the state we
have to track, and hopefully will give us a nice speed improvement.

In the old algorithm, as we hashed each operation or block in the
module, we would store its position as an index into a side-table. When
a value is used, we could record the use by hashing-in the result-no and
the defining operation's index (or argument-no and block index, resp).

This index table is a major performance bottleneck for dedup: in a large
module, this table can be massive. The observation made is that values
tend to only be used near their definition. After we hash the last use
of an operation or block, we should be safe to remove its index from the
index table, and keep the index table as small as possible.

This PR modifies the hasher to walk the module backwards. When a value
is first encountered (while hashing a use/operand), we assign an ID to
the defining operation. We use that ID to hash all uses.

When the defining op is hashed, we hash its ID once more (recording the
fact that the ID is defined by the op), and remove the ID from the
table--a value can only be used after it is defined. This ensures that
we only track the ID of an operation for its live range in the IR.

The IDs are assigned according to their "first occurrence" in the
backwards walk of the IR. Since the assignment of IDs is derived from
the structure of the IR, two equivalent modules should assign the same
IDs to the same ops.

This PR also updates the hashing of inner-symbols to be handled in a
similar way. When an inner symbol is referenced, we assign an ID and
record the reference by hashing the ID. When an inner symbol is defined,
we record the definition by, again, hashing the ID. Unlike values, a
symbol can be referenced before it is defined, so we can never free
inner symbol IDs. This corrects an old logical "bug" in dedup, which
never arises in practice because chisel cannot generate it.

@rwy7 rwy7 marked this pull request as ready for review November 15, 2024 13:53
@youngar youngar changed the title Dedup: hash modules back->front [FIRRTL] Dedup: hash modules back->front Nov 15, 2024
@youngar youngar added the FIRRTL Involving the `firrtl` dialect label Nov 15, 2024
@youngar
Copy link
Member

youngar commented Nov 15, 2024

This is not your problem, but I think when hashing the attr-dict for operations, we should hash in the name of the attribute for symbols and types. So move this code right below the check for non-essential attributes:

      // Hash the interned pointer.
      update(name.getAsOpaquePointer());

Hashing modules back->front lets us be a bit leaner with the state we
have to track, and hopefully will give us a nice speed improvement.

In the old algorithm, as we hashed each operation or block in the
module, we would store its position as an index into a side-table.  When
a value is used, we could record the use by hashing-in the result-no and
the defining operation's index (or argument-no and block index, resp).

This index table is a major performance bottleneck for dedup: in a large
module, this table can be massive. The observation made is that values
tend to only be used near their definition. After we hash the last use
of an operation or block, we should be safe to remove its index from the
index table, and keep the index table as small as possible.

This PR modifies the hasher to walk the module backwards. When a value
is first encountered (while hashing a use/operand), we assign an ID to
the defining operation. We use that ID to hash all future uses.

When the defining op is hashed, we hash its ID once more (recording the
fact that the ID is defined by the op), and remove the ID from the
table--a value can only be used after it is defined. This ensures that
we only track the ID of an operation for its live range in the IR.

The IDs are assigned according to their "first occurrence" in the
backwards walk of the IR. Since the assignment of IDs is derived from
the structure of the IR, two equivalent modules should assign the same
IDs to the same ops.

This PR also updates the hashing of inner-symbols to be handled in a
similar way. When an inner symbol is referenced, we assign an ID and
record the reference by hashing the ID. When an inner symbol is defined,
we record the definition by, again, hashing the ID. Unlike values, a
symbol can be referenced before it is defined, so we can never free
inner symbol IDs. This corrects an old logical "bug" in dedup, which
never arises in practice because chisel cannot generate it.
@rwy7
Copy link
Contributor Author

rwy7 commented Nov 15, 2024

This PR also adds in a "position counter" which is hashed in with each IR object. I think this is unnecessary, but it gives me a bit of extra confidence that two different IRs will push different data through the hasher. We could look at removing it in a followup PR but I think the overhead is negligible.

@rwy7 rwy7 merged commit 2d93ce7 into llvm:main Nov 15, 2024
4 checks passed
@rwy7 rwy7 deleted the fix-perf branch November 15, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FIRRTL Involving the `firrtl` dialect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants