-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIRRTL] Dedup: hash modules back->front #7820
Conversation
This is not your problem, but I think when hashing the attr-dict for operations, we should hash in the name of the attribute for symbols and types. So move this code right below the check for non-essential attributes:
|
Hashing modules back->front lets us be a bit leaner with the state we have to track, and hopefully will give us a nice speed improvement. In the old algorithm, as we hashed each operation or block in the module, we would store its position as an index into a side-table. When a value is used, we could record the use by hashing-in the result-no and the defining operation's index (or argument-no and block index, resp). This index table is a major performance bottleneck for dedup: in a large module, this table can be massive. The observation made is that values tend to only be used near their definition. After we hash the last use of an operation or block, we should be safe to remove its index from the index table, and keep the index table as small as possible. This PR modifies the hasher to walk the module backwards. When a value is first encountered (while hashing a use/operand), we assign an ID to the defining operation. We use that ID to hash all future uses. When the defining op is hashed, we hash its ID once more (recording the fact that the ID is defined by the op), and remove the ID from the table--a value can only be used after it is defined. This ensures that we only track the ID of an operation for its live range in the IR. The IDs are assigned according to their "first occurrence" in the backwards walk of the IR. Since the assignment of IDs is derived from the structure of the IR, two equivalent modules should assign the same IDs to the same ops. This PR also updates the hashing of inner-symbols to be handled in a similar way. When an inner symbol is referenced, we assign an ID and record the reference by hashing the ID. When an inner symbol is defined, we record the definition by, again, hashing the ID. Unlike values, a symbol can be referenced before it is defined, so we can never free inner symbol IDs. This corrects an old logical "bug" in dedup, which never arises in practice because chisel cannot generate it.
This PR also adds in a "position counter" which is hashed in with each IR object. I think this is unnecessary, but it gives me a bit of extra confidence that two different IRs will push different data through the hasher. We could look at removing it in a followup PR but I think the overhead is negligible. |
Hashing modules back->front lets us be a bit leaner with the state we
have to track, and hopefully will give us a nice speed improvement.
In the old algorithm, as we hashed each operation or block in the
module, we would store its position as an index into a side-table. When
a value is used, we could record the use by hashing-in the result-no and
the defining operation's index (or argument-no and block index, resp).
This index table is a major performance bottleneck for dedup: in a large
module, this table can be massive. The observation made is that values
tend to only be used near their definition. After we hash the last use
of an operation or block, we should be safe to remove its index from the
index table, and keep the index table as small as possible.
This PR modifies the hasher to walk the module backwards. When a value
is first encountered (while hashing a use/operand), we assign an ID to
the defining operation. We use that ID to hash all uses.
When the defining op is hashed, we hash its ID once more (recording the
fact that the ID is defined by the op), and remove the ID from the
table--a value can only be used after it is defined. This ensures that
we only track the ID of an operation for its live range in the IR.
The IDs are assigned according to their "first occurrence" in the
backwards walk of the IR. Since the assignment of IDs is derived from
the structure of the IR, two equivalent modules should assign the same
IDs to the same ops.
This PR also updates the hashing of inner-symbols to be handled in a
similar way. When an inner symbol is referenced, we assign an ID and
record the reference by hashing the ID. When an inner symbol is defined,
we record the definition by, again, hashing the ID. Unlike values, a
symbol can be referenced before it is defined, so we can never free
inner symbol IDs. This corrects an old logical "bug" in dedup, which
never arises in practice because chisel cannot generate it.