Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIRRTL] Dedup memory wrapper modules in LowerMemory #6719

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

tymcauley
Copy link
Contributor

@tymcauley tymcauley commented Feb 19, 2024

Instead of just dedup-ing the external memory module, we include the memory wrapper module in the dedup calculation.

Resolves #6445.

Diff in circt-opt -firrtl-lower-memory test/Dialect/FIRRTL/lower-memory.mlir after this change:

--- main
+++ dedup-lower-memory-modules
@@ -75,7 +75,7 @@
   firrtl.circuit "Dedup" {
     firrtl.module @Dedup() {
       %mem0_W0_addr, %mem0_W0_en, %mem0_W0_clk, %mem0_W0_data = firrtl.instance mem0 @mem0(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
-      %mem1_W0_addr, %mem1_W0_en, %mem1_W0_clk, %mem1_W0_data = firrtl.instance mem1 @mem1(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
+      %mem1_W0_addr, %mem1_W0_en, %mem1_W0_clk, %mem1_W0_data = firrtl.instance mem1 @mem0(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
     }
     firrtl.module private @mem0(in %W0_addr: !firrtl.uint<4>, in %W0_en: !firrtl.uint<1>, in %W0_clk: !firrtl.clock, in %W0_data: !firrtl.uint<42>) {
       %mem0_ext_W0_addr, %mem0_ext_W0_en, %mem0_ext_W0_clk, %mem0_ext_W0_data = firrtl.instance mem0_ext @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
@@ -85,13 +85,6 @@
       firrtl.strictconnect %mem0_ext_W0_data, %W0_data : !firrtl.uint<42>
     }
     firrtl.memmodule private @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>) attributes {dataWidth = 42 : ui32, depth = 12 : ui64, extraPorts = [], maskBits = 1 : ui32, numReadPorts = 0 : ui32, numReadWritePorts = 0 : ui32, numWritePorts = 1 : ui32, readLatency = 1 : ui32, writeLatency = 1 : ui32}
-    firrtl.module private @mem1(in %W0_addr: !firrtl.uint<4>, in %W0_en: !firrtl.uint<1>, in %W0_clk: !firrtl.clock, in %W0_data: !firrtl.uint<42>) {
-      %mem0_ext_W0_addr, %mem0_ext_W0_en, %mem0_ext_W0_clk, %mem0_ext_W0_data = firrtl.instance mem0_ext @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
-      firrtl.strictconnect %mem0_ext_W0_addr, %W0_addr : !firrtl.uint<4>
-      firrtl.strictconnect %mem0_ext_W0_en, %W0_en : !firrtl.uint<1>
-      firrtl.strictconnect %mem0_ext_W0_clk, %W0_clk : !firrtl.clock
-      firrtl.strictconnect %mem0_ext_W0_data, %W0_data : !firrtl.uint<42>
-    }
   }
   firrtl.circuit "NoTestharnessDedup0" {
     firrtl.module @NoTestharnessDedup0() {
@@ -257,30 +250,23 @@
     hw.hierpath private @nla0 [@NonLocalAnnotation::@dut, @DUT::@mem0]
     hw.hierpath private @nla0_0 [@NonLocalAnnotation::@dut, @DUT::@mem0, @mem0]
     hw.hierpath private @nla1 [@NonLocalAnnotation::@dut, @DUT]
-    hw.hierpath private @nla1_0 [@NonLocalAnnotation::@dut, @DUT::@sym, @mem1]
+    hw.hierpath private @nla1_0 [@NonLocalAnnotation::@dut, @DUT::@sym, @mem0]
     firrtl.module @NonLocalAnnotation() {
       firrtl.instance dut sym @dut @DUT()
     }
     firrtl.module @DUT() {
       %mem0_W0_addr, %mem0_W0_en, %mem0_W0_clk, %mem0_W0_data = firrtl.instance mem0 sym @mem0 @mem0(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
-      %mem1_W0_addr, %mem1_W0_en, %mem1_W0_clk, %mem1_W0_data = firrtl.instance mem1 sym @sym @mem1(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
+      %mem1_W0_addr, %mem1_W0_en, %mem1_W0_clk, %mem1_W0_data = firrtl.instance mem1 sym @sym @mem0(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
       %MRead_read = firrtl.mem Undefined {depth = 12 : i64, name = "MRead", portNames = ["read"], readLatency = 0 : i32, writeLatency = 1 : i32} : !firrtl.bundle<addr: uint<4>, en: uint<1>, clk: clock, data flip: uint<42>>
     }
     firrtl.module private @mem0(in %W0_addr: !firrtl.uint<4>, in %W0_en: !firrtl.uint<1>, in %W0_clk: !firrtl.clock, in %W0_data: !firrtl.uint<42>) {
-      %mem0_ext_W0_addr, %mem0_ext_W0_en, %mem0_ext_W0_clk, %mem0_ext_W0_data = firrtl.instance mem0_ext sym @mem0_ext {annotations = [{circt.nonlocal = @nla0_0, class = "test0"}]} @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
+      %mem0_ext_W0_addr, %mem0_ext_W0_en, %mem0_ext_W0_clk, %mem0_ext_W0_data = firrtl.instance mem0_ext sym @mem0_ext {annotations = [{circt.nonlocal = @nla0_0, class = "test0"}, {circt.nonlocal = @nla1_0, class = "test1"}]} @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
       firrtl.strictconnect %mem0_ext_W0_addr, %W0_addr : !firrtl.uint<4>
       firrtl.strictconnect %mem0_ext_W0_en, %W0_en : !firrtl.uint<1>
       firrtl.strictconnect %mem0_ext_W0_clk, %W0_clk : !firrtl.clock
       firrtl.strictconnect %mem0_ext_W0_data, %W0_data : !firrtl.uint<42>
     }
     firrtl.memmodule private @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>) attributes {dataWidth = 42 : ui32, depth = 12 : ui64, extraPorts = [], maskBits = 1 : ui32, numReadPorts = 0 : ui32, numReadWritePorts = 0 : ui32, numWritePorts = 1 : ui32, readLatency = 1 : ui32, writeLatency = 1 : ui32}
-    firrtl.module private @mem1(in %W0_addr: !firrtl.uint<4>, in %W0_en: !firrtl.uint<1>, in %W0_clk: !firrtl.clock, in %W0_data: !firrtl.uint<42>) {
-      %mem0_ext_W0_addr, %mem0_ext_W0_en, %mem0_ext_W0_clk, %mem0_ext_W0_data = firrtl.instance mem0_ext sym @mem0_ext {annotations = [{circt.nonlocal = @nla1_0, class = "test1"}]} @mem0_ext(in W0_addr: !firrtl.uint<4>, in W0_en: !firrtl.uint<1>, in W0_clk: !firrtl.clock, in W0_data: !firrtl.uint<42>)
-      firrtl.strictconnect %mem0_ext_W0_addr, %W0_addr : !firrtl.uint<4>
-      firrtl.strictconnect %mem0_ext_W0_en, %W0_en : !firrtl.uint<1>
-      firrtl.strictconnect %mem0_ext_W0_clk, %W0_clk : !firrtl.clock
-      firrtl.strictconnect %mem0_ext_W0_data, %W0_data : !firrtl.uint<42>
-    }
   }
 }

@tymcauley tymcauley marked this pull request as draft February 19, 2024 18:25
@tymcauley tymcauley changed the title Draft: [FIRRTL] Dedup memory wrapper modules in LowerMemory [FIRRTL] Dedup memory wrapper modules in LowerMemory Feb 20, 2024
@tymcauley tymcauley marked this pull request as ready for review February 20, 2024 19:49
Copy link
Contributor

@darthscsi darthscsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR comment, I don't understand @mem0 instantiating mem0_ext_0 and mem0_ext. Shouldn't @Dedup simply be instantiating mem0 twice? The change to the test in the PR looks more like what I'd expect.

lib/Dialect/FIRRTL/Transforms/LowerMemory.cpp Outdated Show resolved Hide resolved
lib/Dialect/FIRRTL/Transforms/LowerMemory.cpp Outdated Show resolved Hide resolved
@tymcauley
Copy link
Contributor Author

In the PR comment, I don't understand @mem0 instantiating mem0_ext_0 and mem0_ext. Shouldn't @Dedup simply be instantiating mem0 twice? The change to the test in the PR looks more like what I'd expect.

Yup, you're right. There's a bug in my change right now, I need to move the memInst creation to be at the point where we define the wrapper module. I think this should also explain the issue I was having with the duplicate inner symbols.

Thanks for pointing that out!

@tymcauley
Copy link
Contributor Author

tymcauley commented Feb 21, 2024

Alright, I've made a few updates:

  • MemOps is a struct instead of a pair
  • We create the external memory instance when we create the wrapper module definition, to ensure that we don't create duplicate instances inside that wrapper module.
  • We strip all NLAs from that external memory instance when we create it, and then port over the NLAs of every MemOp that now refers to that new instance.

I've updated the PR comment with the new output of LowerMemory on lower-memory.mlir.

@tymcauley
Copy link
Contributor Author

tymcauley commented Feb 25, 2024

Side question: in the NonLocalAnnotation circuit in lower-memory.mlir, should the @nla0 and @nla1 non-local annotations be deleted during this pass (since they refer to the original firrtl.mem instances), or are they cleaned up elsewhere, or should we just leave them be?

I'm referring to the fact that all four of these hw.hierpath operations are in the output from this pass:

firrtl.circuit "NonLocalAnnotation" {
  hw.hierpath private @nla0 [@NonLocalAnnotation::@dut, @DUT::@mem0]
  hw.hierpath private @nla0_0 [@NonLocalAnnotation::@dut, @DUT::@mem0, @mem0]
  hw.hierpath private @nla1 [@NonLocalAnnotation::@dut, @DUT]
  hw.hierpath private @nla1_0 [@NonLocalAnnotation::@dut, @DUT::@sym, @mem0]
  ...

@darthscsi
Copy link
Contributor

Side question: in the NonLocalAnnotation circuit in lower-memory.mlir, should the @nla0 and @nla1 non-local annotations be deleted during this pass (since they refer to the original firrtl.mem instances), or are they cleaned up elsewhere, or should we just leave them be?

If you change then hierarchy, you need to update the hierpaths. Dedup is somewhat unique in that it has to interpret the NLAs to decide if they matter or not (e.g. block dedup). (this is part of what makes dedup fragile)

@darthscsi
Copy link
Contributor

Pinging @youngar and @prithayan as they are more familiar with the memory op related constraints.

@prithayan
Copy link
Contributor

Sorry for the delay, I will take a look at this today.

Copy link
Contributor

@prithayan prithayan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense. Dead HierPathOp can be left for cleanup by later passes, but if symbol names are updated the HierPathOp needs to be updated and cannot be left in an inconsistent state. The pass is already handling them correctly.

@tymcauley
Copy link
Contributor Author

The changes make sense. Dead HierPathOp can be left for cleanup by later passes, but if symbol names are updated the HierPathOp needs to be updated and cannot be left in an inconsistent state. The pass is already handling them correctly.

Thanks for clearing that up @prithayan! Is there anything I should take care of before this can be merged? Let me know if you'd like me to rebase on the latest main.

@prithayan
Copy link
Contributor

The changes make sense. Dead HierPathOp can be left for cleanup by later passes, but if symbol names are updated the HierPathOp needs to be updated and cannot be left in an inconsistent state. The pass is already handling them correctly.

Thanks for clearing that up @prithayan! Is there anything I should take care of before this can be merged? Let me know if you'd like me to rebase on the latest main.

Please rebase and merge it.

@tymcauley
Copy link
Contributor Author

The changes make sense. Dead HierPathOp can be left for cleanup by later passes, but if symbol names are updated the HierPathOp needs to be updated and cannot be left in an inconsistent state. The pass is already handling them correctly.

Thanks for clearing that up @prithayan! Is there anything I should take care of before this can be merged? Let me know if you'd like me to rebase on the latest main.

Please rebase and merge it.

I've rebased, but I don't have write permissions, so someone else will have to merge it.

Instead of just dedup-ing the external memory module, we include the
memory wrapper module in the dedup calculation.

Resolves llvm#6445.
@tymcauley
Copy link
Contributor Author

@prithayan just rebased again, can you merge this please? Thanks!

@prithayan prithayan merged commit 2e23cda into llvm:main Mar 11, 2024
4 checks passed
@prithayan
Copy link
Contributor

Sorry, missed to merge it last week.
Thanks for fixing the issue :)
Looking forward to more future contribution @tymcauley .

@tymcauley tymcauley deleted the dedup-lower-memory-modules branch March 11, 2024 04:56
@tymcauley
Copy link
Contributor Author

Sorry, missed to merge it last week. Thanks for fixing the issue :) Looking forward to more future contribution @tymcauley .

No problem @prithayan, thanks so much for taking care of this!

mikeurbach added a commit that referenced this pull request Mar 14, 2024
This reverts commit 2e23cda.

This change caused memories that were previously deduping to no
longer dedupe, so we're reverting this while we investigate. An
issue will be created to include a small reproducer and track
re-landing this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Dedup][FIRRTL] SRAM modules not deduped when using memory macro replacement
3 participants