Mapping logical memories to physical memories #1151

rachitnigam · 2022-08-19T04:20:27Z

With #1145, Calyx has rudimentary support for memories with sequential reads and writes. While the specific implementation of the memory in that PR is a 1-cycle read, 1-cycle write memory, the interface actually admits arbitrary latencies for reads and writes. This is because reads need to be "primed" by setting the read_en signal and waiting on the read_done signal. Similarly, the write interface needs to use the write_en and write_done signals. The only way Calyx knows that reads and writes take one cycle is the @static annotations on the read and write paths. We can imagine exposing a weaker interface without the @static annotations, which would force frontends to assume that reads and writes can take an arbitrary number of cycles.
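To make the handshake concrete, here is a rough cycle-level sketch in Python (not Calyx) of a memory whose read latency is hidden behind read_en/read_done. The class `SeqMemModel` and the helper `read_blocking` are hypothetical names invented for this illustration, not part of the Calyx standard library, and the model covers only the read path.

```python
class SeqMemModel:
    """Toy cycle-level model of a memory with a read_en/read_done handshake.

    Hypothetical model for illustration only; not the Calyx primitive.
    """

    def __init__(self, data, read_latency):
        assert read_latency >= 1
        self.data = list(data)
        self.read_latency = read_latency
        self.read_done = False
        self.read_data = None
        self._pending = None  # [cycles_left, addr] for the in-flight read

    def tick(self, read_en=False, addr=0):
        """Advance one clock cycle."""
        self.read_done = False
        # Accept a new read only when no read is in flight.
        if self._pending is None and read_en:
            self._pending = [self.read_latency, addr]
        if self._pending is not None:
            self._pending[0] -= 1
            if self._pending[0] == 0:
                self.read_data = self.data[self._pending[1]]
                self.read_done = True
                self._pending = None


def read_blocking(mem, addr):
    """Latency-insensitive read: hold read_en until read_done is high.

    Returns (value, number of cycles waited).
    """
    cycles = 0
    while True:
        mem.tick(read_en=True, addr=addr)
        cycles += 1
        if mem.read_done:
            return mem.read_data, cycles
```

A client written like `read_blocking` never relies on the latency, so the same code works whether the backing memory takes 1 cycle or 4. That is the property a weaker interface without @static annotations would force on frontends.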

Next, we can design a pass that analyses memories and remaps them to physical memories with different latencies. For example, we can say (these numbers are completely made up):

If memory is bigger than 1Mb, map onto UltraRAM
If memory is bigger than 512Kb, map onto BRAM with 4 cycle read/write latency
... and so on
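As a sketch of what the selection step of such a pass might look like, the Python below encodes only the two made-up rules above. The function name `choose_physical_memory`, the `RULES` table, and the reading of "Kb"/"Mb" as 2**10 and 2**20 bits are assumptions for illustration. The thread gives no latency for UltraRAM, so it is left as `None`.

```python
KB = 1024       # assumption: "Kb" means 2**10 bits
MB = 1024 * KB  # assumption: "Mb" means 2**20 bits

# (strict lower bound in bits, physical memory kind, read/write latency in cycles)
# The thresholds are the made-up numbers from the list above.
# A latency of None means the thread does not specify one.
RULES = [
    (1 * MB, "UltraRAM", None),
    (512 * KB, "BRAM", 4),
]


def choose_physical_memory(size_bits):
    """Return (kind, latency) for the first rule whose bound is exceeded.

    Returns None when no listed rule applies; the remaining
    "... and so on" rules are not given in the thread.
    """
    for bound, kind, latency in RULES:
        if size_bits > bound:
            return kind, latency
    return None
```

Ordering the rules from largest to smallest bound means the first match is always the most specific one, so a 2Mb memory maps to UltraRAM rather than to BRAM.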

Side note: the reason to map onto BRAMs with more than 1 cycle of read/write latency is the way synthesis tools construct bigger memories from BRAM building blocks; if a memory is too big, it needs to be constructed out of multiple BRAM blocks, each of which adds wire delay. By taking more cycles for reads and writes, we can help the synthesis tool get better timing results since it doesn't have to fit all reads and writes into 1 cycle.

The pass itself will use a set of primitive/generated memories which have the required characteristics. The best part of this is that we don't have to give up on latency-sensitivity; the pass, once it figures out which kind of memory to use, can insert the right @static attributes into the groups that use the memories.
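The annotation step could be sketched as follows in Python. This is a toy model, not the Calyx IR: groups are represented as a plain mapping from group name to the one memory they access, and the sketch assumes each group performs exactly one memory access so that the group's latency equals the memory's latency. The function name `annotate_groups` and the data shapes are hypothetical.

```python
def annotate_groups(groups, latencies):
    """Attach a "static" attribute to each group once its memory is chosen.

    groups:    {group name: name of the memory the group accesses}
    latencies: {memory name: latency in cycles, or None if not yet known}

    Assumption: one memory access per group, so the group's latency
    equals the latency of the memory it touches.
    """
    annotated = {}
    for name, mem in groups.items():
        attrs = {}
        latency = latencies.get(mem)
        if latency is not None:
            attrs["static"] = latency
        # Groups whose memory has no known latency stay latency-insensitive.
        annotated[name] = attrs
    return annotated
```

For example, `annotate_groups({"read_a": "A"}, {"A": 4})` yields `{"read_a": {"static": 4}}`, while a group that uses a memory with no chosen latency gets no attribute and remains dynamic.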

FWIW, this pass is a much simpler version of the compiler @andrew1999 is building.


sampsyo · 2022-08-20T01:06:29Z

This plan sounds great. I just want to add that this seems like a pretty chunky piece of work—we'd need to design the high-level abstraction, adapt front-ends, create a library of interesting backing RAM implementations, and then implement the pass. So if anyone is ever looking for a discrete project to sink their teeth into, this could be one.

rachitnigam · 2022-08-20T02:21:25Z

I think @calebmkim might have this on his critical path towards writing the sharing paper (to some extent). We can’t really get resource numbers for bigger designs without this.

rachitnigam · 2022-08-20T02:22:05Z

Also, worth thinking about how this can enable HBM support: #1106

sampsyo · 2022-08-22T15:14:00Z

Indeed! As far as the "discussion needed" for this one, maybe what we should chat about is how to sidestep the need for the "full version" of this, or to build something minimal and easy that just enables big designs to compile in a reasonable way. (Just because I worry this could be a super interesting problem that would distract from sharing per se.)

rachitnigam · 2022-08-23T07:23:26Z

Yeah, I think the minimal thing to do to get sharing results is to port the Dahlia and TVM frontends to use the sequential read/write memories and default them to URAM for now. This is probably not the best thing to do but fine as a way to get started.

rachitnigam · 2023-03-09T08:39:29Z

I think this will be a good use case for evaluating the new Calyx static stuff

calebmkim · 2023-06-12T21:26:10Z

Just to revive this issue, @paili0628 and I talked about how we should implement this and we think the following might be a good idea: we could define components in Calyx that delay memory reads/writes by two cycles. E.g.,

component delay_2_mem(read_en, write_en, ..)(...) {
  cells {
    // instantiates two registers, and a memory with appropriate size
  }
  wires {...}
  control {
    static<2> par {
      if read_en {
        // read from memory by passing it through two registers 
      }
      if write_en {
        // writes to memory by passing the data through two registers
      }
    }
  } 
}

I think it would probably be best to add this component as a primitive.

The only question I have is: can we use delay_2_mem to replace seq_mem's? The trouble with this is that seq_mems have a defined latency of 1, which could mess up static latency inference. This makes me think it might be worth it to implement a "virtual" memory in Calyx that doesn't yet have a defined latency.

rachitnigam · 2023-06-13T18:14:41Z

That sounds like a great starting point! Couple of notes:

We don't want to generate just a delay 2 module. We want to be able to generate any delay_n module so that for smaller memories, we can delay them by 1 cycle and for larger ones, we can delay by up to 4.
You're right that we cannot replace seq_mem with these because they have a precise latency. The solution would be to get rid of the latency annotation for seq_mem so that they cannot be used in a static context until they have been lowered.
One thing to consider is implementing this using ref cells so that instead of instantiating the memory, the component just takes a reference to the memory and delays it by two. One potential problem with this approach is that FPGA tools might see this pattern and fail to match it with the right kind of memory that we want, especially in the case when the same instance is used to delay multiple memories. If we do this, we'd need to ensure that all memories use a different instance to do the reads and that the delay components all get inlined.
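To illustrate the delay_n idea behaviorally, here is a small Python model (not Calyx) of a memory whose requests pass through n registers before they take effect. The class name `DelayedMem` is hypothetical. As a simplification, the model accepts at most one request per cycle (a write wins if both enables are set), whereas the component sketched earlier in the thread handles reads and writes in parallel.

```python
class DelayedMem:
    """Toy model of a memory whose requests are delayed by `delay` registers."""

    def __init__(self, size, delay):
        assert delay >= 1
        self.mem = [0] * size
        # One slot per delay register; each holds a request tuple or None.
        self.stages = [None] * delay

    def tick(self, read_en=False, write_en=False, addr=0, write_data=0):
        """Advance one cycle.

        Returns the read data if a read leaves the pipeline this cycle,
        otherwise None.
        """
        request = None
        if write_en:
            request = ("write", addr, write_data)
        elif read_en:
            request = ("read", addr, None)
        # Shift the pipeline: the new request enters, the oldest one leaves.
        self.stages.insert(0, request)
        leaving = self.stages.pop()
        if leaving is None:
            return None
        kind, req_addr, data = leaving
        if kind == "write":
            self.mem[req_addr] = data
            return None
        return self.mem[req_addr]
```

With `delay=2`, a write issued on the first tick commits on the third, and a read issued on the second tick returns its value on the fourth. Because the delay is a constructor parameter, the same model covers a 1-cycle delay for small memories and up to 4 for large ones.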

rachitnigam added S: Discussion needed Issues blocked on discussion C: Calyx Extension or change to the Calyx IL labels Aug 19, 2022

rachitnigam mentioned this issue Aug 22, 2022

Calyx Sharing Results #1146

Closed

3 tasks

This was referenced Nov 13, 2022

Deprecate std_mem #1261

Closed

mult_pipe not inferred correctly pipelined for DSP inference #1175

Open

rachitnigam added Calyx 2.0 Things that move us towards Calyx 2.0 and removed S: Discussion needed Issues blocked on discussion labels Mar 9, 2023

rachitnigam added this to the Virtual Operators milestone Apr 21, 2023

rachitnigam mentioned this issue May 9, 2023

Enumerations and methods on components cucapra/filament#125

Open

rachitnigam mentioned this issue Jun 7, 2024

Add dyn_mems as primitives #2111

Merged

nathanielnrn mentioned this issue Jun 7, 2024

Discussion: Introduce dyn_mem_d1 primitive? #2105

Closed
