Mapping logical memories to physical memories #1151

Open · rachitnigam opened this issue Aug 19, 2022 · 8 comments
Labels: C: Calyx (Extension or change to the Calyx IL), Calyx 2.0 (Things that move us towards Calyx 2.0)

Comments

@rachitnigam
Contributor

With #1145, Calyx has rudimentary support for memories with sequential reads and writes. While the specific memory implemented in that PR has 1-cycle reads and 1-cycle writes, its interface actually admits arbitrary read and write latencies. This is because reads need to be "primed" by setting the read_en signal and then waiting on the read_done signal. Similarly, the write interface uses the write_en and write_done signals. The only way Calyx knows that reads and writes take one cycle is the @static annotations on the read and write paths. We can imagine exposing a weaker interface without the @static annotations, which would force frontends to assume that reads and writes can take an arbitrary number of cycles.
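To illustrate why the read_en/read_done handshake admits arbitrary latencies, here is a minimal cycle-level sketch in Python. The class LatencyInsensitiveMem, its read_latency parameter, and the read helper are hypothetical names for this illustration, not part of Calyx. The model assumes the frontend holds read_en high until it observes read_done.

```python
class LatencyInsensitiveMem:
    """Cycle-level model of a memory read through a read_en/read_done handshake.

    Hypothetical sketch: read_latency is hidden from the frontend, which only
    observes read_done going high.
    """

    def __init__(self, data, read_latency):
        if read_latency < 1:
            raise ValueError("read_latency must be at least 1 cycle")
        self.data = list(data)
        self.read_latency = read_latency
        self._pending = None  # [addr, cycles_remaining] for an in-flight read
        self.read_done = False
        self.read_data = None

    def tick(self, read_en=False, addr=0):
        """Advance the memory by one clock cycle."""
        self.read_done = False
        if self._pending is None and read_en:
            # Start ("prime") a new read.
            self._pending = [addr, self.read_latency]
        if self._pending is not None:
            self._pending[1] -= 1
            if self._pending[1] == 0:
                # Read completes: present the data and pulse read_done.
                self.read_data = self.data[self._pending[0]]
                self.read_done = True
                self._pending = None


def read(mem, addr):
    """Frontend-side read: hold read_en until read_done; return (value, cycles)."""
    cycles = 0
    while True:
        mem.tick(read_en=True, addr=addr)
        cycles += 1
        if mem.read_done:
            return mem.read_data, cycles
```

The same read helper works unchanged for any read_latency, which is the property a frontend would rely on if the @static annotations were dropped: it just takes more cycles when the backing memory is slower.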

Next, we can design a pass that analyses memories and remaps them to physical memories with different latencies. For example, we can say (these numbers are completely made up):

  • If the memory is larger than 1 Mb, map it onto UltraRAM
  • If the memory is larger than 512 Kb, map it onto BRAM with 4-cycle read/write latency
  • ... and so on

Side note: the reason to map onto BRAMs with more than 1 cycle of read/write latency is the way synthesis tools construct bigger memories from BRAM building blocks. If a memory is too big, it has to be built from multiple BRAM blocks, each of which adds wire delay. By taking more cycles for reads and writes, we help the synthesis tool achieve better timing, since it doesn't have to fit every read and write into a single cycle.
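A back-of-the-envelope calculation shows how quickly big memories span many BRAM blocks. This sketch assumes each block stores 32 Kib of data (for instance, a 36 Kb Xilinx block RAM used 32 bits wide, ignoring parity bits) and ideal packing, so the result is only a lower bound; real tools also constrain block aspect ratios. The name brams_needed is hypothetical.

```python
import math

# Assumption: one BRAM block holds 32 Kib of usable data bits.
BRAM_DATA_BITS = 32 * 1024


def brams_needed(width, size, block_bits=BRAM_DATA_BITS):
    """Lower bound on BRAM blocks for a memory of `size` words of `width` bits."""
    return math.ceil(width * size / block_bits)
```

Under these assumptions, a 1 Mib memory with 32-bit words (32768 words) needs at least 32 blocks, and the routing between all of them sits on the read and write paths.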

The pass itself will use a set of primitive/generated memories which have the required characteristics. The best part of this is that we don't have to give up on latency-sensitivity; the pass, once it figures out which kind of memory to use, can insert the right @static attributes into the groups that use the memories.
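Below is a minimal Python sketch of the remapping pass described above, assuming a memory's size is its word width times its number of words and that Kb/Mb are binary units. The two thresholds mirror the made-up numbers in the list; the UltraRAM latency, the 1-cycle fallback for small memories, and all names (LogicalMem, PhysicalMem, choose_physical, lower, static_latency) are hypothetical placeholders rather than existing Calyx APIs. A latency of None stands in for a memory whose latency is not known yet, which must be lowered before it can be used in a static context.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

KBIT = 1024          # assumption: binary units (2**10 bits)
MBIT = 1024 * KBIT   # 2**20 bits


@dataclass(frozen=True)
class PhysicalMem:
    kind: str      # illustrative names such as "UltraRAM" or "BRAM"
    latency: int   # read/write latency in cycles


@dataclass
class LogicalMem:
    name: str
    width: int                     # bits per word
    size: int                      # number of words
    latency: Optional[int] = None  # None: latency not decided yet

    @property
    def bits(self) -> int:
        return self.width * self.size


def choose_physical(mem: LogicalMem) -> PhysicalMem:
    # Thresholds follow the made-up example; the UltraRAM latency and the
    # small-memory fallback are hypothetical placeholders.
    if mem.bits > MBIT:
        return PhysicalMem("UltraRAM", latency=4)
    if mem.bits > 512 * KBIT:
        return PhysicalMem("BRAM", latency=4)
    return PhysicalMem("BRAM", latency=1)


def static_latency(mem: LogicalMem) -> int:
    """Latency to use in a @static annotation for a single access."""
    if mem.latency is None:
        raise ValueError(f"{mem.name} has no latency yet; lower it first")
    return mem.latency


def lower(mems: List[LogicalMem]) -> Dict[str, PhysicalMem]:
    """Pick a physical memory for each logical one and fix its latency."""
    plan = {}
    for mem in mems:
        phys = choose_physical(mem)
        mem.latency = phys.latency
        plan[mem.name] = phys
    return plan
```

After lower runs, static_latency returns the number a pass could write into the @static attribute of a group that performs a single access to that memory; before lowering, it refuses, which keeps unlowered memories out of static contexts.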

FWIW, this pass is a much simpler version of the compiler @andrew1999 is building.

rachitnigam added the S: Discussion needed (Issues blocked on discussion) and C: Calyx (Extension or change to the Calyx IL) labels on Aug 19, 2022
@sampsyo
Contributor

sampsyo commented Aug 20, 2022

This plan sounds great. I just want to add that this seems like a pretty chunky piece of work: we'd need to design the high-level abstraction, adapt frontends, create a library of interesting backing RAM implementations, and then implement the pass. So if anyone is ever looking for a discrete project to sink their teeth into, this could be one.

@rachitnigam
Contributor Author

I think @calebmkim might have this on his critical path towards writing the sharing paper (to some extent). We can’t really get resource numbers for bigger designs without this.

@rachitnigam
Contributor Author

Also, worth thinking about how this can enable HBM support: #1106

rachitnigam mentioned this issue on Aug 22, 2022
@sampsyo
Contributor

sampsyo commented Aug 22, 2022

Indeed! As far as the "discussion needed" for this one, maybe what we should chat about is how to sidestep the need for the "full version" of this, or to build something minimal and easy that just enables big designs to compile in a reasonable way. (Just because I worry this could be a super interesting problem that would distract from sharing per se.)

@rachitnigam
Contributor Author

Yeah, I think the minimal thing to do to get sharing results is to port the Dahlia and TVM frontends to use the sequential read/write memories and default them to URAM for now. This is probably not the best thing to do but fine as a way to get started.

@rachitnigam
Contributor Author

I think this will be a good use case for evaluating the new Calyx static stuff

rachitnigam added the Calyx 2.0 label (Things that move us towards Calyx 2.0) and removed the S: Discussion needed label (Issues blocked on discussion) on Mar 9, 2023
rachitnigam added this to the Virtual Operators milestone on Apr 21, 2023
@calebmkim
Contributor

Just to revive this issue, @paili0628 and I talked about how we should implement this and we think the following might be a good idea: we could define components in Calyx that delay memory reads/writes by two cycles. E.g.,

```
component delay_2_mem(read_en, write_en, ..)(...) {
  cells {
    // instantiates two registers and a memory of the appropriate size
  }
  wires {...}
  control {
    static<2> par {
      if read_en {
        // reads from the memory by passing the data through two registers
      }
      if write_en {
        // writes to the memory by passing the data through two registers
      }
    }
  }
}
```

I think it would probably be best to add this component as a primitive.

The only question I have is: can we use delay_2_mem to replace seq_mems? The trouble is that seq_mems have a defined latency of 1, which could mess up static latency inference. This makes me think it might be worth implementing a "virtual" memory in Calyx that doesn't yet have a defined latency.

@rachitnigam
Contributor Author

That sounds like a great starting point! Couple of notes:

  1. We don't want to generate just a delay_2 module. We want to be able to generate any delay_n module, so that smaller memories can be delayed by 1 cycle and larger ones by up to 4.
  2. You're right that we cannot replace seq_mem with these because they have a precise latency. The solution would be to get rid of the latency annotation for seq_mem so that they cannot be used in a static context until they have been lowered.
  3. One thing to consider is implementing this using ref cells, so that instead of instantiating the memory, the component just takes a reference to the memory and delays its accesses by two cycles. One potential problem with this approach is that FPGA tools might see this pattern and fail to match it to the kind of memory we want, especially when the same instance is used to delay multiple memories. If we do this, we'd need to ensure that each memory uses its own delay instance for reads and that the delay components all get inlined.
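A behavioral sketch of a generic delay_n read path may help pin down the intended semantics. This Python model is hypothetical: the class name DelayN is made up, and it assumes the n registers sit on the read-data path after the backing memory, that a read issued on one clock edge is visible after n edges, and that reads are pipelined (a new read can start every cycle). Writes are omitted; they would be delayed the same way.

```python
class DelayN:
    """Hypothetical behavioral model of a delay_n memory read path.

    n pipeline registers follow the backing memory, so a read issued on one
    tick is returned on the n-th tick (counting the issuing tick). Reads are
    pipelined: a new read can be issued every cycle.
    """

    def __init__(self, data, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.data = list(data)
        self.stages = [None] * n  # contents of the n pipeline registers

    def tick(self, read_en=False, addr=0):
        """Clock edge: shift the pipeline and return the last register's value."""
        incoming = self.data[addr] if read_en else None
        self.stages = [incoming] + self.stages[:-1]
        return self.stages[-1]
```

With n = 1 this behaves like a 1-cycle sequential read; a pass could pick a larger n for bigger memories, trading latency for the timing slack described in the side note above.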
