About fsm function of "Interface" trait. #16

Closed
JongyCysec opened this issue Sep 24, 2024 · 9 comments

@JongyCysec

I've gone through the CPU-core pipeline stages implemented in the HazardFlow designs.

In my understanding, each pipeline stage (fetch, decode, ...) and its sub-stages are implemented as modules that are connected to adjacent modules via "Interface" objects such as I<VrH<P, R>, D>.

And each submodule has its own specialized functionality, such as "source", "filter", "map", etc.

These specialized functionalities are built on the fundamental "fsm" function of the "Interface" trait, which outputs the egress payload and the ingress resolver based on the egress resolver, the ingress payload, and the state.

However, when I looked at the body of the "fsm" function, it only contains "panic!(compiler magic)".

So I wonder what happens when we build our own high-level HazardFlow HDL.
Even though only a "panic" statement is left in the "fsm" function body, is such a fundamental function (fsm) compiled properly into a low-level HDL like Verilog?

@minseongg
Member

minseongg commented Sep 24, 2024

First of all, your understanding of the HazardFlow concepts is correct.

Even though only a "panic" statement is left in the "fsm" function body, is such a fundamental function (fsm) compiled properly into a low-level HDL like Verilog?

Yes. In the HazardFlow compiler, the fsm function is not actually executed (which would lead to a panic). Instead, the HazardFlow compiler captures the High-level IR generated by the Rust compiler and extracts information about the input / output / state types (Self, E, S) and the arguments (init_state, f) of the fsm function. Using this information, the HazardFlow compiler generates the corresponding Verilog code.
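
For reference, a simplified sketch of what the fsm declaration looks like is below. It is not the exact signature in the repository (the associated type names Fwd / Bwd are placeholders for the payload and resolver signal types), but it shows the shape the compiler works with: an initial state plus a combinational function from (ingress payload, egress resolver, state) to (egress payload, ingress resolver, next state), with a body that is never executed.

    // Simplified sketch of the `fsm` declaration; the real trait in the
    // repository may differ in names and bounds.
    trait Interface {
        type Fwd; // payload (forward) signal type, placeholder name
        type Bwd; // resolver (backward) signal type, placeholder name

        #[allow(unused_variables)]
        fn fsm<E: Interface, S: Copy>(
            self,
            init_state: S,
            f: impl Fn(Self::Fwd, E::Bwd, S) -> (E::Fwd, Self::Bwd, S),
        ) -> E
        where
            Self: Sized,
        {
            // Never executed: the HazardFlow compiler reads the HIR of this
            // function and generates Verilog from the types and arguments.
            panic!("compiler magic")
        }
    }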

Thanks for the question; we will add a brief explanation of the above process to the code and the documentation.

FYI: the HazardFlow compiler implementation can be found in the hazardflow directory, though you can safely skip the compiler implementation itself.

@JongyCysec
Author

JongyCysec commented Sep 25, 2024

@minseongg
If you don't mind, can I ask some questions related to the HazardFlow design?

With respect to the "fetch" pipeline stage, the 2nd and 3rd modules of the fetch stage are filter_map and reg_fwd_with_init.

And each of them again consists of two submodules.
For example, "filter_map" is implemented by connecting the map_resolver_inner module and naked_fsm_filter_map.
And "reg_fwd_with_init" is implemented by connecting map_resolver_inner and naked_reg_fwd_with_opt_init.

And the distinguishing characteristic is that the resolver type of the intermediary interface between these two sub-modules is Ready<(R, S)>.
However, the f : impl Fn(P, S) -> (HOption<EP>, S) argument of map_resolver_inner is |r, _| r in both filter_map and reg_fwd_with_init, so it will not be seen by other modules outside. Furthermore, map_resolver_inner with such an f argument seems to do nothing: it just forwards the ingress payload to the egress payload without changes, and it does not even check the ready signal of the egress resolver.

So I wonder what the motivation is for dividing filter_map and reg_fwd_with_init into two sub-modules when map_resolver_inner seems unnecessary.

And I wonder why the resolver type of the intermediary interface is Ready<(R, S)>.

Thank you.

@minseongg
Member

minseongg commented Sep 26, 2024

Let me explain the structure and the role of each submodule in the fsm_filter_map:

  • map_resolver_inner: It does not modify the valid or ready signals of the interface; it just drops the unnecessary state signal (of type S) from the egress resolver (of type Ready<(R, S)>) to generate the ingress resolver (of type Ready<R>).
  • naked_fsm_filter_map: It internally behaves the same as fsm_filter_map (computing the egress payload and the next state based on the ingress payload and the current state), but it additionally sends the internal state to the ingress resolver.

Here, naked_fsm_filter_map can be seen as a more generic version of the fsm_filter_map combinator, since it internally behaves the same as fsm_filter_map but provides more signals to the ingress resolver.

Our design motivation for fsm_filter_map is to apply some modification to the ingress interface of a more generic combinator: we drop some unnecessary resolver signals from the ingress interface of naked_fsm_filter_map by placing map_resolver_inner in front of it (a toy sketch of this composition is given after the list below).

This design pattern is used quite commonly in our combinator implementation. For example:

  • fsm_map = map_resolver_inner + naked_fsm_map (source)
  • reg_fwd_with_init = map_resolver_inner + naked_reg_fwd_with_opt_init (source)
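
For concreteness, the following is a toy model of this composition in plain Rust (not the actual HazardFlow API; all function names and types here are simplified stand-ins). The "naked" step computes the optional egress payload and next state and exposes the current state on its ingress-resolver side, and a drop_state step in front of it plays the role of map_resolver_inner(|r, _| r), hiding that state from the outside.

    // Toy model of `map_resolver_inner` + `naked_fsm_filter_map`; the names
    // below are simplified stand-ins, not the HazardFlow combinators.

    // Toy `naked_fsm_filter_map` step: from the ingress payload and current
    // state, compute an optional egress payload and the next state, and
    // expose the current state on the ingress-resolver side (type `(R, S)`).
    fn naked_filter_map_step<P, EP, S: Copy, R>(
        payload: P,
        state: S,
        egress_resolver: R,
        f: impl Fn(P, S) -> (Option<EP>, S),
    ) -> (Option<EP>, S, (R, S)) {
        let (ep, next_state) = f(payload, state);
        (ep, next_state, (egress_resolver, state))
    }

    // Toy `map_resolver_inner(|r, _| r)`: keep the outer resolver `R`, drop
    // the internal state `S` before the resolver leaves the composed module.
    fn drop_state<R, S>(resolver: (R, S)) -> R {
        resolver.0
    }

    fn main() {
        // Keep even payloads; the state counts how many payloads were seen.
        let f = |p: u32, seen: u32| (if p % 2 == 0 { Some(p) } else { None }, seen + 1);

        let (ep, next_state, intermediary) = naked_filter_map_step(4u32, 0u32, true, f);
        assert_eq!(ep, Some(4));
        assert_eq!(next_state, 1);

        // The intermediary resolver between the two submodules is `(R, S)`-like,
        // but modules outside only ever see `R`.
        let outside: bool = drop_state(intermediary);
        assert!(outside);
        println!("egress payload: {:?}, outside resolver: {}", ep, outside);
    }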

So, the answers to your questions are:

So I wonder what the motivation is for dividing fsm_filter_map or reg_fwd_with_init into two sub-modules when map_resolver_inner seems unnecessary.

fsm_filter_map and reg_fwd_with_init are designed to apply some modification to the ingress interface of a more generic combinator (naked_fsm_filter_map and naked_reg_fwd_with_opt_init, respectively); here, map_resolver_inner is used to drop some unnecessary resolver signals.

And I wonder why the resolver type of the intermediary interface is Ready<(R, S)>.

The reason is that the ingress-interface resolver type of the more generic combinators is Ready<(R, S)>.

I hope this answers your question; feel free to ask further questions.

@JongyCysec
Author

JongyCysec commented Sep 26, 2024

@minseongg

Thanks to your detailed explanation, I understand the intention of this submodule design.

So, a module prefixed with "naked" will additionally put its state into the ingress resolver.

Then, I wonder in which scenarios such additional state is necessary.
In other words, when do we need naked_some_module_func without the map_resolver_inner eliminator?

1.

At first, I came up with a superscalar or out-of-order CPU that requires internal queues in each of its pipeline stages.
In that case, a later module should notify the previous module of the fullness of its internal queues through the resolver signal.
But I think just the "ready" bit will work for that.

2.

Another scenario I thought of was hazards.
The destination registers of the Memory and Execution pipeline stages should accumulate.
For instance,

lw r1 0(r2)
add r3 r4
sub r1 r3

"Sub" instruction in decode stage(M1) should be noticed that its source(r3) & destination(r1) registers are not ready.
So, afterward modules dealing with "Load" instruction(M3) and "Add" instruction(M2) should send its destination register information to M1 by putting such information as additional state.

Then, "Sub" decode stage(M1) will receive resolver signal which contains states of afterward modules(M2, M3) and it will clean up such states by map_resolver_inner since such states of afterward modules(M2, M3) are not necessary anymore.

In other words, the states of later modules are accumulated by naked_some_module_func and sent to earlier-stage modules.
When that state information is consumed at some earlier-stage module (M1) and becomes obsolete,
it is cleaned up via the "map_resolver_inner" function by that module (M1).

In my opinion, "producing states in the resolver" and "consuming the states of later modules" are implemented by naked_some_module_func and map_resolver_inner, respectively.
Furthermore, these two tasks can in general be done in separate modules of a CPU (in contrast to fsm_filter_map, which does the "producing" and "consuming" within a single module).

Could I ask you to check whether my understanding is correct?

@minseongg
Member

Yes, your understanding is correct.

In general, resolving structural hazards (related to your first scenario) requires only a ready bit, so there is no need to use naked_* combinators. However, resolving data hazards (related to your second scenario) or control hazards requires checking the internal state of later stages, so using naked_* combinators can be helpful.

Also, it seems correct to understand this as "producing states in the resolver" in the naked_* combinators and "consuming the states of later modules" in the map_resolver_* combinators. These can be done in separate modules in the CPU.
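
To make your second scenario concrete, here is a small self-contained sketch in plain Rust (not the HazardFlow API; the resolver struct, the two-entry register array, and all function names are hypothetical). Later stages "produce" their in-flight destination registers in the resolver, the decode stage "consumes" them to detect a hazard, and a map_resolver_*-style step then strips that state so earlier stages see only the ready bit.

    // Hypothetical resolver carried backwards from the EX/MEM stages.
    #[derive(Clone, Copy, Debug)]
    struct BackwardInfo {
        ready: bool,
        // Destination registers currently in flight (None = bubble).
        dest_regs: [Option<u8>; 2],
    }

    // Decode-side "consume": stall if any source register of the current
    // instruction matches an in-flight destination register.
    fn has_data_hazard(srcs: &[u8], r: &BackwardInfo) -> bool {
        srcs.iter()
            .any(|s| r.dest_regs.iter().flatten().any(|d| d == s))
    }

    // What a `map_resolver_*`-style step does after decode has consumed the
    // state: keep only the ready bit for earlier stages.
    fn strip_state(r: BackwardInfo) -> bool {
        r.ready
    }

    fn main() {
        // `lw r1 0(r2)` and `add r3 r4` are in flight: r1 and r3 are produced.
        let r = BackwardInfo { ready: true, dest_regs: [Some(1), Some(3)] };

        // `sub r1 r3` in decode reads r3 (and writes r1): it must stall.
        assert!(has_data_hazard(&[3], &r));

        // Earlier stages only see the ready bit once the state is dropped.
        assert!(strip_state(r));
        println!("hazard detected; resolver stripped to ready-only upstream");
    }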

@minseongg minseongg self-assigned this Sep 26, 2024
@JongyCysec
Author

JongyCysec commented Sep 27, 2024

@minseongg

I'm worried that my consecutive, minor questions may bother you.

Question 1.

In my understanding, the overall structure of the attach_resolver(m) wrapper is like the one below.
[image: diagram of the attach_resolver(m) structure]

The structure above seems similar to a combination of lfork and join.
So I wonder whether we can reuse the lfork combinator and the join combinator to implement the attach_resolver wrapper.

It seems the lfork_uni function may work, since it forks the ingress interface into two egress interfaces whose resolver types are H::R and ().

However, in the actual implementation, attach_resolver is implemented by two consecutive fsm combinators.
So I suspect that the reason why the lfork and join combinators are not used in attach_resolver is that they would require an unnecessary transfer of the ingress payload onto the individual forked egress interfaces.

Question 2.

I've sketched a simple structure of the fetch submodules and interfaces as below.
Specifying the payload and resolver types between modules may be helpful for understanding the structure.
So could I ask you to check whether it is right or wrong?

[image: sketch of the fetch submodules and interfaces]

@minseongg
Member

minseongg commented Sep 28, 2024

Answer to Question 1:

Implementing attach_resolver using lfork and join sounds like a good approach. However, there may be challenges, as they might not be able to maintain validity at the egress interface if m internally takes one or more cycles.

For example, let's assume that m is reg_fwd. In that case, the high-level structure would look like the following:

An example of the cycle-level behavior would be as follows:

[image: waveform]

  • In the first cycle, a transfer occurs at the ingress interface of lfork (I0). Simultaneously, transfers occur at both egress interfaces of lfork (I1 and I2), and the transfer at I1 updates the state of reg_fwd to 0x42.
  • In the second cycle, reg_fwd holds a valid payload (0x42) at its egress interface (I3), which is expected to be transferred. However, the transfer cannot occur because the valid signal at I2 has already been dropped, since there is no logic to preserve it.

To address this problem, we could add an additional FIFO at I2 to preserve the valid signal, but this would increase area and power consumption. Instead, I think attach_resolver could be implemented with another combination of "1-to-N" and "N-to-1" combinators (e.g., branch + mux). I'll investigate this further and then let you know.
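
As a side note, here is a toy, hand-written two-cycle trace in plain Rust (not HazardFlow; the channel struct and values are illustrative only) of the behavior above: the register path delays the payload by one cycle, while the unbuffered bypass path drops its valid in the meantime, so a join of the two can never fire.

    // Toy, hand-written trace of the scenario above; not a real simulation
    // of the HazardFlow combinators.
    #[derive(Clone, Copy, Debug, Default)]
    struct Chan<T: Copy> {
        valid: bool,
        data: Option<T>,
    }

    fn main() {
        // Cycle 0: a transfer occurs at the ingress of lfork (I0); both
        // forked interfaces (I1 and I2) see the payload in the same cycle.
        let i1 = Chan { valid: true, data: Some(0x42u32) };
        let i2 = i1; // bypass path: just a wire, nothing is stored

        // reg_fwd captures I1 at the end of cycle 0, but its output (I3) is
        // not valid yet in cycle 0, so the join cannot fire in cycle 0.
        let i3_cycle0: Chan<u32> = Chan::default();
        assert!(i2.valid && !i3_cycle0.valid);
        let reg_state = i1.data;

        // Cycle 1: I3 now presents the registered payload...
        let i3_cycle1 = Chan { valid: reg_state.is_some(), data: reg_state };
        // ...but the bypass path has no storage, so its valid is already gone.
        let i2_cycle1: Chan<u32> = Chan::default();

        // A join-style combinator fires only when both inputs are valid,
        // which never happens: I2 and I3 are never valid in the same cycle.
        assert!(!(i2_cycle1.valid && i3_cycle1.valid));
        println!("I2 and I3 are never valid together; the join never fires");
    }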

Answer to Question 2:

Yes, it is right. Good job!

+) Could you please create a new issue for further questions unrelated to the title? Your questions will be very helpful to other students, but they might be difficult to find by looking only at the title.

@minseongg
Member

We have updated the hazardflow repository:

@minseongg
Member

Closing, feel free to reopen if you have further questions!

@minseongg minseongg added the question Further information is requested label Oct 16, 2024