[RFC] New Testers Proposal #725
I think the proposal should say something about X propagation. @jackkoenig talked about poison in the firrtl interpreter as a closely related idea to X. The idea could be formalized more. Verilog blackboxes that interface with the firrtl interpreter could interpret X's as poison and do their own randomization. @albert-magyar had an interesting idea about being able to annotate individual registers as having different X behavior (i.e. pessimistic, optimistic, or random, perhaps with random as the default). Firrtl could define semantics for how wires with different X behavior could be connected (e.g. random Xs can be assigned to any kind of X; optimistic and pessimistic should be mutually exclusive without some explicit cast). |
Discussion on combinational vs. stale (beginning-of-cycle) peeks: expose both APIs, with combinational peek being the default (since it does what the programmer expects, and will fail noisily). Users can fall back to stale peeks if combinational peeks prove problematic, and we may consider making stale peeks the default if the false positive rate from reachability analysis is too high. Both APIs are expected to coexist, with stale peeks not running reachability analysis and returning the value the circuit had right after the rising edge (before any pokes would have executed). Resolution: implement stalePeek first, then peek. We don't think it's possible to implement (combinational) peek using stalePeek. |
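To make the stalePeek / peek distinction concrete, here is a minimal plain-Scala model (not the real testers2 API; `ToySim` and the one-register circuit it simulates are invented for illustration): combinational `peek` reflects pokes immediately, while `stalePeek` returns the snapshot taken right after the last rising edge, before any pokes.

```scala
// Hypothetical model of the two peek semantics, on a toy circuit where
// out = reg + in, and `reg` captures `in` on each rising edge.
class ToySim {
  private var reg = 0                  // register state
  private var in = 0                   // currently poked input
  private var staleOut = reg + in      // snapshot taken right after the last edge
  def poke(v: Int): Unit = { in = v }
  def peek(): Int = reg + in           // combinational: sees pokes immediately
  def stalePeek(): Int = staleOut      // stale: value right after the rising edge
  def step(): Unit = { reg = in; staleOut = reg + in }
}
```

With this model, a poke changes `peek` right away but leaves `stalePeek` untouched until the next `step`, which is exactly the behavioral difference the discussion above describes.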
Multiclock semantics proposal: |
I think we should lay out what our primary concerns are:
*By "cycle of the tester" I mean the runtime of tester logic that is required in between steps of the DUT.
**Obviously in arbitrary Scala code people can do whatever thread-unsafe stuff they want, but when it comes to the Tester APIs, there should be a requirement of determinism (e.g. if thread A pokes an input that combinationally affects an output peeked by thread B, thread ordering cannot affect the outcome). |
You want to use multithreading for testing, but need to synchronize at each clock tick? I think this will introduce a large performance overhead. Or is it just for a nice concurrent programming model for the testing code? Having a nice programming model for testing concurrent clocked systems is, in my opinion, an interesting and challenging question. I worked a little bit on this while testing a multicore arbitration circuit, but I am far away from a decent, elegant solution. I am still at the level of writing concurrent FSMs in software to simulate the clients :-( |
Threading is mainly intended as the concurrency programming model. However, because Scala coroutines are kind of a mess and appear insufficient, threading will probably also be the implementation strategy. The main reason for this programming model is to eliminate the need to write a custom FSM as a stand-in for a program counter when multi-cycle concurrent actions are needed. Instead, actions that span multiple cycles but are otherwise logically related can be written directly (imperative style, each action directly following the previous). One example would be testing a shift register: the action for each element can be specified directly as 'poke this value, step some cycles, expect that value out', with pipelining of elements achieved by forking a thread for each element. True concurrency isn't needed; the tester will actually schedule only one thread to be running at any time (without guarantees on ordering, though). Threads are only used as a mechanism to keep track of multiple program counters. Overall, the goal is to be suitable for both unit testing (allowing cycle-accurate tests) and integration testing (using composition of abstractions). Of course, it remains to be seen if this is a good idea - potential issues include the pitfalls / complexity of a threading model and (as you've mentioned) threading performance. |
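To illustrate the shift-register example, here is a hedged plain-Scala sketch (names like `pokeAndFork` are invented, and no real threads are used): each forked checker is reduced to the state its program counter would carry, namely the cycle at which its expect fires and the value it expects. This is the essence of the "threads are only program counters" idea above.

```scala
import scala.collection.mutable

// Sketch only: models "fork a checker thread per shift-register element"
// without real threads. The DUT is a plain Vector standing in for a
// depth-4 shift register; `pending` holds each forked checker's saved
// state (due cycle, expected output).
object ShiftRegisterTestSketch {
  val depth = 4
  private var regs = Vector.fill(depth)(0)             // DUT state
  private var cycle = 0
  private val pending = mutable.Queue.empty[(Int, Int)] // (due cycle, expected)
  var failures = 0

  // Main thread: poke this value and fork a checker expecting it
  // to appear at the output `depth` cycles later.
  def pokeAndFork(in: Int): Unit = {
    pending.enqueue((cycle + depth, in))
    step(in)
  }

  // Flush remaining elements through the register with zero inputs.
  def drain(): Unit = while (pending.nonEmpty) step(0)

  private def step(in: Int): Unit = {
    regs = in +: regs.init   // rising edge: shift
    cycle += 1
    // Scheduler: wake any checker whose cycle has arrived and run its expect.
    while (pending.nonEmpty && pending.head._1 == cycle) {
      val (_, expected) = pending.dequeue()
      if (regs.last != expected) failures += 1
    }
  }
}
```

Pipelining falls out naturally: several checkers are pending at once, yet only one "thread" ever runs at a time, matching the proposed scheduling model.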
For concurrency, have you considered the actor model? It could help partition the simulation into smaller entities. The dataflow-like reactive streams could also be useful. I am experimenting with such an implementation in one of my projects.
Gabor
I think the actor model is quite similar to how the AdvancedTester (https://github.com/freechipsproject/chisel-testers/blob/master/src/main/scala/chisel3/iotesters/AdvTester.scala) works. The absence of threading means that the user needs to manually sequence multi-cycle actions using an FSM (or similar), which is a lot of programming overhead and may not compose well. Cycle-accurate unit tests may also be difficult to achieve, though it's more suitable for integration-level system testing. Partitioning into actors is an interesting thought for improving performance, but this proposal mainly looks at the programming interface (how tests are written / specified), as long as potential optimizations aren't precluded. |
Have you checked SpinalSim API ? https://spinalhdl.github.io/SpinalDoc/spinal/sim/example/single_clock_fifo/ |
@Dolu1990 I haven't yet, thanks for bringing it up! Some interesting comments after reading through the docs:
|
Basically it is close to the cocotb Python API; my inspiration came in part from it.
|
Oh, another thing that you can't do with Scala continuations is suspending execution inside a Scala for loop. (You can work around it by having your own suspendable utils like Suspendable.repeat(count = 100) { ... }.) |
Yeah, the limitations of continuations seem significant (also, rumor is that it's no longer being actively maintained - instead, work is being put into scala-async). It's currently unclear how significant the threading overhead will be (for example, firrtl-interpreter can simulate GCD at 2MHz - so a 20us context switch would be a massive performance hit, but rocket-chip is going to simulate much slower, to the point where the threading overhead may be negligible). |
Right, Scala continuations don't look actively maintained. But at least they are ported to Scala 2.12 and 2.13. About the overhead: 20us multiplied by the number of agents/threads that you need to wake up in the TB could be significant. |
Yeah, fair point about the 20us per thread, it might have scaling issues. |
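A back-of-envelope calculation of that scaling concern, using the numbers mentioned above (20us per context switch, 2MHz for GCD on firrtl-interpreter) plus an assumed 1kHz rocket-chip-class simulation speed (that speed is my illustrative guess, not a measurement):

```scala
// Fraction of wall time spent on context switches, if every thread is
// woken once per simulated cycle. All names here are illustrative.
object ThreadOverhead {
  val switchUs = 20.0 // assumed context-switch cost, from the thread above
  def overheadFraction(simHz: Double, threads: Int): Double = {
    val cycleUs = 1e6 / simHz // simulated-cycle wall time in microseconds
    (threads * switchUs) / (threads * switchUs + cycleUs)
  }
}
```

At 2MHz (0.5us per cycle) even a single thread's switch dominates, while at 1kHz (1000us per cycle) it is a small fraction - matching the intuition that the overhead matters for fast unit-level simulation but less for slow SoC-level simulation.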
Also, I've gotten a basic system up. Check out the test code in https://github.com/freechipsproject/chisel3/blob/testers2/src/test/scala/chisel3/tests/BasicTest.scala Interesting notes: the global context is split between the tester backend (Firrterpreter/Verilator/VCS/whatever) and test environment (like ScalaTest) to allow customizations for both. It also turns out that ScalaTest has an API for specifying user code location, so it can properly report the |
We discussed details at the meeting today, notes:
|
Merge strategy was discussed at today's meeting; target is for a merge in 2-4 weeks. Dependent on literal types. Also, someone please come up with a better name than testers2. |
Literal types turned out to be a bust, so we're going to go with runtime checks. Anyways, the discussion has now turned towards timing semantics, or attaching durations to tester actions like pokes.

**Current latching semantics**

Currently, uninitialized inputs are randomized, and pokes latch until overridden. However, it seems more natural to instead specify a default value, then let a poke override that for some duration, like a clock cycle, reverting automatically when the duration is over. Additionally, having an explicit duration can have signals revert to invalid (X-ish) and prevent certain bugs caused by values latching for longer than they were expected to.

**Several poke duration proposals** (in order, with the most promising (my opinion) first)

**Duration scopes**

Idea: pokes last until the end of their duration, delineated by some kind of scope. For example, a Decoupled transaction might look like:

```scala
io.valid.weakPoke(false.B)
timeScope {
  io.valid.poke(true.B)
  io.bits.poke(myBits)
  io.clock.step(1)
}
```

so when the timescope ends, the pokes inside it revert. Sequential pokes within a timescope would continue to have current semantics (latching until overridden), but they would all be invalidated / cleared at the end of the timescope.
Pros:
Cons:

**Latching / non-latching constructs**

Idea: separate latching poke and non-latching poke constructs. The non-latching poke construct would have a time duration associated with it, and the wire value would revert to a lower-priority value after its duration. The current thought is that the latching poke would have lower priority (used to set a default value).
Pros:
Cons:

**Unified semantics**

Idea: only have a single poke construct with a duration, which can either default to one cycle or infinite (latching). Sequential pokes from the same thread would override earlier pokes regardless of duration (or maybe only if at the default duration?). A priority system (like weakPoke) can be used for default/override.
Pros:
Cons:

Thoughts? |
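As a concreteness check on the duration-scopes proposal, here is a plain-Scala model of the intended resolution rules (illustrative only; `ScopedSignal` is invented, not a proposed API): a signal resolves to the innermost scope that has poked it, pokes latch within a scope, and closing a `timeScope` clears every poke made inside it.

```scala
// Model of duration-scope poke resolution. The signal value is resolved
// from a stack of frames; the bottom frame holds the weak/default value.
class ScopedSignal(default: Int) {
  private var frames: List[Option[Int]] = List(Some(default))
  // poke latches within the current frame, overriding earlier pokes in it
  def poke(v: Int): Unit = frames = Some(v) :: frames.tail
  // resolved value: the innermost frame that has a poked value
  def value: Int = frames.flatten.head
  def timeScope(body: => Unit): Unit = {
    frames = None :: frames  // open a new frame
    body
    frames = frames.tail     // close: every poke inside is cleared
  }
}
```

The weakPoke/poke priority question maps directly onto frame depth here: the default lives in an outer frame and is shadowed only while an inner scope pokes.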
I'll give the rest of this a read and provide some feedback. @ducky64: This came up when going through the generator bootcamp with some questions related to multiclock testing. This was the solution that I came up with. It defines a multiclock module:

```scala
import chisel3._
import chisel3.experimental.withClock

class MultiClockModule extends Module {
  val io = IO(new Bundle {
    val clockA = Input(Clock())
    val clockB = Input(Clock())
    val clockC = Input(Clock())
    val a = Output(Bool())
    val b = Output(Bool())
    val c = Output(Bool())
  })

  /* Make each output (a, b, c) toggle using its respective clock
   * (clockA, clockB, clockC) */
  Seq(io.clockA, io.clockB, io.clockC)
    .zip(Seq(io.a, io.b, io.c))
    .foreach { case (clk, out) => withClock(clk) { out := RegNext(~out) } }
}
```

Multiclock test:

```scala
import chisel3._
import chisel3.experimental.RawModule
import chisel3.util.Counter
import chisel3.testers.{BasicTester, TesterDriver}
import chisel3.iotesters.{PeekPokeTester, ChiselFlatSpec}

/** A description of the period and phase associated with a specific clock */
case class ClockInfo(signal: Clock, period: Int, phase: Int = 0)

/** A clock generator of a specific period and phase */
class ClockGen(period: Int, phase: Int = 0) extends Module {
  require(period > 0)
  require(phase >= 0)
  val io = IO(new Bundle {
    val clockOut = Output(Bool())
  })

  println(s"Creating clock generator with period $period, phase $phase")

  val (_, start) = Counter(true.B, phase)
  val started = RegInit(false.B)
  started := started | start
  val (count, _) = Counter(started, period)
  io.clockOut := count >= (period / 2).U
}

trait MultiClockTester extends BasicTester {
  self: BasicTester =>

  /* Abstract method (you need to fill this in) that describes the clocks */
  def clocks: Seq[ClockInfo]

  /* The finish method is called just before elaboration by TesterDriver.
   * This is used to generate and connect the clocks defined by the
   * ClockInfo of this module. */
  override def finish(): Unit = {
    // Double all periods unless every period is even, so period / 2 is exact
    val scale = if (clocks.forall { case ClockInfo(_, p, _) => p % 2 == 0 }) 1 else 2
    clocks.foreach { case ClockInfo(c, p, ph) =>
      c := Module(new ClockGen(p * scale, ph * scale)).io.clockOut.asClock }
  }
}

class MultiClockTest(timeout: Int) extends BasicTester with MultiClockTester {
  /* Instantiate the design under test */
  val dut = Module(new MultiClockModule)

  /* Define the clocks */
  val clocks = Seq(
    ClockInfo(dut.io.clockA, 3),
    ClockInfo(dut.io.clockB, 7),
    ClockInfo(dut.io.clockC, 7, 2))

  val (countA, _) = Counter(dut.io.a =/= RegNext(dut.io.a), timeout)
  val (countB, _) = Counter(dut.io.b =/= RegNext(dut.io.b), timeout)
  val (countC, _) = Counter(dut.io.c =/= RegNext(dut.io.c), timeout)
  val (_, timeoutOccurred) = Counter(true.B, timeout)
  when (timeoutOccurred) {
    printf(p"In ${timeout.U} ticks, io.a ticked $countA, io.b ticked $countB, io.c ticked $countC\n")
    stop()
  }
}

class MultiClockSpec extends ChiselFlatSpec {
  "ClockGen" should "throw exceptions on bad inputs" in {
    Seq(() => new ClockGen(0, 0),
        () => new ClockGen(1, -1))
      .foreach( gen =>
        intercept[IllegalArgumentException] { Driver.elaborate(gen) } )
  }

  "MultiClockTest" should "work" in {
    TesterDriver.execute(() => new MultiClockTest(128))
  }
}
```
|
@ducky64 This is a late comment, but it would be nice to include in this development the ability to test a DUT against a golden model. The golden model might be an earlier version of the DUT whose behavior you want to ensure still matches. The golden model should also be implementable in Scala, perhaps as some sort of mock. |
More discussion on testers happened at today's meeting, mostly focusing on allowing combinational peek-after-poke behavior across threads. The driving use case is various interlocked Decoupled-to-Decoupled topologies: one-to-one (straightforward), many-to-one (output fires when all inputs are valid, and a reduction operation is applied to the inputs), one-to-many (replication across many outputs), and many-to-many. In all cases, a transaction happens only when all outputs are ready and all inputs are valid, and testdrivers are organized as two phases: poking input lines on the first phase, and peeking output lines on the second phase for scoreboarding. Proposed solutions:
These other solutions were also discussed but did not gain significant traction:
|
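A minimal plain-Scala model of the many-to-one case described above (names are illustrative, not a proposed API): driver threads poke inputs during the main phase, and the monitor phase fires the transaction and scoreboards the reduction only when every input is valid.

```scala
import scala.collection.mutable

// Sketch of a many-to-one interlocked topology: the "DUT" fires a
// transaction only when all n inputs are valid, applying a reduction
// (here, a sum) over them.
object ManyToOneSketch {
  val n = 3
  private val inputs = Array.fill[Option[Int]](n)(None) // None = not valid
  val observed = mutable.Buffer.empty[Int]              // scoreboard

  // Main phase: a testdriver thread pokes (valid, bits) on one input.
  def pokeInput(i: Int, v: Int): Unit = inputs(i) = Some(v)

  // Monitor phase: runs after all main-phase pokes for this timestep have
  // settled; the transaction fires only when every input is valid.
  def monitorPhase(): Unit =
    if (inputs.forall(_.isDefined)) {
      observed += inputs.flatten.sum                    // reduction over inputs
      inputs.indices.foreach(i => inputs(i) = None)     // consume the inputs
    }
}
```

The two-phase split is what makes the cross-thread combinational peek-after-poke well defined: the monitor never observes a half-poked timestep.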
I would like to have a way to peek() just after step(1) that reads out the value of a combinatorial output just before the positive edge of the clock, which is what would be clocked into a register on an FPGA. I don't need combinatorial peek() and poke(); I just find them confusing (which is bad enough for me, who's learning Chisel/FPGAs, but I'd say worse for complete beginners). Today I have used the following workaround to be able to read out the value just before the rising edge of the clock using peek() immediately after a step(1): https://groups.google.com/forum/#!topic/chisel-users/5qx9MQQQuRg |
Why do you think combinational peek/poke are confusing? It seems straightforward (at least given the RTL abstraction): when you poke something, the effects can be seen. I don't think it makes sense to read out the value before the rising edge right after a step: step means to fire a rising edge, so whatever happens after the step would happen after the rising edge. Wouldn't it make sense to peek out the value right before the step? For composing actions in parallel, the proposal is to split a timestep into phases, so there would be the main phase (where most testdriver actions happen) and also a monitor phase, where you could peek into the circuit after all the main phase actions happen but before the step. |
The reason why I find combinatorial peek/poke confusing is because it's not what I need to test. What I need to test is that the correct value would be clocked into a register connected to an output that I peek on the rising clock edge of a step(1). Combinatorial peek() and poke() are straightforward, but they can't be used to write the tests I need to write. If that is unclear, I guess it underscores my point: it's confusing. I've explained in more detail on the mailing list: https://groups.google.com/forum/#!topic/chisel-users/5qx9MQQQuRg

Regarding composing actions in parallel, that's not a big concern for me currently. I've looked briefly at cocotb, which looks like an easy-to-use, well-thought-out and powerful framework. It can be used together with Chisel, because it can test the Verilog. I like Chisel iotesters for simple tests, because they can easily run within the comfort of my IDE.
--
Øyvind Harboe, General Manager, Zylin AS, +47 917 86 146
|
So you want peeks to be the output of an implicit register on the wire being peeked? That sounds like pretty nonintuitive behavior (unless this is actually industry standard practice for whatever reason - but I'd like examples and a rationale). Apparently this may have been the case in chisel2, though a lot of things weren't done in the greatest way in chisel2. I looked at your example, and in the absence of parallel actions where you need a total ordering, is there any reason you can't put the peeks and expects right before the step (it might also help to think of step as clock rising edge)? I don't see why adding an implicit register would be less confusing or more intuitive? |
I agree that step should be considered as clock rising edge. When you test registers, you expect outputs to be available slightly *after* the rising edge (not exactly at the rising edge). When I'm verifying that things are *functionally correct* (ignoring any critical path timing issues, for Chisel or Verilog or VHDL designs...), I use a TB to feed in data some time after a rising edge (so it'll be registered on the next rising edge), and I expect outputs to be valid some small time after a rising edge. In this case, having the simulator peek and poke (starting at) falling edges actually makes the most sense for functional verification - and I think if you look at waveforms from Chisel tests, that's actually what it does? Although I haven't stared at them in a while.

chisel2 registering and peek/poke simulation was actually fundamentally incorrect (mostly from how things were registered, IIRC). It generated bad Verilog in some cases that didn't match Chisel C++ simulations.
|
I haven't been using Chisel and FPGAs for very long, I'm still learning, but here is an explanation to the best of my abilities. I think you are asking me for an example of how to write a test-bench in an "industry standard tool" (probably something like ModelSim). I will ask my colleague, who's much more knowledgeable in FPGAs than me, if we can put together an example.

It disturbs me that the test-bench has to have intimate knowledge about implementation details, so that I can know if I need to place the peek() before or after the step(). That doesn't sound like a robust abstraction to me. My understanding is that in an FPGA it doesn't make sense to talk about how combinatorial logic is implemented. poke() and peek() give you the ability to "see" what's going on when signals are being changed, which you can't know in an FPGA. The FPGA has a programming model where it can do combinatorial logic however it wants. All we can know in an FPGA is what would be clocked into a register on the rising edge.
|
@shunshou I think you are saying that if I create a testbench that causes the input to the device under test to be the output of a register, then my problem with not having a single unambiguous location to put the expect/peek() goes away. I gave it a try and it seems to work!

As a bonus, the wavetraces become much easier to read, as the signals only change on the rising edge, which matches what my FPGA colleague uses in his testbenches and when he explains things, and also what I find in e.g. the Altera manuals. Everything then acts as I expect, and there's a single unambiguous location to put the peek/expect() statements that does not rely on knowing implementation details. Thanks!

Now... for Chisel Testers2, my vote would be on a model where this is how things work out of the box, as my best understanding is that it matches the industry-standard expected behavior of a test-bench.

![screenshot from 2018-10-06 11-50-00](https://user-images.githubusercontent.com/2798822/46570585-9943ba80-c966-11e8-90e7-2cd6a8483af9.png)

I bet any Chisel/Scala expert would be able to make a generic testbench wrapper utility function that would do this automatically, removing the need to write specific test-bench code to achieve this.

[FiddlyBobTests.zip](https://github.com/freechipsproject/chisel3/files/2452976/FiddlyBobTests.zip)
|
I think there are multiple ideas / interpretations here: is what you actually want for tester pokes to take effect immediately after the rising edge, which makes the dumped waveforms more consistent with what you would see on an FPGA? This would be a separate issue from adding an implicit register stage on peeks, where it would read out the value from the previous cycle. I think the first could make sense, but the second doesn't. And for the second, you still need to know where the peek takes place (before or after the edge, implicitly registered or not); your proposal just has different semantics with more magic under the hood.

If you're writing a pure Chisel design (specifically, no negedge-triggered logic), then functionality-wise, poking on the negedge or right after the posedge are equivalent, since nothing in the circuit happens on the negative edge.

As for implementation details, you can't have a testbench that knows absolutely nothing about the circuit. In some cases, cycle-level timing may be important (and you may want to test that), and in others, you might be working at the transaction level. Testers2 aims to provide the former, but gives you the pieces to write abstractions that work at the latter. (Alternatively phrased: you can use timing-aware semantics to build a timing-oblivious transaction library, but not the other way around.)

As for FPGA optimization, the synthesis tools may remap your logic to be more optimal, but testers focus on testing the RTL as you wrote it. There are cases where you may want to test combinational circuits or subblocks, even if they're going to get completely mangled by the tools. And since you don't necessarily know how the tools might mangle your design, you can only test the design as you wrote it. If the tools are competent, the externally visible behavior (for some definition of that) should be equivalent to your design anyways. |
> If you're writing a pure Chisel design (specifically, no negedge triggered logic), then functionality wise, poking on negedge or right after posedge are equivalent, since nothing in the circuit happens on the negative edge.

I think this is a subtle point that people new to Chisel might not understand (it definitely took me a while to get a feel for testing when I first learned Chisel...). Also, people new to RTL design and functional verification might not understand why they'd want to "peek"/"poke" at the negative edge (or some delta from the positive edge) for positive-edge-triggered designs, but once they stare at a correct waveform and remember that registers have clock-to-Q and setup time requirements, things make a lot more sense. No matter the abstraction you use, that's something you can't forget as a hardware designer.
|
As near as I can understand, @shunshou nailed it. Her approach of creating a wafer-thin wrapper that registers the inputs before they are peeked and poked is exactly what's needed when working on an FPGA.

The only fly in the ointment about this approach is that I have to manually create the wafer-thin wrapper. Not a huge deal, but a source of error, and less typing is more. I'd like to have a utility fn like "RegisterInput(Module(new Foo))" that would drill down into the io bundle, find all leaf input Data objects, and add registers to them. My understanding is that this utility fn would need to have access to private members of, for instance, the Data class. |
Hello, I was looking into this problem area back then.
I have a framework similar to chisel3. I would like to see an ultimate meta-HDL language someday, and I think that chisel3 has the best potential. Agents in my framework look like https://github.com/Nic30/hwt/blob/master/hwt/interfaces/agents/vldSynced.py#L33 There are 5 methods, which are enough: wait(time), read(signal), write(signal, val), waitOnCombUpdate(), waitOnSeqUpdate() I can help you if you are still working on it. |
So this proposal isn't meant to be UVM in Chisel, since UVM has some drawbacks, including (from a non-user / outsider's perspective) high verbosity and excessive separation of concerns (spaghetti-with-meatballs code). The focus here is more on lightweight unit tests, and figuring out how a core set of simple abstractions might compose into something more powerful that could be used for integration testing. I think this proposal has equivalents to most of the methods you require, though with slightly different semantics. The idea here is that combinational logic operates infinitely fast, but concurrent actions can only influence each other in limited ways (to avoid a huge source of bugs while allowing concurrent sequences, since writing sequences is much less annoying than transforming them into FSMs). Unfortunately this proposal has also changed significantly and is in the midst of another rewrite, but hopefully examples (to come soon) will make things a bit more clear. Feedback is always welcome, though! Why do you say that system threads for each simulation thread are not an option? Is this because of the potential for concurrency bugs (which we try to avoid here by detecting potential race conditions and imposing a partial thread run order)? Or is it because of performance issues from expensive OS scheduler calls? (Put another way, are coroutines a good solution, and if so, what matters most compared to threads?) Note that Scala coroutine support is pretty bad overall, so unless that improves, we're limited in what we can do. |
OS threads are not suitable, for both reasons.
|
@ducky64 And they are extremely useful because the user does not have to know the protocol of the interface in order to use it. (e.g. you can just use the push()/pop() methods on an interface agent instead of setting signals manually on a FIFO interface) |
That's actually a really good point: what are the features of UVM you like and dislike the most, so that we can take the best of it without being tied down to the worst of it? We're already planning to support user-defined higher levels of abstraction (e.g. enqueue / dequeue functions onto a Decoupled IO), so there is going to be a bit of that separation of interface and implementation / encapsulation of test details. Anything else you particularly want to see, or don't want to see? |
Hey Richard, I can speak at length on the topic. Drop me a direct email if
you'd like to discuss further.
- UVM simulation phasing
<http://www.learnuvmverification.com/index.php/2016/04/29/uvm-phasing/>
- A uniform end-of-test mechanism (objections)
<http://blog.verificationgentleman.com/2016/03/an-overview-of-uvm-end-of-test-mechanisms.html>
- A method for specifying tests via base test classes (uvm_test)
These would be over and above SystemVerilog features such as
fork/join/join_none/join_any and some "thread" synchronization primitives
(mailbox, semaphore)
|
Now it is important to have a simulator core with a high enough abstraction level, which will not restrict us in the future.
|
Just some useful data. Using a JVM thread to emulate a coroutine takes about 3 us to do the whole following : These 3 us were obtained on my laptop (3.3 GHz i7) on both native Windows and a Linux VM, using Java thread affinity to lock the main thread and the sim thread onto the same logical CPU core. Without locking the thread affinity, it is 6 us on host Windows, and about 30 us on a guest Linux VM. So using regular JVM threads is a viable way to provide coroutines in a simulation context. Threadless example (runs at 1000 kHz on my laptop) : Threadful example (runs at 200 kHz on my laptop) : |
Hoo, and about JVM threads in SpinalSim: only one is running at a time, and they always do handshakes while switching from each other, so there are no concurrency issues. |
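The handshake described above can be sketched in plain Scala with two semaphores: the scheduler and the simulation thread ping-pong permits, so only one of the two JVM threads is ever runnable. All names below are illustrative; this is a toy model, not SpinalSim's actual code.

```scala
import java.util.concurrent.Semaphore
import scala.collection.mutable.ArrayBuffer

// Emulate a coroutine with a JVM thread: a pair of semaphores ensures
// that exactly one of (main, sim) is running at any instant.
class SimCoroutine(body: SimCoroutine => Unit) {
  private val toSim  = new Semaphore(0) // permits for the sim thread to run
  private val toMain = new Semaphore(0) // permits for main to resume

  private val thread = new Thread(() => {
    toSim.acquire()  // wait for the first resume()
    body(this)
    toMain.release() // final handshake back to the scheduler
  })
  thread.start()

  /** Called inside the coroutine body: hand control back to main. */
  def yieldToMain(): Unit = { toMain.release(); toSim.acquire() }

  /** Called from main: run the coroutine until its next yield (or its end). */
  def resume(): Unit = { toSim.release(); toMain.acquire() }
}

object HandshakeDemo extends App {
  val log = ArrayBuffer[String]()
  val co = new SimCoroutine(self => {
    log += "sim step 1"
    self.yieldToMain()
    log += "sim step 2"
  })
  log += "spawned"
  co.resume() // runs the body up to its first yield
  log += "between resumes"
  co.resume() // runs the body to completion
  assert(log.toList == List("spawned", "sim step 1", "between resumes", "sim step 2"))
}
```

Each resume()/yieldToMain() round trip is one of the handshakes whose cost is being measured above.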
Maybe it is possible to perform "process switching" at the C++ level; it is much faster. Currently I am working on a simulator which uses Verilator and Boost.Coroutine. |
That is my hope. I made some attempts, but the JVM wasn't really happy with that kind of context manipulation when jumping from the C context to the Java context via JNI. Maybe/probably I did something wrong. |
It isn't really an issue; the behaviour of a real simulator can be emulated. I'm currently documenting it. There is a main simulation loop which emulates an event-driven simulator using cocotb + some tricks: Basically, the simulation loop provides 2 primitives: sensitive callbacks (a function called on each emulated delta cycle) and delayed callbacks (a function called when the simulation reaches a given time). Then the threading model is another layer on top. |
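A toy model of those two primitives (the names here are invented; this is not cocotb's or SpinalSim's actual code): delayed callbacks sit in a time-ordered queue, and each scheduled event is followed by a delta cycle in which every sensitive callback runs.

```scala
import scala.collection.mutable

// Toy event-driven simulation loop with the two primitives described:
// sensitive callbacks (run on every delta cycle) and delayed callbacks
// (run when simulation time reaches their deadline).
class ToySimLoop {
  private val sensitive = mutable.ArrayBuffer[() => Unit]()
  // Min-heap on scheduled time (PriorityQueue is a max-heap by default).
  private val delayed = mutable.PriorityQueue.empty[(Long, () => Unit)](
    Ordering.by[(Long, () => Unit), Long](_._1).reverse)
  var time = 0L

  def onDelta(cb: () => Unit): Unit = sensitive += cb
  def after(delay: Long)(cb: () => Unit): Unit = delayed.enqueue((time + delay, cb))

  /** Run until no delayed callbacks remain. */
  def run(): Unit =
    while (delayed.nonEmpty) {
      val (t, cb) = delayed.dequeue()
      time = t
      cb()                   // the scheduled event itself
      sensitive.foreach(_()) // one delta cycle after the event
    }
}

object ToySimDemo extends App {
  val sim = new ToySimLoop
  val trace = mutable.ArrayBuffer[String]()
  sim.onDelta(() => trace += s"delta@${sim.time}")
  sim.after(10)(() => trace += "ev@10")
  sim.after(5)(() => trace += "ev@5")
  sim.run()
  assert(trace.toList == List("ev@5", "delta@5", "ev@10", "delta@10"))
}
```

A real loop would additionally iterate delta cycles until the signal values settle; this sketch runs exactly one delta cycle per event for clarity.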
@Dolu1990 I mean, how would this simple example work?
and register = 0, clkIn = 0. Let's have an Agent which reads the value from the output signal on the rising edge of clkOut.
But where is the "dut signals write generated from the callbacks logic" in your code? |
@Nic30 Hoo, that stuff from VexRiscvSoftcoreContest isn't using the SpinalSim stuff; it was raw Verilator + C++ without Scala involved in the testbench. I implemented your case above : object SimPlayDeltaCycle2{
import spinal.core.sim._
class TopLevel extends Component {
val clkIn = in Bool()
val clkOut = out Bool()
val input = in(UInt(8 bits))
val output = out(UInt(8 bits))
val register = ClockDomain(clock = clkIn, config = ClockDomainConfig(resetKind = BOOT)) (Reg(UInt(8 bits)) init(0))
register := input
val registerPlusOne = register + 1
output := registerPlusOne
clkOut := clkIn
}
def main(args: Array[String]) {
SimConfig.withWave.compile(new TopLevel).doSim{dut =>
def printState(header : String) = println(s"$header dut.clkIn=${dut.clkIn.toBoolean} dut.input=${dut.input.toInt} dut.output=${dut.output.toInt} dut.clkOut=${dut.clkOut.toBoolean} time=${simTime()} deltaCycle=${simDeltaCycle()}")
dut.clkIn #= false
dut.input #= 42
printState("A")
sleep(10)
printState("B")
dut.clkIn #= true
dut.input #= 1
printState("C")
sleep(0) // A delta cycle is forced anyway, but the sleep(0) allows the thread to sneak into that forced delta cycle
printState("D")
sleep(0) //Let's go for another delta cycle
printState("E")
sleep(10)
printState("F")
}
}
} Note: we can see input and clkIn going to one at the same time, because the stimulus did it, but that's probably not a good way of writing readable stimulus. Its output is : So, to be sure we understand each other, there is another sample written with the dev branch of SpinalSim : And here is the produced output :
To me, it all looks fine, but it isn't easy reading these things, so I'm not saying I'm right; let me know if something doesn't look correct. |
It is https://github.com/SpinalHDL/SpinalHDL/blob/dev/sim/src/main/scala/spinal/sim/SimManager.scala#L266 |
Just spotted two issues (fixed now) : Also, the delta cycle calculation is now correct. (When you fork a thread, its execution starts on the next delta cycle.) |
@Dolu1990 I do not see any problem in your implementation. |
This is a proposal for a new testers API, and supersedes issues #551 and #547. Nothing is currently set in stone, and feedback from the general Chisel community is desired. So please give it a read and let us know what you think!
Motivation
What’s wrong with Chisel BasicTester or HWIOTesters?
The BasicTester included with Chisel is a way to define tests as a Chisel circuit. However, as testvectors often are specified linearly in time (like imperative software), this isn’t a great match.
HWIOTesters provide a peek/poke/step API, which allows tests to be written linearly in time. However, there’s no support for parallelism (like a threading model), which makes composition of concurrent actions very difficult. Additionally, as it’s not in the base Chisel3 repository, it doesn’t seem to see as much use.
HWIOTesters also provides AdvancedTester, which allows limited background tasks to run on each cycle, supporting certain kinds of parallelism (for example, every cycle, a Decoupled driver could check if the queue is ready, and if so, enqueue a new element from a given sequence). However, the concurrent programming model is radically different from the peek-poke model, and requires the programmer to manage time as driver state.
And finally, having 3 different test frameworks really kind of sucks and limits interoperability and reuse of testing libraries.
Goal: Unified testing
The goal here is to have one standardized way to test in Chisel3. Ideally, this would be:
Proposal
Testdriver Construction API
This will define an API for constructing testdriver modules.
Basic API
These are the basic conceptual operations:
Note: A better name is desired for this...
A subset of this API (poke, check, step) that is synthesizable to allow the generation of testbenches that don't require Scala to run with the simulator.
Values are specified and returned as Chisel literals, which is expected to interoperate with the future bundle literal constructors feature. In the future, this may be relaxed to be any Chisel expression.
Peek, check, and poke will be defined as extensions of their relevant Chisel types using the PML (implicit extension) pattern. For example, users would specify `io.myUInt.poke(4.U)`, or `io.myUInt.peek()` would return a Chisel literal containing the current simulation value.
This is to combine driver code with their respective Bundles, allowing these to be shared and re-used without being tied to some TestDriver subclass. For example, Decoupled might define a pokeEnqueue function which sequences the ready, valid, and bits wires and can be invoked with `io.myQueue.pokeEnqueue(4.U)`. These can then be composed; for example, a GCD IO with Decoupled input and output might have `gcd.io.checkRun(4, 2, 2)`, which will enqueue (4, 2) on the inputs and expect 2 on the output when it finishes.
Pokes retain their values until updated by another poke.
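The extension pattern can be sketched against a stub simulator. Everything below (Sim, Signal, the string-keyed value map) is invented for illustration; real testers2 code would extend chisel3 types and talk to an actual simulator backend.

```scala
import scala.collection.mutable

object TestersSketch {
  // Stub simulator state: signal name -> current value.
  object Sim {
    val values = mutable.Map[String, BigInt]().withDefaultValue(BigInt(0))
  }

  // Stand-in for a Chisel hardware signal (real code would use chisel3.UInt etc.).
  case class Signal(name: String)

  // The implicit-extension (PML) pattern: peek/poke/check hang off the signal
  // type itself, so driver code can live alongside the Bundle it drives.
  implicit class Testable(sig: Signal) {
    def poke(v: BigInt): Unit = Sim.values(sig.name) = v
    def peek(): BigInt = Sim.values(sig.name)
    def check(expected: BigInt): Unit =
      assert(peek() == expected, s"${sig.name}: expected $expected, got ${peek()}")
  }
}

object ExtensionDemo extends App {
  import TestersSketch._
  val myUInt = Signal("io_myUInt")
  myUInt.poke(4) // reads like io.myUInt.poke(4.U) in the proposal
  myUInt.check(4)
  assert(myUInt.peek() == 4)
}
```

Higher-level helpers like pokeEnqueue would be written the same way: extensions on a Decoupled-shaped bundle that sequence the ready/valid/bits pokes and peeks.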
Concurrency Model
Concurrency is provided by fork-join parallelism, to be implemented using threading. Note: Scala’s coroutines are too limited to be of practical use here.
Fork: spawns a thread that operates in parallel, returning that thread.
Join: blocks until all the argument threads are completed.
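A bare-bones sketch of that fork/join surface using JVM threads (illustrative only; the real scheduler would additionally serialize threads and enforce deterministic ordering, per the concerns elsewhere in this thread):

```scala
import java.util.concurrent.atomic.AtomicInteger

object ForkJoinSketch {
  final class TesterThread(body: => Unit) {
    private val t = new Thread(() => body)
    t.start()
    def join(): Unit = t.join()
  }
  /** Spawn a thread that operates in parallel, returning it. */
  def fork(body: => Unit): TesterThread = new TesterThread(body)
  /** Block until all the argument threads have completed. */
  def join(threads: TesterThread*): Unit = threads.foreach(_.join())
}

object ForkJoinDemo extends App {
  import ForkJoinSketch._
  val hits = new AtomicInteger(0)
  // Two concurrent "drivers"; in a real test these would poke/peek the DUT.
  val a = fork { (1 to 100).foreach(_ => hits.incrementAndGet()) }
  val b = fork { (1 to 100).foreach(_ => hits.incrementAndGet()) }
  join(a, b) // both drivers are guaranteed finished here
  assert(hits.get == 200)
}
```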
Combinational Peeks and Pokes
There are two proposals for combinational behavior of pokes, debate is ongoing about which model to adopt, or if both can coexist.
Proposal 1: No combinational peeks and pokes
Peeks always return the value at the beginning of the cycle. Alternatively phrased, pokes don’t take effect until just before the step. This provides both high performance (no need to update the circuit between clock cycles) and safety against race conditions with threaded concurrency (because poke effects can’t be seen until the next cycle, and all testers are synchronized to the clock cycle, but not synchronized inbetween).
One issue is that peeks written after pokes will still return the pre-poke value; this can be handled with documentation and possibly optional runtime checks against "stale" peeks. Additionally, this makes it impossible to test combinational logic, but that can be worked around with register insertion.
Note that it isn’t feasible to ensure all peeks are written before pokes for composition purposes. For example,
Decoupled.pokeEnqueue
may peek to check that the queue is ready before poking the data and valid, and calling pokeEnqueue twice on two different queues in the same cycle would result in a sequence of peek, poke, peek, poke.Another previous proposal was to allow pokes to affect peeks, but to check that the result of peeks are still valid at the end of the cycle. While powerful, this potentially leads to brittle and nondeterministic testing libraries and is not desirable.
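The Proposal 1 semantics can be modeled with a snapshot-plus-pending-writes scheme (a toy sketch; the class and wire names are invented):

```scala
import scala.collection.mutable

// Toy model of Proposal 1: peeks read a snapshot taken at the start of the
// cycle; pokes accumulate in a pending set and take effect just before step().
class StaleCycleSim(initial: Map[String, BigInt]) {
  private var snapshot = initial
  private val pending = mutable.Map[String, BigInt]()

  def peek(wire: String): BigInt = snapshot(wire) // always the pre-poke value
  def poke(wire: String, v: BigInt): Unit = pending(wire) = v
  def step(): Unit = {
    snapshot = snapshot ++ pending // pokes become visible on the next cycle
    pending.clear()
  }
}

object StaleDemo extends App {
  val sim = new StaleCycleSim(Map("in" -> BigInt(0)))
  sim.poke("in", 3)
  assert(sim.peek("in") == 0) // "stale" peek: the poke is not visible yet
  sim.step()
  assert(sim.peek("in") == 3) // visible after the clock step
}
```

Note that the sketch has no combinational propagation at all, which is exactly both the performance win and the limitation (no testing of combinational logic) that the proposal describes.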
Proposal 2: Combinational peeks and pokes that do not cross threads
Peeks and pokes are resolved in the order written (combinational peeks and pokes are allowed and straightforward). Pokes may not affect peeks from other threads, and this is checked at runtime using reachability analysis.
This provides easy testing of combinational circuits while still allowing deterministic execution in the presence of threading. Since pokes affecting peeks is done by combinational reachability analysis (which is circuit-static, instead of ad-hoc value change detection), thread execution order cannot affect the outcome of a test. Note that clocks act as a global synchronization boundary on all threads.
One possible issue is whether such reachability analysis will have a high false-positive rate. We don’t know right now, and this is something we basically have to implement and see.
Efficient simulation performance is possible by using reachability analysis to determine if the circuit needs to be updated between a poke and peek. Furthermore, it may be possible to determine if only a subset of the circuit needs to be updated.
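The circuit-static reachability check behind Proposal 2 can be sketched as a graph search over combinational fan-out edges; register boundaries simply contribute no edges. The netlist encoding here is invented for illustration:

```scala
import scala.collection.mutable

// Combinational reachability: can a poke on `src` affect a peek on `dst`?
// `comb` maps each wire to the wires it combinationally drives; edges never
// cross register boundaries, so clocked paths are excluded by construction.
object Reachability {
  def reaches(comb: Map[String, Set[String]], src: String, dst: String): Boolean = {
    val seen = mutable.Set(src)
    val queue = mutable.Queue(src)
    while (queue.nonEmpty) {
      val w = queue.dequeue()
      if (w == dst) return true
      for (n <- comb.getOrElse(w, Set.empty) if seen.add(n)) queue.enqueue(n)
    }
    false
  }
}

object ReachDemo extends App {
  // a and b feed an adder whose sum drives out; a register breaks the path to r.
  val comb = Map("a" -> Set("sum"), "b" -> Set("sum"), "sum" -> Set("out"))
  assert(Reachability.reaches(comb, "a", "out"))  // poke(a) may affect peek(out)
  assert(!Reachability.reaches(comb, "out", "a")) // no back edge
  assert(!Reachability.reaches(comb, "a", "r"))   // registered path: not combinational
}
```

Because the check is over the static netlist rather than observed value changes, two thread interleavings of the same test can never disagree about whether a poke/peek pair conflicts.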
Multiclock Support
This section is preliminary.
As testers only synchronize to an external clock, a separate thread can drive clocks in any arbitrary relationship.
This is the part which has seen the least attention and development (so far), but robust multiclock support is desired.
Backends
First backend will be FIRRTerpreter, because Verilator compilation is slow (probably accounts for a significant fraction of time in running chisel3 regressions) and doesn’t support all platforms well (namely, Windows).
High performance interfaces to Verilog simulators may be possible using Java JNI to VPI instead of sockets.
Conflicting Drivers
This section is preliminary.
Conflicting drivers (multiple pokes to the same wire from different threads on the same cycle, even if they have the same value) are prohibited and will error out.
There will probably be some kind of priority system to allow overriding defaults, for example, pulling a Decoupled’s valid low when not in use.
Some test systems have a notion of wire ownership, specifying who can drive a wire to prevent conflicts. However, as this proposal doesn’t use an explicit driver model (theoretically saving on boilerplate code and enabling concise tests), this may not be feasible.
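One way to sketch the prohibition: track per-cycle wire ownership by thread, resetting at each step. All names are invented for illustration; a priority system for overridable defaults would layer on top of this.

```scala
import scala.collection.mutable

// Per-cycle conflicting-driver detection: the first poke to a wire in a
// cycle records the driving thread; a poke from a different thread in the
// same cycle is an error even if the value is identical.
class ConflictChecker {
  private val drivers = mutable.Map[String, String]() // wire -> thread name
  def poke(wire: String, thread: String): Unit =
    drivers.get(wire) match {
      case Some(prev) if prev != thread =>
        sys.error(s"conflicting drivers on $wire: $prev vs $thread")
      case _ => drivers(wire) = thread // the same thread may re-poke freely
    }
  def step(): Unit = drivers.clear() // ownership resets at the cycle boundary
}

object ConflictDemo extends App {
  val cc = new ConflictChecker
  cc.poke("valid", "threadA")
  // A second thread driving the same wire in the same cycle is rejected.
  assert(scala.util.Try(cc.poke("valid", "threadB")).isFailure)
  cc.step()
  cc.poke("valid", "threadB") // next cycle: fine
}
```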
Misc
No backwards compatibility. As all of the current Chisel testers are extremely limited in capability, many projects have opted to use other testing infrastructure. Migrating existing test code to this new infrastructure will require rewriting. Existing test systems will be deprecated but may continue to be maintained in parallel.
It may be possible to create a compatibility layer that exposes the old API.
Mock construction and blackbox testing. This API may be sufficient to act as a mock construction API, and may enable testing of black boxes (in conjunction with a Verilog simulator).
Examples
Decoupled, linear style
External Extensions
These items are related to testing, but are most orthogonal and can be developed separately. However, they will be expected to interoperate well with testers: