[RFC] New Testers Proposal #725
I think the proposal should say something about X propagation. @jackkoenig talked about poison in the firrtl interpreter as a closely related idea to X. The idea could be formalized more. Verilog blackboxes that interface with the firrtl interpreter could interpret X's as poison and do their own randomization. @albert-magyar had an interesting idea about being able to annotate individual registers as having different X behavior (i.e. pessimistic, optimistic, or random, perhaps with random as the default). Firrtl could define semantics for how wires with different X behavior could be connected (e.g. random Xs can be assigned to any kind of X; optimistic and pessimistic should be mutually exclusive without some explicit cast). |
Discussion on combinational vs. stale (beginning-of-cycle) peeks: expose both APIs, with combinational peek being the default (since it does what the programmer expects, and will fail noisily). Users can fall back to stale peeks if combinational peeks prove problematic, and we may consider making stale peeks the default if the false positive rate from reachability analysis is too high. Both APIs are expected to coexist, with stale peeks not running reachability analysis and returning the value the circuit had right after the rising edge (before any pokes would have executed). Resolution: implement stalePeek first, then peek. We don't think it's possible to implement (combinational) peek using stalePeek. |
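To make the stalePeek / peek distinction concrete, here is a minimal plain-Scala model (not the real testers2 API; `ToySim` and the one-register circuit it simulates are invented for illustration): combinational `peek` reflects pokes immediately, while `stalePeek` returns the snapshot taken right after the last rising edge, before any pokes.

```scala
// Hypothetical model of the two peek semantics, on a toy circuit where
// out = reg + in, and `reg` captures `in` on each rising edge.
class ToySim {
  private var reg = 0                  // register state
  private var in = 0                   // currently poked input
  private var staleOut = reg + in      // snapshot taken right after the last edge
  def poke(v: Int): Unit = { in = v }
  def peek(): Int = reg + in           // combinational: sees pokes immediately
  def stalePeek(): Int = staleOut      // stale: value right after the rising edge
  def step(): Unit = { reg = in; staleOut = reg + in }
}
```

With this model, a poke changes `peek` right away but leaves `stalePeek` untouched until the next `step`, which is exactly the behavioral difference the discussion above describes.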
Multiclock semantics proposal: |
I think we should lay out what our primary concerns are:
*By "cycle of the tester" I mean the runtime of tester logic that is required in between steps of the DUT.
**Obviously in arbitrary Scala code people can do whatever thread-unsafe stuff they want, but when it comes to the Tester APIs, there should be a requirement of determinism (e.g. if thread A pokes an input that combinationally affects an output peeked by thread B, thread ordering cannot affect the outcome). |
You want to use multithreading for testing, but need to synchronize at each clock tick? I think this will introduce a large performance overhead. Or is it just for a nice concurrent programming model for the testing code? Having a nice programming model for testing concurrent clocked systems is, in my opinion, an interesting and challenging question. I worked a little bit on this while testing a multicore arbitration circuit, but I am far away from a decent, elegant solution. I am still at the level of writing concurrent FSMs in software to simulate the clients :-( |
Threading is mainly intended as the concurrency programming model. However, because Scala coroutines are kind of a mess and appear insufficient, threading will probably also be the implementation strategy. The main reason for this programming model is to eliminate the need to write a custom FSM as a stand-in for a program counter when multi-cycle concurrent actions are needed. Instead, actions that span multiple cycles but are otherwise logically related can be written directly (imperative style, each action directly following the previous). One example would be testing a shift register: the action for each element can be specified directly as 'poke this value, step some cycles, expect that value out', with pipelining of elements achieved by forking a thread for each element. True concurrency isn't needed; the tester will actually schedule only one thread to be running at any time (without guarantees on ordering, though). Threads are only used as a mechanism to keep track of multiple program counters. Overall, the goal is to be suitable for both unit testing (allowing cycle-accurate tests) and integration testing (using composition of abstractions). Of course, it remains to be seen if this is a good idea - potential issues include the pitfalls / complexity of a threading model and (as you've mentioned) threading performance. |
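To illustrate the shift-register example, here is a hedged plain-Scala sketch (names like `pokeAndFork` are invented, and no real threads are used): each forked checker is reduced to the state its program counter would carry, namely the cycle at which its expect fires and the value it expects. This is the essence of the "threads are only program counters" idea above.

```scala
import scala.collection.mutable

// Sketch only: models "fork a checker thread per shift-register element"
// without real threads. The DUT is a plain Vector standing in for a
// depth-4 shift register; `pending` holds each forked checker's saved
// state (due cycle, expected output).
object ShiftRegisterTestSketch {
  val depth = 4
  private var regs = Vector.fill(depth)(0)             // DUT state
  private var cycle = 0
  private val pending = mutable.Queue.empty[(Int, Int)] // (due cycle, expected)
  var failures = 0

  // Main thread: poke this value and fork a checker expecting it
  // to appear at the output `depth` cycles later.
  def pokeAndFork(in: Int): Unit = {
    pending.enqueue((cycle + depth, in))
    step(in)
  }

  // Flush remaining elements through the register with zero inputs.
  def drain(): Unit = while (pending.nonEmpty) step(0)

  private def step(in: Int): Unit = {
    regs = in +: regs.init   // rising edge: shift
    cycle += 1
    // Scheduler: wake any checker whose cycle has arrived and run its expect.
    while (pending.nonEmpty && pending.head._1 == cycle) {
      val (_, expected) = pending.dequeue()
      if (regs.last != expected) failures += 1
    }
  }
}
```

Pipelining falls out naturally: several checkers are pending at once, yet only one "thread" ever runs at a time, matching the proposed scheduling model.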
For concurrency, have you considered the actor model? It could help partition the simulation into smaller entities. The dataflow-like reactive streams could also be useful. I am experimenting with such an implementation in one of my projects.
Gabor
I think the actor model is quite similar to how the AdvancedTester (https://github.com/freechipsproject/chisel-testers/blob/master/src/main/scala/chisel3/iotesters/AdvTester.scala) works. The absence of threading means that the user needs to manually sequence multi-cycle actions using an FSM (or similar), which is a lot of programming overhead and may not compose well. Cycle-accurate unit tests may also be difficult to achieve, though it's more suitable for integration-level system testing. Partitioning into actors is an interesting thought for improving performance, but this proposal mainly looks at the programming interface (how tests are written / specified), as long as potential optimizations aren't precluded. |
Have you checked SpinalSim API ? https://spinalhdl.github.io/SpinalDoc/spinal/sim/example/single_clock_fifo/ |
@Dolu1990 I haven't yet, thanks for bringing it up! Some interesting comments after reading through the docs:
|
Basically it is close to the cocotb Python API; my inspiration came in part from it.
|
Oh, another thing that you can't do with Scala continuations is suspending execution inside a Scala for loop. (You can work around it by having your own suspendable utils like Suspendable.repeat(count = 100) { ... }.) |
Yeah, the limitations of continuations seem significant (also, rumor is that it's no longer being actively maintained - instead, work is being put into scala-async). It's currently unclear how significant the threading overhead will be (for example, firrtl-interpreter can simulate GCD at 2MHz - so a 20us context switch would be a massive performance hit, but rocket-chip is going to simulate much slower, to the point where the threading overhead may be negligible). |
Right, Scala continuations don't look actively maintained. But at least they are ported to Scala 2.12 and 2.13. About the overhead: 20us multiplied by the number of agents/threads that you need to wake up in the TB could be significant. |
Yeah, fair point about the 20us per thread, it might have scaling issues. |
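A back-of-envelope calculation of that scaling concern, using the numbers mentioned above (20us per context switch, 2MHz for GCD on firrtl-interpreter) plus an assumed 1kHz rocket-chip-class simulation speed (that speed is my illustrative guess, not a measurement):

```scala
// Fraction of wall time spent on context switches, if every thread is
// woken once per simulated cycle. All names here are illustrative.
object ThreadOverhead {
  val switchUs = 20.0 // assumed context-switch cost, from the thread above
  def overheadFraction(simHz: Double, threads: Int): Double = {
    val cycleUs = 1e6 / simHz // simulated-cycle wall time in microseconds
    (threads * switchUs) / (threads * switchUs + cycleUs)
  }
}
```

At 2MHz (0.5us per cycle) even a single thread's switch dominates, while at 1kHz (1000us per cycle) it is a small fraction - matching the intuition that the overhead matters for fast unit-level simulation but less for slow SoC-level simulation.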
Also, I've gotten a basic system up. Check out the test code in https://github.com/freechipsproject/chisel3/blob/testers2/src/test/scala/chisel3/tests/BasicTest.scala Interesting notes: the global context is split between the tester backend (Firrterpreter/Verilator/VCS/whatever) and test environment (like ScalaTest) to allow customizations for both. It also turns out that ScalaTest has an API for specifying user code location, so it can properly report the |
We discussed details at the meeting today, notes:
|
Merge strategy was discussed at today's meeting; target is for a merge in 2-4 weeks. Dependent on literal types. Also, someone please come up with a better name than testers2. |
Literal types turned out to be a bust, so we're going to go with runtime checks. Anyways, the discussion has now turned towards timing semantics, or attaching durations to tester actions like pokes.

**Current latching semantics**

Currently, uninitialized inputs are randomized, and pokes latch until overridden. However, it seems more natural to instead specify a default value, then let a poke override that for some duration, like a clock cycle, reverting automatically when the duration is over. Additionally, having an explicit duration can have signals revert to invalid (X-ish) and prevent certain bugs caused by values latching for longer than they were expected to.

**Several poke duration proposals** (in order, with the most promising (my opinion) first)

**Duration scopes**

Idea: pokes last until the end of their duration, delineated by some kind of scope. For example, a Decoupled transaction might look like:

```scala
io.valid.weakPoke(false.B)
timeScope {
  io.valid.poke(true.B)
  io.bits.poke(myBits)
  io.clock.step(1)
}
```

so when the timescope ends, the pokes inside it revert. Sequential pokes within a timescope would continue to have current semantics (latching until overridden), but they would all be invalidated / cleared at the end of the timescope.
Pros:
Cons:

**Latching / non-latching constructs**

Idea: separate latching poke and non-latching poke constructs. The non-latching poke construct would have a time duration associated with it, and the wire value would revert to a lower-priority value after its duration. The current thought is that the latching poke would have lower priority (used to set a default value).
Pros:
Cons:

**Unified semantics**

Idea: only have a single poke construct with a duration, which can either default to one cycle or infinite (latching). Sequential pokes from the same thread would override earlier pokes regardless of duration (or maybe only if at the default duration?). A priority system (like weakPoke) can be used for default/override.
Pros:
Cons:

Thoughts? |
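As a concreteness check on the duration-scopes proposal, here is a plain-Scala model of the intended resolution rules (illustrative only; `ScopedSignal` is invented, not a proposed API): a signal resolves to the innermost scope that has poked it, pokes latch within a scope, and closing a `timeScope` clears every poke made inside it.

```scala
// Model of duration-scope poke resolution. The signal value is resolved
// from a stack of frames; the bottom frame holds the weak/default value.
class ScopedSignal(default: Int) {
  private var frames: List[Option[Int]] = List(Some(default))
  // poke latches within the current frame, overriding earlier pokes in it
  def poke(v: Int): Unit = frames = Some(v) :: frames.tail
  // resolved value: the innermost frame that has a poked value
  def value: Int = frames.flatten.head
  def timeScope(body: => Unit): Unit = {
    frames = None :: frames  // open a new frame
    body
    frames = frames.tail     // close: every poke inside is cleared
  }
}
```

The weakPoke/poke priority question maps directly onto frame depth here: the default lives in an outer frame and is shadowed only while an inner scope pokes.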
I'll give the rest of this a read and provide some feedback. @ducky64: This came up when going through the generator bootcamp with some questions related to multiclock testing. This was the solution that I came up with. It defines a multiclock module:

```scala
import chisel3._
import chisel3.experimental.withClock

class MultiClockModule extends Module {
  val io = IO(new Bundle {
    val clockA = Input(Clock())
    val clockB = Input(Clock())
    val clockC = Input(Clock())
    val a = Output(Bool())
    val b = Output(Bool())
    val c = Output(Bool())
  })

  /* Make each output (a, b, c) toggle using its respective clock
   * (clockA, clockB, clockC) */
  Seq(io.clockA, io.clockB, io.clockC)
    .zip(Seq(io.a, io.b, io.c))
    .foreach { case (clk, out) => withClock(clk) { out := RegNext(~out) } }
}
```

Multiclock test:

```scala
import chisel3._
import chisel3.experimental.RawModule
import chisel3.util.Counter
import chisel3.testers.{BasicTester, TesterDriver}
import chisel3.iotesters.{PeekPokeTester, ChiselFlatSpec}

/** A description of the period and phase associated with a specific clock */
case class ClockInfo(signal: Clock, period: Int, phase: Int = 0)

/** A clock generator of a specific period and phase */
class ClockGen(period: Int, phase: Int = 0) extends Module {
  require(period > 0)
  require(phase >= 0)
  val io = IO(new Bundle {
    val clockOut = Output(Bool())
  })

  println(s"Creating clock generator with period $period, phase $phase")

  val (_, start) = Counter(true.B, phase)
  val started = RegInit(false.B)
  started := started | start
  val (count, _) = Counter(started, period)
  io.clockOut := count >= (period / 2).U
}

trait MultiClockTester extends BasicTester {
  self: BasicTester =>

  /* Abstract method (you need to fill this in) that describes the clocks */
  def clocks: Seq[ClockInfo]

  /* The finish method is called just before elaboration by TesterDriver.
   * This is used to generate and connect the clocks defined by the
   * ClockInfo of this module. */
  override def finish(): Unit = {
    // Double all periods unless every period is even, so period / 2 is exact
    val scale = if (clocks.forall { case ClockInfo(_, p, _) => p % 2 == 0 }) 1 else 2
    clocks.foreach { case ClockInfo(c, p, ph) =>
      c := Module(new ClockGen(p * scale, ph * scale)).io.clockOut.asClock }
  }
}

class MultiClockTest(timeout: Int) extends BasicTester with MultiClockTester {
  /* Instantiate the design under test */
  val dut = Module(new MultiClockModule)

  /* Define the clocks */
  val clocks = Seq(
    ClockInfo(dut.io.clockA, 3),
    ClockInfo(dut.io.clockB, 7),
    ClockInfo(dut.io.clockC, 7, 2))

  val (countA, _) = Counter(dut.io.a =/= RegNext(dut.io.a), timeout)
  val (countB, _) = Counter(dut.io.b =/= RegNext(dut.io.b), timeout)
  val (countC, _) = Counter(dut.io.c =/= RegNext(dut.io.c), timeout)
  val (_, timeoutOccurred) = Counter(true.B, timeout)
  when (timeoutOccurred) {
    printf(p"In ${timeout.U} ticks, io.a ticked $countA, io.b ticked $countB, io.c ticked $countC\n")
    stop()
  }
}

class MultiClockSpec extends ChiselFlatSpec {
  "ClockGen" should "throw exceptions on bad inputs" in {
    Seq(() => new ClockGen(0, 0),
        () => new ClockGen(1, -1))
      .foreach( gen =>
        intercept[IllegalArgumentException] { Driver.elaborate(gen) } )
  }

  "MultiClockTest" should "work" in {
    TesterDriver.execute(() => new MultiClockTest(128))
  }
}
```
|
@ducky64 This is a late comment, but it would be nice to include in this development the ability to test a DUT against a golden model. The golden model might be an earlier version of the DUT whose behavior you want to ensure still matches. The golden model should also be implementable in Scala, perhaps as some sort of mock. |
More discussion on testers happened at today's meeting, mostly focusing on allowing combinational peek-after-poke behavior across threads. The driving use case is various interlocked Decoupled-to-Decoupled topologies: one-to-one (straightforward), many-to-one (output fires when all inputs are valid, and a reduction operation is applied to the inputs), one-to-many (replication across many outputs), and many-to-many. In all cases, a transaction happens only when all outputs are ready and all inputs are valid, and testdrivers are organized as two phases: poking input lines on the first phase, and peeking output lines on the second phase for scoreboarding. Proposed solutions:
These other solutions were also discussed but did not gain significant traction:
|
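A minimal plain-Scala model of the many-to-one case described above (names are illustrative, not a proposed API): driver threads poke inputs during the main phase, and the monitor phase fires the transaction and scoreboards the reduction only when every input is valid.

```scala
import scala.collection.mutable

// Sketch of a many-to-one interlocked topology: the "DUT" fires a
// transaction only when all n inputs are valid, applying a reduction
// (here, a sum) over them.
object ManyToOneSketch {
  val n = 3
  private val inputs = Array.fill[Option[Int]](n)(None) // None = not valid
  val observed = mutable.Buffer.empty[Int]              // scoreboard

  // Main phase: a testdriver thread pokes (valid, bits) on one input.
  def pokeInput(i: Int, v: Int): Unit = inputs(i) = Some(v)

  // Monitor phase: runs after all main-phase pokes for this timestep have
  // settled; the transaction fires only when every input is valid.
  def monitorPhase(): Unit =
    if (inputs.forall(_.isDefined)) {
      observed += inputs.flatten.sum                    // reduction over inputs
      inputs.indices.foreach(i => inputs(i) = None)     // consume the inputs
    }
}
```

The two-phase split is what makes the cross-thread combinational peek-after-poke well defined: the monitor never observes a half-poked timestep.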
I would like to have a way to peek() just after step(1) that reads out the value of a combinatorial output just before the positive edge of the clock, which is what would be clocked into a register on an FPGA. I don't need combinatorial peek() and poke(); I just find them confusing (which is bad enough for me, who's learning Chisel/FPGAs, but I'd say worse for complete beginners). Today I have used the following workaround to be able to read out the value just before the rising edge of the clock using peek() immediately after a step(1): https://groups.google.com/forum/#!topic/chisel-users/5qx9MQQQuRg |
Why do you think combinational peek/poke are confusing? It seems straightforward (at least given the RTL abstraction): when you poke something, the effects can be seen. I don't think it makes sense to read out the value before the rising edge right after a step: step means to fire a rising edge, so whatever happens after the step would happen after the rising edge. Wouldn't it make sense to peek out the value right before the step? For composing actions in parallel, the proposal is to split a timestep into phases, so there would be the main phase (where most testdriver actions happen) and also a monitor phase, where you could peek into the circuit after all the main phase actions happen but before the step. |
The reason why I find combinatorial peek/poke confusing is because it's not what I need to test. What I need to test is that the correct value would be clocked into a register connected to an output that I peek on the rising clock edge of a step(1). Combinatorial peek() and poke() are straightforward, but they can't be used to write the tests I need to write. If that is unclear, I guess it underscores my point: it's confusing. I've explained in more detail on the mailing list: https://groups.google.com/forum/#!topic/chisel-users/5qx9MQQQuRg

Regarding composing actions in parallel, that's not a big concern for me currently. I've looked briefly at cocotb, which looks like an easy-to-use, well-thought-out and powerful framework. It can be used together with Chisel, because it can test the Verilog. I like Chisel iotesters for simple tests, because they can easily run within the comfort of my IDE.
--
Øyvind Harboe, General Manager, Zylin AS, +47 917 86 146
|
So you want peeks to be the output of an implicit register on the wire being peeked? That sounds like pretty nonintuitive behavior (unless this is actually industry standard practice for whatever reason - but I'd like examples and a rationale). Apparently this may have been the case in chisel2, though a lot of things weren't done in the greatest way in chisel2. I looked at your example, and in the absence of parallel actions where you need a total ordering, is there any reason you can't put the peeks and expects right before the step (it might also help to think of step as clock rising edge)? I don't see why adding an implicit register would be less confusing or more intuitive? |
I agree that step should be considered as clock rising edge. When you test registers, you expect outputs to be available slightly *after* the rising edge (not exactly at the rising edge). When I'm verifying that things are *functionally correct* (ignoring any critical path timing issues, for Chisel or Verilog or VHDL designs...), I use a TB to feed in data some time after a rising edge (so it'll be registered on the next rising edge), and I expect outputs to be valid some small time after a rising edge. In this case, having the simulator peek and poke (starting at) falling edges actually makes the most sense for functional verification - and I think if you look at waveforms from Chisel tests, that's actually what it does? Although I haven't stared at them in a while.

chisel2 registering and peek/poke simulation was actually fundamentally incorrect (mostly from how things were registered, IIRC). It generated bad Verilog in some cases that didn't match Chisel C++ simulations.
|
I haven't been using Chisel and FPGAs for very long, I'm still learning, but here is an explanation to the best of my abilities. I think you are asking me for an example of how to write a test-bench in an "industry standard tool" (probably something like ModelSim). I will ask my colleague, who's much more knowledgeable in FPGAs than me, if we can put together an example.

It disturbs me that the test-bench has to have intimate knowledge about implementation details, so that I can know if I need to place the peek() before or after the step(). That doesn't sound like a robust abstraction to me. My understanding is that in an FPGA it doesn't make sense to talk about how combinatorial logic is implemented. poke() and peek() give you the ability to "see" what's going on when signals are being changed, which you can't know in an FPGA. The FPGA has a programming model where it can do combinatorial logic however it wants. All we can know in an FPGA is what would be clocked into a register on the rising edge.
|
@shunshou I think you are saying that if I create a testbench that causes the input to the device under test to be the output of a register, then my problem with not having a single unambiguous location to put the expect/peek() goes away. I gave it a try and it seems to work!

As a bonus, the wavetraces become much easier to read, as the signals only change on the rising edge, which matches what my FPGA colleague uses in his testbenches and when he explains things, and also what I find in e.g. the Altera manuals. Everything then acts as I expect, and there's a single unambiguous location to put the peek/expect() statements that does not rely on knowing implementation details. Thanks!

Now... for Chisel Testers2, my vote would be on a model where this is how things work out of the box, as my best understanding is that it matches the industry-standard expected behavior of a test-bench.

![screenshot from 2018-10-06 11-50-00](https://user-images.githubusercontent.com/2798822/46570585-9943ba80-c966-11e8-90e7-2cd6a8483af9.png)

I bet any Chisel/Scala expert would be able to make a generic testbench wrapper utility function that would do this automatically, removing the need to write specific test-bench code to achieve this.

[FiddlyBobTests.zip](https://github.com/freechipsproject/chisel3/files/2452976/FiddlyBobTests.zip)
|
I think there are multiple ideas / interpretations here: is what you actually want for tester pokes to take effect immediately after the rising edge, which makes the dumped waveforms more consistent with what you would see on an FPGA? This would be a separate issue from adding an implicit register stage on peeks, where it would read out the value from the previous cycle. I think the first could make sense, but the second doesn't. And for the second, you still need to know where the peek takes place (before or after the edge, implicitly registered or not); your proposal just has different semantics with more magic under the hood.

If you're writing a pure Chisel design (specifically, no negedge-triggered logic), then functionality-wise, poking on the negedge or right after the posedge are equivalent, since nothing in the circuit happens on the negative edge.

As for implementation details, you can't have a testbench that knows absolutely nothing about the circuit. In some cases, cycle-level timing may be important (and you may want to test that), and in others, you might be working at the transaction level. Testers2 aims to provide the former, but gives you the pieces to write abstractions that work at the latter. (Alternatively phrased: you can use timing-aware semantics to build a timing-oblivious transaction library, but not the other way around.)

As for FPGA optimization, the synthesis tools may remap your logic to be more optimal, but testers focus on testing the RTL as you wrote it. There are cases where you may want to test combinational circuits or subblocks, even if they're going to get completely mangled by the tools. And since you don't necessarily know how the tools might mangle your design, you can only test the design as you wrote it. If the tools are competent, the externally visible behavior (for some definition of that) should be equivalent to your design anyways. |
> If you're writing a pure Chisel design (specifically, no negedge triggered logic), then functionality wise, poking on negedge or right after posedge are equivalent, since nothing in the circuit happens on the negative edge.

I think this is a subtle point that people new to Chisel might not understand (it definitely took me a while to get a feel for testing when I first learned Chisel...). Also, people new to RTL design and functional verification might not understand why they'd want to "peek"/"poke" at the negative edge (or some delta from the positive edge) for positive-edge-triggered designs, but once they stare at a correct waveform and remember that registers have clock-to-Q and setup time requirements, things make a lot more sense. No matter the abstraction you use, that's something you can't forget as a hardware designer.
|
As near as I can understand, @shunshou nailed it. Her approach of creating a wafer-thin wrapper that registers the inputs before they are peeked and poked is exactly what's needed when working on an FPGA.

The only fly in the ointment about this approach is that I have to manually create the wafer-thin wrapper. Not a huge deal, but a source of error, and less typing is more. I'd like to have a utility fn like "RegisterInput(Module(new Foo))" that would drill down into the io bundle, find all leaf input Data objects, and add registers to them. My understanding is that this utility fn would need to have access to private members of, for instance, the Data class. |
Hello, I was looking into this problem area back then.
I have a framework similar to chisel3. I would like to see an ultimate meta-HDL language someday, and I think that chisel3 has the best potential. Agents in my framework look like https://github.com/Nic30/hwt/blob/master/hwt/interfaces/agents/vldSynced.py#L33 There are 5 methods, which are enough: wait(time), read(signal), write(signal, val), waitOnCombUpdate(), waitOnSeqUpdate() I can help you if you are still working on it. |
So this proposal isn't meant to be UVM in Chisel, since UVM has some drawbacks, including (from a non-user / outsider's perspective) high verbosity and excessive separation of concerns (spaghetti-with-meatballs code). The focus here is more on lightweight unit tests, and figuring out how a core set of simple abstractions might compose into something more powerful that could be used for integration testing. I think this proposal has equivalents to most of the methods you require, though with slightly different semantics. The idea here is that combinational logic operates infinitely fast, but concurrent actions can only influence each other in limited ways (to avoid a huge source of bugs while allowing concurrent sequences, since writing sequences is much less annoying than transforming them into FSMs). Unfortunately this proposal has also changed significantly and is in the midst of another rewrite, but hopefully examples (to come soon) will make things a bit more clear. Feedback is always welcome, though! Why do you say that system threads for each simulation thread are not an option? Is this because of the potential for concurrency bugs (which we try to avoid here by detecting potential race conditions and imposing a partial thread run order)? Or is it because of performance issues from expensive OS scheduler calls? (Put another way, are coroutines a good solution, and if so, what matters most compared to threads?) Note that Scala coroutine support is pretty bad overall, so unless that improves, we're limited in what we can do. |
OS threads are not suitable, for both reasons.
|
@ducky64 And they are extremely useful because the user does not have to know the protocol of the interface in order to use it. (e.g. you can just use the push()/pop() methods on an interface agent instead of setting signals manually on a FIFO interface) |
That's actually a really good point: what are the features of UVM you like and dislike the most, so that we can take the best of it without being tied down to the worst of it? We're already planning to support user-defined higher levels of abstraction (e.g. enqueue / dequeue functions onto a Decoupled IO), so there is going to be a bit of that separation of interface and implementation / encapsulation of test details. Anything else you particularly want to see, or don't want to see? |
Hey Richard, I can speak at length on the topic. Drop me a direct email if
you'd like to discuss further.
- UVM simulation phasing
<http://www.learnuvmverification.com/index.php/2016/04/29/uvm-phasing/>
- A uniform end-of-test mechanism (objections)
<http://blog.verificationgentleman.com/2016/03/an-overview-of-uvm-end-of-test-mechanisms.html>
- A method for specifying tests via base test classes (uvm_test)
These would be over and above SystemVerilog features such as
fork/join/join_none/join_any and some "thread" synchronization primitives
(mailbox, semaphore)
|
Now it is important to have a simulator core with a high enough abstraction level, which will not restrict us in the future.
|
Just some useful data. Using a JVM thread to emulate a coroutine takes about 3 us to do the whole following : These 3 us were obtained on my laptop (3.3 GHz i7) on both native Windows and a Linux VM, using Java thread affinity to lock the main thread and the sim thread onto the same logical CPU core. Without locking the thread affinity, it is 6 us on host Windows, and about 30 us on a guest Linux VM. So using regular JVM threads is a viable way to provide coroutines in a simulation context. Threadless example (runs at 1000 kHz on my laptop) : Threadful example (runs at 200 kHz on my laptop) : |
Hoo, and about JVM threads in SpinalSim: only one is running at a time, and they always do handshakes while switching from each other, so there are no concurrency issues. |
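The handshake described above can be sketched in plain Scala with two semaphores: the scheduler and the simulation thread ping-pong permits, so only one of the two JVM threads is ever runnable. All names below are illustrative; this is a toy model, not SpinalSim's actual code.

```scala
import java.util.concurrent.Semaphore
import scala.collection.mutable.ArrayBuffer

// Emulate a coroutine with a JVM thread: a pair of semaphores ensures
// that exactly one of (main, sim) is running at any instant.
class SimCoroutine(body: SimCoroutine => Unit) {
  private val toSim  = new Semaphore(0) // permits for the sim thread to run
  private val toMain = new Semaphore(0) // permits for main to resume

  private val thread = new Thread(() => {
    toSim.acquire()  // wait for the first resume()
    body(this)
    toMain.release() // final handshake back to the scheduler
  })
  thread.start()

  /** Called inside the coroutine body: hand control back to main. */
  def yieldToMain(): Unit = { toMain.release(); toSim.acquire() }

  /** Called from main: run the coroutine until its next yield (or its end). */
  def resume(): Unit = { toSim.release(); toMain.acquire() }
}

object HandshakeDemo extends App {
  val log = ArrayBuffer[String]()
  val co = new SimCoroutine(self => {
    log += "sim step 1"
    self.yieldToMain()
    log += "sim step 2"
  })
  log += "spawned"
  co.resume() // runs the body up to its first yield
  log += "between resumes"
  co.resume() // runs the body to completion
  assert(log.toList == List("spawned", "sim step 1", "between resumes", "sim step 2"))
}
```

Each resume()/yieldToMain() round trip is one of the handshakes whose cost is being measured above.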
Maybe it is possible to perform "process switching" at the C++ level; it is much faster. Currently I am working on a simulator which uses Verilator and Boost.Coroutine. |
That is my hope. I made some attempts, but the JVM wasn't really happy with that kind of context manipulation when jumping from the C context to the Java context via JNI. Maybe/probably I did something wrong. |
It isn't really an issue; the behaviour of a real simulator can be emulated. I'm currently documenting it. There is a main simulation loop which emulates an event-driven simulator using cocotb + some tricks: Basically, the simulation loop provides 2 primitives: sensitive callbacks (a function called on each emulated delta cycle) and delayed callbacks (a function called when the simulation reaches a given time). Then the threading model is another layer on top. |
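A toy model of those two primitives (the names here are invented; this is not cocotb's or SpinalSim's actual code): delayed callbacks sit in a time-ordered queue, and each scheduled event is followed by a delta cycle in which every sensitive callback runs.

```scala
import scala.collection.mutable

// Toy event-driven simulation loop with the two primitives described:
// sensitive callbacks (run on every delta cycle) and delayed callbacks
// (run when simulation time reaches their deadline).
class ToySimLoop {
  private val sensitive = mutable.ArrayBuffer[() => Unit]()
  // Min-heap on scheduled time (PriorityQueue is a max-heap by default).
  private val delayed = mutable.PriorityQueue.empty[(Long, () => Unit)](
    Ordering.by[(Long, () => Unit), Long](_._1).reverse)
  var time = 0L

  def onDelta(cb: () => Unit): Unit = sensitive += cb
  def after(delay: Long)(cb: () => Unit): Unit = delayed.enqueue((time + delay, cb))

  /** Run until no delayed callbacks remain. */
  def run(): Unit =
    while (delayed.nonEmpty) {
      val (t, cb) = delayed.dequeue()
      time = t
      cb()                   // the scheduled event itself
      sensitive.foreach(_()) // one delta cycle after the event
    }
}

object ToySimDemo extends App {
  val sim = new ToySimLoop
  val trace = mutable.ArrayBuffer[String]()
  sim.onDelta(() => trace += s"delta@${sim.time}")
  sim.after(10)(() => trace += "ev@10")
  sim.after(5)(() => trace += "ev@5")
  sim.run()
  assert(trace.toList == List("ev@5", "delta@5", "ev@10", "delta@10"))
}
```

A real loop would additionally iterate delta cycles until the signal values settle; this sketch runs exactly one delta cycle per event for clarity.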
@Dolu1990 I mean, how would this simple example work?
and register = 0, clkIn = 0. Let's have an Agent which reads the value from the output signal on the rising edge of clkOut.
But where is the "dut signals write generated from the callbacks logic" in your code? |
@Nic30 Hoo, that stuff from VexRiscvSoftcoreContest isn't using the SpinalSim stuff; it was raw Verilator + C++ without Scala involved in the testbench. I implemented your case above : object SimPlayDeltaCycle2{
import spinal.core.sim._
class TopLevel extends Component {
val clkIn = in Bool()
val clkOut = out Bool()
val input = in(UInt(8 bits))
val output = out(UInt(8 bits))
val register = ClockDomain(clock = clkIn, config = ClockDomainConfig(resetKind = BOOT)) (Reg(UInt(8 bits)) init(0))
register := input
val registerPlusOne = register + 1
output := registerPlusOne
clkOut := clkIn
}
def main(args: Array[String]) {
SimConfig.withWave.compile(new TopLevel).doSim{dut =>
def printState(header : String) = println(s"$header dut.clkIn=${dut.clkIn.toBoolean} dut.input=${dut.input.toInt} dut.output=${dut.output.toInt} dut.clkOut=${dut.clkOut.toBoolean} time=${simTime()} deltaCycle=${simDeltaCycle()}")
dut.clkIn #= false
dut.input #= 42
printState("A")
sleep(10)
printState("B")
dut.clkIn #= true
dut.input #= 1
printState("C")
sleep(0) // A delta cycle is forced anyway, but the sleep(0) allows the thread to sneak into that forced delta cycle
printState("D")
sleep(0) //Let's go for another delta cycle
printState("E")
sleep(10)
printState("F")
}
}
} Note: we can see input and clkIn going to one at the same time, because the stimulus did it, but that's probably not a good way of writing readable stimulus. Its output is : So, to be sure we understand each other, there is another sample written with the dev branch of SpinalSim : And here is the produced output :
To me, it all looks fine, but it isn't easy reading these things, so I'm not saying I'm right; let me know if something doesn't look correct. |
It is https://github.com/SpinalHDL/SpinalHDL/blob/dev/sim/src/main/scala/spinal/sim/SimManager.scala#L266 |
Just spotted two issues (fixed now) : Also, the delta cycle calculation is now correct. (When you fork a thread, its execution starts on the next delta cycle.) |
@Dolu1990 I do not see any problem in your implementation. |
This is a proposal for a new testers API, and supersedes issues #551 and #547. Nothing is currently set in stone, and feedback from the general Chisel community is desired. So please give it a read and let us know what you think!
Motivation
What’s wrong with Chisel BasicTester or HWIOTesters?
The BasicTester included with Chisel is a way to define tests as a Chisel circuit. However, as testvectors often are specified linearly in time (like imperative software), this isn’t a great match.
HWIOTesters provide a peek/poke/step API, which allows tests to be written linearly in time. However, there’s no support for parallelism (like a threading model), which makes composition of concurrent actions very difficult. Additionally, as it’s not in the base Chisel3 repository, it doesn’t seem to see as much use.
HWIOTesters also provides AdvancedTester, which allows limited background tasks to run on each cycle, supporting certain kinds of parallelism (for example, every cycle, a Decoupled driver could check if the queue is ready, and if so, enqueue a new element from a given sequence). However, the concurrent programming model is radically different from the peek-poke model, and requires the programmer to manage time as driver state.
And finally, having 3 different test frameworks really kind of sucks and limits interoperability and reuse of testing libraries.
Goal: Unified testing
The goal here is to have one standardized way to test in Chisel3. Ideally, this would be:
Proposal
Testdriver Construction API
This will define an API for constructing testdriver modules.
Basic API
These are the basic conceptual operations:
Note: A better name is desired for this...
A subset of this API (poke, check, step) that is synthesizable to allow the generation of testbenches that don't require Scala to run with the simulator.
Values are specified and returned as Chisel literals, which is expected to interoperate with the future bundle literal constructors feature. In the future, this may be relaxed to be any Chisel expression.
Peek, check, and poke will be defined as extensions of their relevant Chisel types using the PML (implicit extension) pattern. For example, users would specify `io.myUInt.poke(4.U)`, or `io.myUInt.peek()` would return a Chisel literal containing the current simulation value.
This is to combine driver code with their respective Bundles, allowing these to be shared and re-used without being tied to some TestDriver subclass. For example, Decoupled might define a pokeEnqueue function which sequences the ready, valid, and bits wires and can be invoked with `io.myQueue.pokeEnqueue(4.U)`. These can then be composed; for example, a GCD IO with Decoupled input and output might have `gcd.io.checkRun(4, 2, 2)`, which will enqueue (4, 2) on the inputs and expect 2 on the output when it finishes.
Pokes retain their values until updated by another poke.
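The extension pattern can be sketched against a stub simulator. Everything below (Sim, Signal, the string-keyed value map) is invented for illustration; real testers2 code would extend chisel3 types and talk to an actual simulator backend.

```scala
import scala.collection.mutable

object TestersSketch {
  // Stub simulator state: signal name -> current value.
  object Sim {
    val values = mutable.Map[String, BigInt]().withDefaultValue(BigInt(0))
  }

  // Stand-in for a Chisel hardware signal (real code would use chisel3.UInt etc.).
  case class Signal(name: String)

  // The implicit-extension (PML) pattern: peek/poke/check hang off the signal
  // type itself, so driver code can live alongside the Bundle it drives.
  implicit class Testable(sig: Signal) {
    def poke(v: BigInt): Unit = Sim.values(sig.name) = v
    def peek(): BigInt = Sim.values(sig.name)
    def check(expected: BigInt): Unit =
      assert(peek() == expected, s"${sig.name}: expected $expected, got ${peek()}")
  }
}

object ExtensionDemo extends App {
  import TestersSketch._
  val myUInt = Signal("io_myUInt")
  myUInt.poke(4) // reads like io.myUInt.poke(4.U) in the proposal
  myUInt.check(4)
  assert(myUInt.peek() == 4)
}
```

Higher-level helpers like pokeEnqueue would be written the same way: extensions on a Decoupled-shaped bundle that sequence the ready/valid/bits pokes and peeks.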
Concurrency Model
Concurrency is provided by fork-join parallelism, to be implemented using threading. Note: Scala’s coroutines are too limited to be of practical use here.
Fork: spawns a thread that operates in parallel, returning that thread.
Join: blocks until all the argument threads are completed.
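A bare-bones sketch of that fork/join surface using JVM threads (illustrative only; the real scheduler would additionally serialize threads and enforce deterministic ordering, per the concerns elsewhere in this thread):

```scala
import java.util.concurrent.atomic.AtomicInteger

object ForkJoinSketch {
  final class TesterThread(body: => Unit) {
    private val t = new Thread(() => body)
    t.start()
    def join(): Unit = t.join()
  }
  /** Spawn a thread that operates in parallel, returning it. */
  def fork(body: => Unit): TesterThread = new TesterThread(body)
  /** Block until all the argument threads have completed. */
  def join(threads: TesterThread*): Unit = threads.foreach(_.join())
}

object ForkJoinDemo extends App {
  import ForkJoinSketch._
  val hits = new AtomicInteger(0)
  // Two concurrent "drivers"; in a real test these would poke/peek the DUT.
  val a = fork { (1 to 100).foreach(_ => hits.incrementAndGet()) }
  val b = fork { (1 to 100).foreach(_ => hits.incrementAndGet()) }
  join(a, b) // both drivers are guaranteed finished here
  assert(hits.get == 200)
}
```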
Combinational Peeks and Pokes
There are two proposals for combinational behavior of pokes, debate is ongoing about which model to adopt, or if both can coexist.
Proposal 1: No combinational peeks and pokes
Peeks always return the value at the beginning of the cycle. Alternatively phrased, pokes don’t take effect until just before the step. This provides both high performance (no need to update the circuit between clock cycles) and safety against race conditions with threaded concurrency (because poke effects can’t be seen until the next cycle, and all testers are synchronized to the clock cycle, but not synchronized inbetween).
One issue is that peeks written after pokes will still return the pre-poke value; this can be handled with documentation and possibly optional runtime checks against "stale" peeks. Additionally, this makes it impossible to test combinational logic, but that can be worked around with register insertion.
Note that it isn’t feasible to ensure all peeks are written before pokes for composition purposes. For example,
Decoupled.pokeEnqueue
may peek to check that the queue is ready before poking the data and valid, and calling pokeEnqueue twice on two different queues in the same cycle would result in a sequence of peek, poke, peek, poke.Another previous proposal was to allow pokes to affect peeks, but to check that the result of peeks are still valid at the end of the cycle. While powerful, this potentially leads to brittle and nondeterministic testing libraries and is not desirable.
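The Proposal 1 semantics can be modeled with a snapshot-plus-pending-writes scheme (a toy sketch; the class and wire names are invented):

```scala
import scala.collection.mutable

// Toy model of Proposal 1: peeks read a snapshot taken at the start of the
// cycle; pokes accumulate in a pending set and take effect just before step().
class StaleCycleSim(initial: Map[String, BigInt]) {
  private var snapshot = initial
  private val pending = mutable.Map[String, BigInt]()

  def peek(wire: String): BigInt = snapshot(wire) // always the pre-poke value
  def poke(wire: String, v: BigInt): Unit = pending(wire) = v
  def step(): Unit = {
    snapshot = snapshot ++ pending // pokes become visible on the next cycle
    pending.clear()
  }
}

object StaleDemo extends App {
  val sim = new StaleCycleSim(Map("in" -> BigInt(0)))
  sim.poke("in", 3)
  assert(sim.peek("in") == 0) // "stale" peek: the poke is not visible yet
  sim.step()
  assert(sim.peek("in") == 3) // visible after the clock step
}
```

Note that the sketch has no combinational propagation at all, which is exactly both the performance win and the limitation (no testing of combinational logic) that the proposal describes.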
Proposal 2: Combinational peeks and pokes that do not cross threads
Peeks and pokes are resolved in the order written (combinational peeks and pokes are allowed and straightforward). Pokes may not affect peeks from other threads, and this is checked at runtime using reachability analysis.
This provides easy testing of combinational circuits while still allowing deterministic execution in the presence of threading. Since pokes affecting peeks is done by combinational reachability analysis (which is circuit-static, instead of ad-hoc value change detection), thread execution order cannot affect the outcome of a test. Note that clocks act as a global synchronization boundary on all threads.
One possible issue is whether such reachability analysis will have a high false-positive rate. We don’t know right now, and this is something we basically have to implement and see.
Efficient simulation performance is possible by using reachability analysis to determine if the circuit needs to be updated between a poke and peek. Furthermore, it may be possible to determine if only a subset of the circuit needs to be updated.
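The circuit-static reachability check behind Proposal 2 can be sketched as a graph search over combinational fan-out edges; register boundaries simply contribute no edges. The netlist encoding here is invented for illustration:

```scala
import scala.collection.mutable

// Combinational reachability: can a poke on `src` affect a peek on `dst`?
// `comb` maps each wire to the wires it combinationally drives; edges never
// cross register boundaries, so clocked paths are excluded by construction.
object Reachability {
  def reaches(comb: Map[String, Set[String]], src: String, dst: String): Boolean = {
    val seen = mutable.Set(src)
    val queue = mutable.Queue(src)
    while (queue.nonEmpty) {
      val w = queue.dequeue()
      if (w == dst) return true
      for (n <- comb.getOrElse(w, Set.empty) if seen.add(n)) queue.enqueue(n)
    }
    false
  }
}

object ReachDemo extends App {
  // a and b feed an adder whose sum drives out; a register breaks the path to r.
  val comb = Map("a" -> Set("sum"), "b" -> Set("sum"), "sum" -> Set("out"))
  assert(Reachability.reaches(comb, "a", "out"))  // poke(a) may affect peek(out)
  assert(!Reachability.reaches(comb, "out", "a")) // no back edge
  assert(!Reachability.reaches(comb, "a", "r"))   // registered path: not combinational
}
```

Because the check is over the static netlist rather than observed value changes, two thread interleavings of the same test can never disagree about whether a poke/peek pair conflicts.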
Multiclock Support
This section is preliminary.
As testers only synchronize to an external clock, a separate thread can drive clocks in any arbitrary relationship.
This is the part which has seen the least attention and development (so far), but robust multiclock support is desired.
Backends
First backend will be FIRRTerpreter, because Verilator compilation is slow (probably accounts for a significant fraction of time in running chisel3 regressions) and doesn’t support all platforms well (namely, Windows).
High performance interfaces to Verilog simulators may be possible using Java JNI to VPI instead of sockets.
Conflicting Drivers
This section is preliminary.
Conflicting drivers (multiple pokes to the same wire from different threads on the same cycle, even if they have the same value) are prohibited and will error out.
There will probably be some kind of priority system to allow overriding defaults, for example, pulling a Decoupled’s valid low when not in use.
Some test systems have a notion of wire ownership, specifying who can drive a wire to prevent conflicts. However, as this proposal doesn’t use an explicit driver model (theoretically saving on boilerplate code and enabling concise tests), this may not be feasible.
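One way to sketch the prohibition: track per-cycle wire ownership by thread, resetting at each step. All names are invented for illustration; a priority system for overridable defaults would layer on top of this.

```scala
import scala.collection.mutable

// Per-cycle conflicting-driver detection: the first poke to a wire in a
// cycle records the driving thread; a poke from a different thread in the
// same cycle is an error even if the value is identical.
class ConflictChecker {
  private val drivers = mutable.Map[String, String]() // wire -> thread name
  def poke(wire: String, thread: String): Unit =
    drivers.get(wire) match {
      case Some(prev) if prev != thread =>
        sys.error(s"conflicting drivers on $wire: $prev vs $thread")
      case _ => drivers(wire) = thread // the same thread may re-poke freely
    }
  def step(): Unit = drivers.clear() // ownership resets at the cycle boundary
}

object ConflictDemo extends App {
  val cc = new ConflictChecker
  cc.poke("valid", "threadA")
  // A second thread driving the same wire in the same cycle is rejected.
  assert(scala.util.Try(cc.poke("valid", "threadB")).isFailure)
  cc.step()
  cc.poke("valid", "threadB") // next cycle: fine
}
```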
Misc
No backwards compatibility. As all of the current Chisel testers are extremely limited in capability, many projects have opted to use other testing infrastructure. Migrating existing test code to this new infrastructure will require rewriting. Existing test systems will be deprecated but may continue to be maintained in parallel.
It may be possible to create a compatibility layer that exposes the old API.
Mock construction and blackbox testing. This API may be sufficient to act as a mock construction API, and may enable testing of black boxes (in conjunction with a Verilog simulator).
Examples
Decoupled, linear style
External Extensions
These items are related to testing, but are most orthogonal and can be developed separately. However, they will be expected to interoperate well with testers: