Skip to content

Latest commit

 

History

History
482 lines (426 loc) · 34.4 KB

TODO.org

File metadata and controls

482 lines (426 loc) · 34.4 KB

TODO

Lessons from Nyxt

  • Nyxt continuous testing/packaging
  • Good Common Lisp Documentation Patterns
    • Dynamic/special/global variables with *earmuffs*.
    • Constants with +plus-signs+.
    • If internal functions and variables must be exported, wrap them with %percent-signs%.
    • Predicates with -p suffix.
    • README layout:
      1. Title and description - General information about the project. Goal and features.
      2. Getting started - How to install and load.
      3. Examples - Short code snippets for the most common use-cases.
      4. How it works - Explanations of the underlying model and implementation details.
    • Function/method doc-strings:
      "First line as a single sentence less than 80 chars long—for readability.
      Return VALUE or
      - VALUE-1
      - VALUE-2
      - etc.
      
      ARGUMENT-1 represents ...
      BOOLEAN-ARGUMENT mandates whether ...
      
      See also `another-symbol'."
              
    • Generics should come from a defgeneric form for increased inspectability and reliability of access. Methods can be undocumented, because they are generally just implementations for a documented generic. Method documentation is where exceptional implementation details end up.
    • Give very clear documentation about macros. Include many examples for all common cases in the documentation!

Look at Others

These are some other HDL langauges that I may want to look at.

  • HIRL HDL
  • SAIL2 - Single source of truth for ISA. From formal & functional specification, can be:
    • Mechanically checked
    • Generate human-readable documentation
    • Generate boilerplate for simulators & compilers & HDLs

Interesting Common Lisp Libraries

Many of these libraries are listed here.

  • Clingon - Command Line argument parser
  • cl-readline - Add readline functions to REPL
  • Linedit - Readline-style library
  • BordeauxThreads - Portable threading
  • Sento - Message passing actors, similar to Erlang
  • ACL2 - Logic and programming language for modeling & proving
  • closer-mop - Compatibility layer for using CLOS’s MOP
  • postmodern - Interact with PostgreSQL
  • binary-structures - Make parsing & interacting with binary files/structures easier
  • Parachute - Testing framework that Kandria uses

Compilation Targets

A variety of compilation targets should be available, either for debugging or performance. To make code “steppable” you must have set debug to be greater than speed, space, and compilation-speed. This can be done with the declaim function.

Testing

  1. Property-based testing of pure functions. For example, chil:log2up should be property tested. This way I only have to write the properties required, and not any actual implementation. Possible candidates:
    • check-it (Last update 2015-06-05) Does not play well with lisp-unit2 because lisp-unit2:assert-error does not return a value when an error is thrown.
      ;; This should be a successful test.
      (lisp-unit2:assert-error 'simple-error (chil/utils:log2up -1))
      ; No values
      
      ;; This should be a failing assertion/test.
      (lisp-unit2:assert-error 'simple-error (- 2 1))
      1 (1 bit, #x1, #o1, #b1)
              

      However, check-it expects the lambda predicate to return true or false depending on the result of the value.

    • cl-quickcheck (Last update 2020-05-08) (Seems abandoned.)
    • Write my own in the style of guile-quickcheck or Racket’s Quickcheck. https://stevana.github.io/the_sad_state_of_property-based_testing_libraries.html
  2. Automated generation of test programs for modules requiring simulation. Interesting works in software:
  3. Generated output (Verilog, VHDL) should be checked against simulators for linting. For Verilog, use Verilator & Icarus. For VHDL, use GHDL.
  4. There should be an interpreter/simulator for the top-level language that is used (Host language simulation). See the Simulator Section. This solves the problem where only the emitted language can be verified, and not the host language.
  5. Any unit tests for modules (whether in the standard library or written by the designer) must be synthesizable. Down to the low-level language.
  6. Need the ability to collect host-language coverage information out of tests. The more semantic information available should mean tracking coverage and finding cases where there is no test-case coverage should be eaiser. For example, the higher-level language knows what is an FSM, and should be able to test all possible cases for it. The lower-level generated language may not understand that information and just blindly test.
  7. AFTER EVERYTHING ELSE DONE: EDA tooling for Chil. Design Verification workflows & debug should be able to be performed on Chil, rather than its outputs.

Formal Methods

Hardware is extensively validated and verified with formal methods. Chil should support writing a formal specification of the hardware, which means we need a way to express these kinds of concepts. There are several kinds of formal methods that we should investigate and try to support:

  1. Model & Property Checking (Lightweight formal methods) We can take the core of our randomized property testing from guile-quickcheck? The forge language built on top of Racket might also be a good resource to look at.
  2. Formal Specification & Theorem Proving (Heavyweight formal methods) We might be able to piggy-back of ACL2 for this.

I am not sure we want to support this in Chil directly, because this might be more generally useful. It might make more sense for this to be a separate project that Chil then relies on. It remains to be seen which is better, but initial development will start here I think. If it seems better to factor these formal methods tools out to a separate repository, then we will tackle that problem later.

Many of the concepts discussed in this section come from Jakob Kreuze’s blog post about their expeirence with formal methods in courses.

  • Need the ability to embed arbitrary property assertions, without having to shell out to other languages/tools. For example, temporal assertions (TLA-style) should be native to the language, and not an afterthought requiring inlining another language in the host language.

Higher-level Hardware

  1. Create higher-level versions of chil:module that is less painful to use, but can be converted into low-level Verilog-like format currently being used. Should support an implicit reset & clock, which can be overridden with a (with-reset/clock ...) macro(?).
    • Higher-level version should NOT have Verilog-specific information included in its definition. This includes things like timescale. timescale should be handled at the Verilog level, but needs to be passed through as metadata attached to the higher-level module.
  2. This higher-level hardware should support things like mixins. Chisel has the ability to create a new module that extend-s another, so that the new one inherits that hardware. It also has the ability to use composition, so you can say a signal “bundle” must and will contain these other signals, which have certain methods already defined for them.
    • See Chapter 2.1 (Hooks) of Common Lisp Condition System for underying idea on how to implement mixins similar to Chisel. Should use catch/signal/error/handler-bind for real thing though. See Chapter 2.2 for that.
    • Might want to use restarts instead?
    • Reference the Common Lisp Cookbook
    • Investigate how Kandria did mixins for their simulator. https://github.com/Shinmera/talks/tree/master
  3. One-way enum for FSM Specialization of an enum that only allows you to traverse in one direction. (next oneway-enum signal) moves you to next state when signal goes high. Special-case this because complicated FSMs typically have cycles in their control flow (looping).
  4. The equivalent to Chisel’s Flipped constructor could be a macro that just switches all (inputs ...) to (outputs ...). (defmacro ... `(,module (inputs ,(module-outputs)) (outputs ,(module-inputs)) rest is same?)
  5. Need to provide a way to disable any implicit signals installed (clock, reset, etc.). Implicit clocks make it harder to specify clock domains & gating logic when interfacing with non-Chil hardware. (Perhaps this is obviated by the fact that Chil will read Verilog & add it to the final IR?) Implicit resets make it harder to pipeline reset logic & add balanced flop trees.
  6. Need a way to control naming.

Diplomacy-like System for Delayed Hardware Design

Chisel has a tool called Diplomacy, which is a way to delay hardware generation until parameters are fully known. Some parameters in a hardware design are not known by the programmer at the time they write the HDL. For example, how many address bits do you need in a cross-bar? That depends on the number of devices attached to the cross-bar. What if you want to make the cross-bar implementation a library, to reuse the cross-bar everywhere? How can you get the number of devices without having the whole design?

Diplomacy solves these problems by introducing a new phase before Chisel hardware generation. You (as the designer) mark Chisel modules as “diplomatic” by introducing Diplomacy parameters to the module. Then, when compiling, the Diplomacy framework goes over a design, passing these parameters around to all the diplomatic modules in the design. The parameters are then concretized into the Chisel code before the Chisel compiler is run.

Modules in this setup need to be marked as lazy, so that the Chisel compiler will accept the symbol’s definition as being valid, without having an actual definition yet. (lazy is a lazy evaluation in this case). This lazy marker is required to make sure the compiler does not complain when a module has an implementation that depends on resolved diplomatic parameters.

I wonder what would happen if we flip the script and make everything diplomatic, rather than having to explicitly opt-in. If modules do not need diplomatic parameters, the outer wrapper can be silently unwrapped. With Lisp’s code-staging through symbol recognition (gexps in Guix are just symbols that are a “specially-named quote” in this metaphor), the notion of lazy may not be needed anymore.

Network-on-Chip Extension

This section is taken from Jerry Zhou’s Constellation NoC generator. Can a Diplomacy-like framework in Chil allow for expression of NoCs? Chisel’s Diplomacy cannot do this because Diplomacy can only describe acyclic networks. UC-Berkeley has implemented Constellation’s cyclic descriptions into Diplomacy-generated acyclic ones by providing translators.

Would a general cyclic NoC language be able to express any acyclic interconnect system too? Are there problems there? Can you prove the acyclic interconnect out of a potentially cyclic description and then change tactics (for example, more aggressive optimization)?

Such an expression language must include:

  • A specification language that includes the topology, routing, protocol, and coherence.
    • Logical specification: Flows & endpoints. How many nodes (endpoints) are there? How are they logically connected? What are the logical flows the NoC must handle? What are the conditions for deadlock-free execution (conditions to always make forward progress) in the NoC? As part of the flow specification, we can limit what design points we generate HW for, because not all flows are possible given allthe other constraints in the specification.
    • Physical specification: Topology, microarch, and channels. What are the physical properties of this network? How wide is a channel? What is the topology of nodes in the network? What is the specific implementation details of the nodes? How many buffer entries are in the network?
    • Routing specification: Routing policy, allocation, and arbitration. How do packets/flits reach one end of the network from the other? What resources are allocated as a packet/flit traverse the network? What is the arbitration scheme to determine what resources get allocated? “Marries logical spec to physical” (Zhou, 2023). The routing table will be generated for each router node:
      1. Compute all possible paths for all possiblef lows.
      2. For each router, compute precisely which flows might arrive.
      3. Construct an abstract truth table for routing.
        1. Input is flow, currently occupied Virtual Channel
        2. Output is a Boolean for each output Virtual Channel
      4. Use logic minimization to generate HW implementation of routing table. Espesso will often be better here because the routing table is likely to be quite large and exact minimization algorithms (Quine-McCluskey) will take inordinate amounts of time.
  • A specification translator that can generate behavioral and transactional simulators. These will be used to verify correctness of implementations of this specification.
  • A language for implementing the behaviors of the network itself.
  • Multi-protocol networks, where multiple protocols either interface through endpoints/adapters, or work on the exact same physical specification.
  • Multi-network systems:
    • Separate performance-critical traffic from control traffic. The performance network can be high-bandwidth, high-power, and low-latency, while control can be lower-bandwidth.

This NoC framework must validate (and preferably prove):

  • The network is actually routable.
  • There is no deadlock in the protocol’s specification
  • There is no deadlock in the protocol’s implementation.

Basic notes about NoCs:

  • Packets are used
  • Packets may be bigger than what the network can actually transmit. In this case, packets are further decomposed into flits. There is a header/tailer flit to encode the start/end of a packet stream.
  • Wormhole routing is a fairly standard way to implement a routing policy. In this case, flits move through the network, one at a time. The header flit starts the process and subsequent flits exactly trail the header as it moves through the network. This makes the sequence of flits look like a worm moving through the network. Such a routing policy means wormhole routing is just a resource-allocation policy.

All of this can be done with normal Lisp code, without needing to drop to Chil, because no hardware has been generated yet. Only once the spec and its implementation have been shown to not cause problems is hardware actually generated.

Building/Elaborating

For any realistic Chil project, a build system will be needed to automate the work of taking a Chil description and lowering it to another format. Look through Build Systems à la Carte for more information about this topic.

Implementing this could be done just by piggy-backing off of Common Lisp’s already-present asdf. Then for larger scale automation, some utilities may be provided.

Veryl is very similar to Verilog, with minor conveniences added to it. Its real draw is that it has a set of integrated tools that help manage your project, with commands similar to Rust’s cargo tool.

There should be a define-able style guide which can be enforced by a linter. An example of a Verilog Style Guide.

Something that SBT does that I think is really nice is that you can add a ~~~ to any sbt command, and it will “watch” the dependencies. This means that if you update a dependency for the command, the command is automatically re-run. For example, after saving edits to a file, the unit tests for that file run again automatically, with the necessary builds done in between.

Montana offered to use a database behind-the-scenes to manage compilation, which allowed tool-writers to hook into the compilation flow itself. This provided features similar to LSPs and high-quality IR semantic analyzers today, before those were widely available for languages like C++.

Scala’s Mill is kind of what I am aiming for.

Notes after reading Build Systems à la Carte

We want a suspending scheduler for the build system, where each thread/process building the project can be paused until its inputs are ready. But given Common Lisp’s restart system, a restarting scheduler could be far more feasible. Another problem for suspending scheduler is that Common Lisp does not have good support for continuation-passing style?

Comparison to Chisel

Chisel uses the Scala Build System (SBT) to define and declare projects, and uses Java’s default file hierarchy to find files. But SBT does not work for projects that need to leave the Scala world? Hence, larger projects like Chipyard need a combination of scripts, Makefiles, and Scala-generated Makefiles to make everything happen.

Chisel, Chipyard, Rocket, etc. all moved to using Mill instead of SBT.

Annotations

My thoughts about Annotations and Hardware Construction Languages and how they can be used in Chil:

  • Annotations should not be an after-thought.
  • They are a key way to pass circuit metadata down through the compiler’s phases.
  • Should annotations be allowed in the circuit description itself? Or in another file altogether?
  • Annotations indirectly refer to parts of the circuit. Just use the name, rather than a pointer or another structure. This naming indirection allows passes to rename components in the actual circuit without needing to do massive cross-cutting modifications.

Non-Compilation Passes

In addition to lowering passes needed to compile a high-level circuit construction to the final circuit, we also need to provide passes that do not alter the circuit. These passes can provide information or feedback about your circuit at points in its life. The Nanopass framework supports this with transforms that take a language in and do not produce an output language.

Some ideas for these passes include:

  • FIRRTL Pass for Area and Timing
  • Generating target-device-specific configuration files. For example, an accelerator may need an XML file to describe the hardware that is being added. A pass could take in the IR, figure out what is being asked, and return an XML file describing the written circuit.

Documentation

Language documentation should be clear and easy to read. When possible, it should be concise, but should not limit itself when deeper explanation is necessary. The entire public-facing interface for the language should be documented, and hopefully all the internals too.

The list below is taken from the blog post Why is language documentation still so terrible?:

  • A canonical language documentation written for real human beings
  • Docs themselves should be versioned, so you do not have to sift through information that doesn’t apply to the version you care about
  • A reference/appendix section that contains the language specification (syntax, operator precedence, keywords, etc.)
  • An individual page for each standard library class or built in type
    • Class and method descriptions should answer at least the first 2, but preferably all 3 of the following questions:
      1. What does this do (effect)?
      2. How does it do it (internal implementation)?
      3. Why would I want it to (use-case, comparison to similar methods, etc.)?
    • Link directly to the source code of the internal implementation.
    • That page must be as uncluttered as possible
    • That page must contain (not link to) every method, and the descriptions of those methods, that can be called by that class, preferably including all inherited functions.
      • Most methods should have at least 1 example
      • There should be a sidebar or equivalent that contains all the method names in alphabetical order for easy searching and jumping
    • Code examples should be at least lightly syntax highlighted
    • examples, descriptions, and function signatures should link internally as much as possible
    • non-cryptic names, or at least like… tell me what your 8 byte contraction expands to
  • Preferably on a publicly accessible website, styled in a way that doesn’t make my eyes bleed (dark mode option), and that responds appropriately to at least both full screen (16:9) and half screen (8:9) sizes
  • A search function that isn’t just lmgtfy?????? Are we for real???

The language documentation the author believes satisfied all of these criteria was Rust’s standard library documentation system. The author further pointed out that even 3rd party crates get a similar documentation website generated for them, just by using the doc-comments in the files, and publicly-exported tools.

Toolchain Driver

If I intend to support multiple input formats and output formats, there will need to be a series of steps to define actions to take to produce an output. This may involve running the Chil compiler, but it might also involve running other tools (like a script to convert a JSON description of memory into a dat format). If I also want to have a “workflow” kind of language so that I can provide a design and the desired end target, then I would need this too. Effectively, this would become the unified way to work with anything in my Chil language.

  • fud2 - A Compiler driver for orchestrating the Calyx ecosystem. It handles building a design (including lowering from Dahlia, their HLS language) and turning it into SystemVerilog, which is then merged with their SystemVerilog standard library. It can interpret the Calyx using their interpreter, Cider. It can also take the final SystemVerilog and run it through Verilator, Icarus, or even FPGA workflows for synthesis. Currently (2024-08-16), fud2 uses a breadth-first search to find a path in the graph of operations from the input to the requested output. However, they are also investigating other methods, like using E-Graphs (Equivalence Graphs) through egglog, or constraint programming through Datalog.

Common Lisp has an implementation of Datalog as a DSL on GitHub called cl-datalog. Datalog was originally implemented in Clojure, with this Overview of Datalog?

Optimization

Within Chil, I would like to have an optimization framework for the higher-level language. I am not sure how much optimization is possible in the long-run. But for the small actively-working capacity of my mind, the Nanopass Framework makes the most sense to me.

  1. I might have to implement the Nanopass Framework in ANSI Common Lisp…
    • If I did that, I might be able to get that upstreamed?

Pass Ideas

Nanopass uses very small passes that do relatively little work. They rewrite, modify, or analyze a very small subset of an AST to do something. One example is to convert instances of let* in Scheme to a ladder of let and lambda.

Some ideas for passes that I could write are:

  • CheckWidths: FIRRTL has a pass to check if dynamic shifting uses a dynamic shift amount that has a bit-width $> 20$. This is the firrtl.passes.CheckWidths pass, particularly the $DshlTooBig top-level function.

Outputs

Generate other low-level HDLs.

  1. FIRRTL?
  2. CIRCT?
  3. VHDL
  4. SystemVerilog

Simulator

Chil should include a simulator alongside it. Requirements:

  • Should be multi-threaded, to improve execution speed, if possible.
  • If a “core” assertion in the simulation testbench fails, then a Lisp core image should be saved (sb-ext:save-lisp-and-die).
  • This core image should allow for “rewinding” the world to see the sequence of events that caused an assertion violation.
  • We should support both 2-state and 4-state simulation. This helps reveal initialization errors that propagate through the circuit. As a reminder, 2-state only allows 0 and 1, with nets initialized to 0; 4-state allows 0, 1, X (unknown), Z (competing drivers, floating, high-impedance).

Methods to achieve requirements:

  1. Simulator should use transactional memory?
  2. Simulator must record the state changes in the circuit to a DB for rewind? Does the transactional memory allow that too? If the transaction log of memory allows for recording to disk, then replay should be somewhat trivial.
    • Jason recommended RRDTool as a time-series database. If a database is needed, that might make more sense.
  3. Propagators?

Konata is a tool to interactively view how instructions flow through a pipeline. It also supports Out-of-order execution information.

Konata uses a log format called Kanata. The log file is a text file format whose format is described here.

Verification

  1. Proof-Carrying Code
  2. Compare/contrast with SymbiYosis, Yosys’s front-end to formal HW verification flows

Synthesis

There are three main parts to synthesizing a design from HDL down to actual circuits. There are actually many sub-portions to each of these tasks, but these highlight the major steps when lowering an HDL to circuits.

  1. Logical Synthesis (Synthesis in Vivado’s terms) Turns your HDL into a technology-independent netlist. Many optimizations are done at this level, because the most information is available now. This can be used to do very rough timing analysis, analyze potential critical paths, and most importantly, see what your HDL actually synthesizes into.
  2. Technology Mapping/Library Binding This is like instruction selection in compilers. You must figure out and optimize the set of gates that the manufacturer has implemented for that technology for what you synthesized into. For example, an AOI3 can have a special circuit mapping.
  3. Physical Synthesis (Implementation in Vivado’s terms) This takes the logical description of physical components and maps them onto the actual hardware. This involves layout compaction, partitioning, floorplanning, placement, and routing.

Vivado Synthesis Steps

The information for this section is taken from: AMD’s Vivado Synthesis User Guide (UG901), AMD’s Vivado Implementation User Guide (UG904), and this Vivado Synthesis question & response. You can look at AMD’s Vivado Design Suite User & Reference Guides (UG949) to get a top-level view of all user-guides.

  1. Synthesis (Logical Synthesis)
    1. Elaborates the design, resolving parameters, generate blocks, and other high-level RTL details. At the end of this, there is an instantiated module and connection for everything. Vivado’s output from this are “Generic Technology Cells”. GTCs are abstract items, like addres, comparators, registers, arbitrarily wide gates, infinite fan-out, etc. This is an abstract netlist.
    2. Apply constraints. These constraints are specified in the XDC format, Xilinx’s extension to the standard SDC format. XDC = Xilinx Design Constraints, SDC = Synopsys Design Constraints.
    3. Perform high-level optimizations. These optimizations take advantage of the constraints that we placed on the netlist. They can condense multi-level combinational logic, add abstract buffers for timing, and anything else that does not rely on implementation specific information. In particular, the following optimizations cannot happen yet:
      • Implementation device selection (mapping an abstract adder to a DSP slice for instance.)
      • Implementation timing latencies (BRAM vs. LUT for large logic storage)
      • Implementation power profiles (BRAM vs. LUT for large logic storage)
    4. Perform technology mapping. Vivado needs to know what you are targeting, and attempts to map multiple levels of logic to components on the physical device. At this point, the device’s features are the limiting factor; routing, power consumption, and latency/timing do not play a major factor here.
    5. Perform lower-level optimizations to logic design. Optimizations at this point can take advantage of the fact that particular portions of the circuit have been mapped to specific pieces of the device.
  2. Implementation (Physical Synthesis)
    1. Opt Design: Optimizes the logical design to make it easier to fit onto the target AMD device.
    2. Power Opt Design (Optional): Optimize physical design to reduce power demands
    3. Place Design: Place the abstract physical design onto the target device. Fan-out replication is performed here.
    4. Post-place Power Optimization (Optional): Use placement knowledge to reduce power.
    5. Post-place Physical Optimization Design (Optional): Use placement knowledge to improve timing.
    6. Route Design: Route the design on the target device.
    7. Post-Route Physical Optimization (Optional): Optimize the design using the placement and routing knowledge. This optimization step can take advantage of the highly-accurate and device-specific timing information present on the final device.
    8. Write Bitstream: Generate the design bitstream for flashing.

Examples

  1. Simple counter
  2. ALU
  3. Single-Error Correct, Double-Error Detect ECC Unit
  4. N-point FFT
  5. Cryptographic cores/accelerators
    1. AES-256
    2. SHA-256
  6. IEEE 754 compliant Floaing-point unit (Similar to Berkeley’s hardfloat)
    1. Addition
    2. Subtraction
    3. Multiplication
    4. Division
    5. Pipelined
  7. Communications protocol (AXII, AHB-to-APB bridge)
    1. AXI4 Implementation for AXI4, AXI4 Lite, and AXI4 Stream.
  8. RISC-V core (Should support RISC-V GC, to boot Linux) Getting many of these built will make my stuff equivalent to Berkeley’s RISC-V SODOR.
    1. Hardware support for single-, double-, and quad-precision floating point. See Berkeley’s HardFloat.
    2. Single-cycle
    3. Multi-cycle
    4. Pipelined (single issue)
      1. mrisc32
    5. Multi-issue in-order pipelined
    6. Single-issue out-of-order
    7. Multi-issue out-of-order
  9. tiny-gpu: A minimal GPU that executes a single kernel at a time with many threads per core. This architecture also includes a small amount of possible configuration too.
  10. turbo9: Pipelined Motorola 6809 design
  11. Caster: Electrophoretics Display (eInk) Controller. Used by Glider.
  12. CHERI in Hardware This has already been done with ARM, MIPS, and recently RISC-V. But I want to implement on this.
  13. Custom architecture