- Introduction
- 1. Day 1 - Introduction to Verilog RTL design and Synthesis
- 2. Day 2 - Timing libs, hierarchical vs flat synthesis and efficient flop coding styles
- 2.1. Introduction to Timing .libs
- 2.2. Hierarchical synthesis vs Flat synthesis
- 2.3. Various Flop Coding Styles and optimization
- 3. Day 3 - Combinational and Sequential Optimizations
- 4. Day 4 - GLS, blocking vs non-blocking and Synthesis-Simulation mismatch
- 5. Day 5 - Optimization in synthesis
This report is based on the 5-day workshop RTL Design and Synthesis in Verilog using the SKY130 Technology, facilitated by VSD. The workshop uses the open source tools iVerilog, GTKWave and Yosys with the Sky130 technology.
The workshop introduces digital logic design using Verilog HDL. It covers validating the functionality of a design using functional simulation, writing test benches for the RTL design, logic synthesis of the functional RTL code, and gate level simulation of the synthesized netlist.
SKY130 is the hardware industry's first open-source process design kit (PDK), released by SkyWater Technology Foundry in collaboration with Google. It gives hardware design experts and aficionados worldwide access to its IP functions and to open source ASICs.
In digital circuit design, register-transfer level (RTL) is a design abstraction which models a synchronous digital circuit in terms of the data flow between hardware registers and the logical operations performed on those signals. RTL abstraction is used in HDLs to create high-level representations of a circuit, from which lower-level representations and ultimately actual wiring can be derived.
Simulator: A tool used for checking the design. In this workshop we use the iverilog tool. Simulation is the process of creating models that mimic the behavior of the device you are designing (simulation models) and creating models to exercise the device (test benches).
RTL Design: The actual Verilog code, or set of Verilog codes, that implements the functionality required by the design specification of the circuit.
Test Bench: The setup that applies stimulus (test vectors) to the design to check its functionality.
The design may have one or more primary inputs and primary outputs, but the test bench has none.
The simulator continuously looks for changes on the input signals. If an input changes, the output is evaluated; otherwise the simulator never evaluates the output.
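The relationship between a design and its test bench can be sketched as below. This is a minimal illustration, not one of the workshop lab files: the module names `mux2_sketch` and `tb_mux2_sketch`, the toggle periods and the VCD file name are all assumptions chosen for the example.

```verilog
`timescale 1ns / 1ps

// Design under test: a 2:1 mux (module and port names are illustrative).
module mux2_sketch (input i0, input i1, input sel, output y);
  assign y = sel ? i1 : i0;
endmodule

// Test bench: it has no primary inputs or outputs of its own.
module tb_mux2_sketch;
  reg  i0, i1, sel;   // stimulus driven into the design
  wire y;             // response observed from the design

  mux2_sketch uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

  initial begin
    $dumpfile("tb_mux2_sketch.vcd");   // waveform file for GTKWave
    $dumpvars(0, tb_mux2_sketch);
    i0 = 1'b0; i1 = 1'b0; sel = 1'b0;
    #300 $finish;
  end

  // Each toggle is an input change, so the simulator re-evaluates y.
  always #75 sel = ~sel;
  always #10 i0  = ~i0;
  always #55 i1  = ~i1;
endmodule
```

Compiled with iverilog and run as `./a.out`, this writes `tb_mux2_sketch.vcd`, which can then be opened in GTKWave.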
//create a directory
$ mkdir VLSI
//Git Clone vsdflow.
$ git clone https://github.com/kunalg123/vsdflow.git
//Git Clone sky130RTLDesignAndSynthesisWorkshop.
$ git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
The sky130RTLDesignAndSynthesisWorkshop directory has my_lib, which contains all the necessary library files: lib has the standard cell library used in synthesis, and verilog_model has the Verilog models of all standard cells present in the lib. The verilog_files folder contains all the experiments for the lab sessions, including both the Verilog code and the test bench code.
We are given a default set of files and libraries shown below to work on using the practical lab instance.
$ gvim tb_good_mux.v -o good_mux.v
Synthesizer is a tool for converting the RTL to Netlist and here we are using the Yosys Synthesizer.
RTL Design - behavioral representation in HDL form for the required specification.
Synthesis - RTL to gate level translation. The design is converted into gates and the connections are made. The result is given out as a file called the netlist.
.lib file is a collection of logical modules which includes all basic logic gates. It may also contain different flavors of the same gate (2 input AND, 3 input AND – slow, medium and fast version).
The cell delay in a digital logic circuit depends on the load of the circuit, which here is capacitance.
The faster the capacitance charges / discharges, the lower the cell delay.
In order to charge/discharge the capacitance faster, we use wider transistors that can source more current. This reduces the cell delay, but at the same time wider transistors consume more power and area. Similarly, narrower transistors reduce area and power, but the circuit has a higher cell delay. Hence, we have to compromise on area and power if we are to design a circuit with low cell delay.
A constraint is a guidance file given to the synthesizer in order to enable an optimum implementation of the logic circuit by selecting the appropriate flavour of cells (fast or slow).
__Command to open the library file
$ gvim ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
__To shut off the background colors/ syntax off:
: syn off
__To enable the line numbers
: se nu
For a design to work, there are three important parameters that determine how the silicon works: Process (variations due to fabrication), Voltage (changes in the behavior of the circuit) and Temperature (sensitivity of semiconductors). Libraries are characterized to model these variations.
_Opening the file used for this experiment
$ gvim multiple_modules.v
_Invoke Yosys
$ yosys
_Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Read Design
$ read_verilog multiple_modules.v
_Synthesize Design
$ synth -top multiple_modules
_Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Realizing Graphical Version of Logic for multiple modules
$ show multiple_modules
_Writing the netlist in a crisp manner
$ write_verilog -noattr multiple_modules_hier.v
$ !gvim multiple_modules_hier.v
Multiple Modules: - 2 SubModules Statistics of Multiple Modules
Realization of the Logic
Netlist file
_To flatten the netlist
$ flatten
_Writing the netlist in a crisp manner and to view it
$ write_verilog -noattr multiple_modules_flat.v
$ !gvim multiple_modules_flat.v
Realization of the Logic
Netlist file
Sub-module level synthesis is preferred when there are multiple instances of the same module. Synthesizing the same module several times over is not advantageous with respect to time. Instead, synthesis can be performed once for the module, and its netlist can be replicated and then stitched together in the top module. This approach is also used in massive designs, following the divide and conquer method.
Statistics of Sub-module
Graphical Realization of the Logic
NetList File of Sub-module
In a digital design, when an input signal changes state, the output changes after a propagation delay. All logic gates add some delay to signals. These delays cause unexpected and unwanted transitions in the output, called glitches, where the output value is momentarily different from the expected value. An increased delay in one path can cause a glitch when those signals are combined at the output gate. In short, more combinational logic leads to more glitchy outputs that may not settle down to the expected output value.
A D flip-flop is a sequential element that follows the input pin d at the clock's given edge. D flip-flop is a fundamental component in digital logic circuits. There are two types of D Flip-Flops being implemented: Rising-Edge D Flip Flop and Falling-Edge D Flip Flop.
Every flop element needs an initial state, else the combinational circuit will evaluate to a garbage value. In order to achieve this, there are control pins in the flop namely: Set and Reset which can either be Synchronous or Asynchronous.
_Here, the always block gets evaluated when there is a change in the clock or a change in the set/reset. The circuit is sensitive to the positive edge of the clock. When the asynchronous reset or set is asserted, the output q changes accordingly right away: it does not wait for the positive edge of the clock and happens irrespective of the clock._
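The difference between the two reset styles can be sketched as follows. The module and port names here are illustrative assumptions and are not copied from the workshop files; the only point is which signals appear in the sensitivity list.

```verilog
// Asynchronous reset: the reset is in the sensitivity list, so q clears
// as soon as async_reset rises, without waiting for the clock.
module dff_asyncres_sketch (input clk, input async_reset, input d, output reg q);
  always @ (posedge clk, posedge async_reset) begin
    if (async_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule

// Synchronous reset: only the clock is in the sensitivity list, so
// sync_reset is sampled on the rising clock edge like any other input.
module dff_syncres_sketch (input clk, input sync_reset, input d, output reg q);
  always @ (posedge clk) begin
    if (sync_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```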
#Steps Followed for analysing Asynchronous behavior:
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog dff_asyncres.v tb_dff_asyncres.v
//List so as to ensure that it has been added to the simulator
$ ls
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_dff_asyncres.vcd
GTK WAVE OF ASYNCHRONOUS RESET
_Invoke Yosys
$ yosys
_Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Read Design
$ read_verilog dff_asyncres.v
_Synthesize Design - this controls which module to synthesize
$ synth -top dff_asyncres
_There will be a separate flop library under a standard library
_But here we point back to the same library and tool looks only for DFF instead of all cells
$ dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Realizing Graphical Version of Logic for single modules
$ show
_Writing the netlist in a crisp manner
$ write_verilog -noattr dff_asyncres_ff.v
$ !gvim dff_asyncres_ff.v
Statistics of D Flip-flop with Asynchronous Reset
Realization of Logic
Statistics of D Flip-flop with Asynchronous Set
Realization of Logic
Statistics of D Flip-flop with Synchronous Reset
Realization of Logic
The modules used are opened using the command
$ gvim mult_*.v -o
_Invoke Yosys
$ yosys
_Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Read Design
$ read_verilog mult_2.v
_Synthesize Design - this controls which module to synthesize
$ synth -top mul2
_Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
_Realizing Graphical Version of Logic for single modules
$ show
_Writing the netlist in a crisp manner
$ write_verilog -noattr mult_2.v
$ !gvim mult_2.v
Expected Logic
Statistics & abc command return due to absence of standard cell library
No hardware requirements - no memories, memory bits, processes or cells. The number of cells inferred is 0.
NetList File of Sub-module
Realization of Logic
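Zero inferred cells is what one would expect if the design multiplies by a power of two, since that is a left shift and needs only wiring. The sketch below assumes a 3-bit input multiplied by 2; the module name `mul2_sketch`, the port widths and the test bench are assumptions for illustration, not the contents of mult_2.v.

```verilog
// Assumed shape of the design: a 3-bit input multiplied by 2.
module mul2_sketch (input [2:0] a, output [3:0] y);
  assign y = a * 2;
endmodule

// Self-checking test bench: compares a * 2 against plain wiring,
// i.e. the input bits with a constant 0 appended.
module tb_mul2_sketch;
  reg  [2:0] a;
  wire [3:0] y;
  integer i;

  mul2_sketch uut (.a(a), .y(y));

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      a = i;
      #1;
      if (y !== {a, 1'b0})
        $display("MISMATCH a=%b y=%b", a, y);
    end
    $display("check finished");
    $finish;
  end
endmodule
```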
Expected Logic
_Statistics _
NetList File of Sub-module
Realization of Logic
Logic Circuits: Combinational circuits are time independent circuits whose outputs do not depend on previous inputs. Sequential circuits depend on clock cycles and on present as well as past inputs to generate any output.
Why do we need Combinational Logic Optimizations?
- Primarily to squeeze the logic to get the most optimized design.
- An optimized design results in comprehensive Area and Power saving.
- Constant Propagation
- Direct Optimization technique
- Boolean Logic Optimization.
- Karnaugh map
- Quine-McCluskey
In constant propagation, when an input is tied to a constant, the inputs and logic that no longer affect the output are ignored/optimized away. This simplifies the combinational logic, thereby saving area and power.
Y =((AB)+ C)'
If A = 0
Y =((0)+ C)' = C'
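The example above can be checked exhaustively in simulation. The following is a small sketch with hypothetical signal names; it ties A to 0 and compares the full expression with the simplified one for every combination of B and C.

```verilog
module const_prop_sketch;
  reg  b, c;
  wire a        = 1'b0;             // input tied to constant 0
  wire y_full   = ~((a & b) | c);   // original expression ((AB)+C)'
  wire y_simple = ~c;               // after constant propagation
  integer i;

  initial begin
    for (i = 0; i < 4; i = i + 1) begin
      {b, c} = i;
      #1;
      if (y_full !== y_simple)
        $display("MISMATCH b=%b c=%b", b, c);
    end
    $display("check finished");
    $finish;
  end
endmodule
```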
Boolean logic optimization is nothing but simplifying a complex Boolean expression into a simpler one by using the laws of Boolean algebra.
assign y = a?(b?c:(c?a:0)):(!c)
above is simplified as
y = a'c' + a(bc + b'ca)
y = a'c' + abc + ab'c
y = a'c' + ac(b+b')
y = a'c' + ac
y = a xnor c
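The last sum-of-products form, a'c' + ac, is true exactly when a and c are equal, which is an XNOR. A quick way to convince oneself is an exhaustive comparison in simulation; the sketch below uses hypothetical names and is not part of the lab files.

```verilog
module bool_opt_sketch;
  reg  a, b, c;
  wire y_rtl  = a ? (b ? c : (c ? a : 1'b0)) : (!c);   // original expression
  wire y_xnor = ~(a ^ c);                              // a'c' + ac
  integer i;

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i;
      #1;
      if (y_rtl !== y_xnor)
        $display("MISMATCH a=%b b=%b c=%b", a, b, c);
    end
    $display("check finished");
    $finish;
  end
endmodule
```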
- Basic Technique
- Sequential Constant Propagation
- Advanced Technique
- State Optimization
- Retiming
- Sequential Logic cloning(Floorplan aware synthesis)
//to view all optimization files
$ ls *opt_check*
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog opt_check.v
//Synthesize Design - this controls which module to synthesize
$ synth -top opt_check
//To perform constant propagation optimization
$ opt_clean -purge
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Expected logic from verilog file
value of y depends on a, y = ab.
Command for constant propagation method
Realization of the Logic
optimized graphical realization thus shows a 2-input AND gate being implemented.
Expected logic from verilog file
value of y depends on a, y = a+b.
Realization of the Logic
The optimized graphical realization thus shows a 2-input OR gate being implemented. Although an OR gate can be realized using NOR, that leads to a stacked PMOS configuration, which is not a design recommendation. So the OR gate is realized using NAND and NOT gates (which have a stacked NMOS configuration).
Expected logic from verilog file
value of y depends on a, y = abc.
optimized graphical realization thus shows 3-input AND gate being implemented.
Expected logic from verilog file
The value of y depends on a, y = a'c' + ac
Realization of the Logic
optimized graphical realization thus shows A XNOR C gate being implemented.
//To view all optimization files
$ ls *df*const*
//To open multiple files
$ gvim dff_const1.v -o dff_const2.v
//Performing Simulation
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog dff_const1.v tb_dff_const1.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_dff_const1.vcd
//Performing Synthesis
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog dff_const1.v
//Synthesize Design - this controls which module to synthesize
$ synth -top dff_const1
//There will be a separate flop library under a standard library
//so we need to tell the design where to specifically pick up the DFF
//But here we point back to the same library and tool looks only for DFF instead of all cells
$ dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Expected logic from verilog file
GTK Wave
Statistics showing a flop inferred
Realization of Logic
The optimized graphical realization thus shows the flop inferred. Also, the design code has active high reset and the standard cell library has active low reset - so, there is a presence of inverter for the reset.
Expected logic from verilog file
GTK Wave
Statistics showing a flop inferred
Realization of Logic
Expected logic from verilog file
GTK Wave
Statistics showing a flop inferred
Realization of Logic
Expected logic from verilog file
GTK Wave
Statistics showing a flop inferred
Realization of Logic
Expected logic from verilog file
GTK Wave
Statistics showing a flop inferred
Realization of Logic
//Steps Followed for each of the unused output optimization problems:
//opening the file
$ gvim counter_opt.v
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog counter_opt.v
//Synthesize Design - this controls which module to synthesize
$ synth -top counter_opt
//To perform constant propagation optimization
$ opt_clean -purge
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Expected logic from verilog file
If there is a reset, the counter is initialised to 0, else it is incremented, performing like an up counter. Since it is a 3-bit signal, the counter rolls over after 7 (000, 001, 010 ... 111). However, the final output q senses only count[0], so that bit toggles in every clock cycle. The other two bits are unused and do not create any output dependency. Hence, these unused bits need not be present in the design.
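A counter matching this description can be sketched as below. The module name `counter_opt_sketch` and the exact coding are assumptions reconstructed from the description, not a copy of counter_opt.v.

```verilog
// Sketch of a 3-bit up counter whose output uses only bit 0.
module counter_opt_sketch (input clk, input reset, output q);
  reg [2:0] count;

  assign q = count[0];   // count[2:1] never reach a primary output

  always @ (posedge clk, posedge reset) begin
    if (reset)
      count <= 3'b000;
    else
      count <= count + 1;
  end
endmodule
```

Because count[0] simply inverts on every clock edge and nothing observes count[2:1], a synthesizer is free to keep one flop and drop the rest.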
Statistics showing only one flop inferred instead of 3 flops, even though it is a 3-bit counter
Realization of Logic
The optimized graphical realization shows output Q (count[0]) being fed back through a NOT gate so as to perform the toggle function. The other bits, which have no influence on the primary output, are optimized away.
//Steps Followed:
//Copying the code to a new file
$ cp counter_opt.v counter_opt2.v
$ gvim counter_opt2.v
//Changes made in the verilog code, i for insert mode:
- assign q = (count[2:0] == 3'b100);
Expected logic from verilog file
In this case, all three bits of the counter are used and hence 3 flops are expected in the optimized netlist.
Statistics showing all three flops inferred
Realization of Logic
All three flops can be seen. Incrementing logic is needed, so the logic other than the flops represents the adder circuit. The expression at the output is q = counter2.counter1'.counter0'. Therefore, only the logic that has no role in the primary output is optimized away.
What is Gate Level Simulation (GLS) ?
Running the testbench against the synthesized netlist output as the DUT is known as Gate Level Simulation (GLS). The output netlist should logically be the same as the RTL code, so the same testbench can be used to simulate both and obtain the waveforms.
Advantages of GLS:
* To logically verify the correctness of the design after synthesis.
* During RTL simulation, timing is not accounted for. But for practical applications, there is a need to ensure that the timing of the design is met.
Why GLS?
GLS is required to verify the logical correctness of the design post synthesis with the help of the netlist file. To check whether the timing of the design is met, GLS needs to be run with delay annotations.
//consider a netlist
and uand (.a(a),.b(b))
or uor (.a(a),.b(b))
//There is a need to define the meaning of and and or
//Thus we need netlist, testbench and verilog models of the standard cells
The netlist consists of instantiated standard cells, and their meaning is conveyed to iVerilog using gate level Verilog models. Gate level Verilog models can be functional or timing aware. If the gate level models are delay annotated, then GLS can be used for timing validation in addition to functional validation.
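The distinction between a functional and a timing-aware model can be sketched with a hypothetical 2-input AND cell. The module names and the delay value below are made up for illustration and are not taken from the SKY130 Verilog models.

```verilog
`timescale 1ns / 1ps

// Functional model: describes only what the cell computes.
module and2_functional_sketch (input a, input b, output y);
  assign y = a & b;
endmodule

// Timing-aware model: same function plus a propagation delay.
// The 0.2 ns figure is an arbitrary placeholder, not a SKY130 value.
module and2_timed_sketch (input a, input b, output y);
  assign #0.2 y = a & b;
endmodule
```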
If the netlist is a true representation of the RTL, why is there a need to validate the functionality of the netlist? There may be a synthesis-simulation mismatch due to the following reasons:
(I) Missing Sensitivity List
(II) Blocking vs Non-Blocking Assignments
(III) Non-Standard Verilog Coding
(I)Missing Sensitivity List
module mux(
input i0, input i1,
input sel,
output reg y
);
always @ (sel)
begin
if (sel)
y = i1;
else
y = i0;
end
endmodule
The simulator evaluates the output only when a signal in the sensitivity list changes; the output is not evaluated when there is no such activity. In the above 2x1 mux code, the always block is evaluated only on a transition of the sel pin: at that moment y takes the value of i1 if sel is 1, else the value of i0. The block is not sensitive to changes on inputs i0 and i1, so those changes are not reflected on the output.
Corrected code for missing sensitivity list:
module mux(
input i0, input i1,
input sel,
output reg y
);
always @ (*)
begin
if (sel)
y = i1;
else
y = i0;
end
endmodule
The mismatch is corrected by using always @ (*), where the always block is evaluated when any signal it reads changes. So, any changes in the inputs will also be seen at the output.
Blocking and Non-blocking statements are procedural assignment statements that can be implemented only inside an always block.
* Blocking assignments (=): execute the statements in the order in which they are coded.
* Non-blocking assignments (<=): evaluate the RHS of all such assignments when the always block is entered and assign to the LHS in parallel.
Synthesis-simulation mismatches can arise from incorrect ordering of blocking assignments inside an always block.
module code (input clk,input reset,
input d,
output reg q);
reg q0;
always @ (posedge clk,posedge reset)
begin
if(reset)
begin
q0 = 1'b0;
q = 1'b0;
end
else
begin
q = q0;
q0 = d;
end
end
endmodule
The assignments inside the code are blocking statements. On reset, q0 and q are assigned 1-bit 0s, which gives an asynchronous reset. Otherwise, q0 is first assigned to q and then d is assigned to q0. Now suppose the order of these two statements is changed:
module code (input clk,input reset,
input d,
output reg q);
reg q0;
always @ (posedge clk,posedge reset)
begin
if(reset)
begin
q0 = 1'b0;
q = 1'b0;
end
else
begin
q0 = d;
q = q0;
end
end
endmodule
In this case, d is assigned to q0 and then q0 is assigned to q. So, by the time the second statement gets executed, q0 already has the value of d. This leads to the implementation of only one flop. Previously, q took the old value of q0 and q0 then took the value of d, which led to the implementation of 2 storage elements.
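For comparison, the same two-stage structure written with non-blocking assignments does not depend on statement order. This is a sketch with an assumed module name, shown only to illustrate the parallel evaluation described earlier for `<=`.

```verilog
module shift2_nonblocking_sketch (input clk, input reset, input d, output reg q);
  reg q0;

  always @ (posedge clk, posedge reset) begin
    if (reset) begin
      q0 <= 1'b0;
      q  <= 1'b0;
    end
    else begin
      // Both right-hand sides are sampled before either update,
      // so swapping these two lines still describes two flops.
      q0 <= d;
      q  <= q0;
    end
  end
endmodule
```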
module code (input a,b,c,
output reg y);
reg q0;
always @ (*)
begin
y = q0 & c;
q0 = a|b ;
end
endmodule
The code is meant to implement the function y = (A+B).C. In the above code, when execution enters the always block, the blocking statements get evaluated in order. So y gets evaluated first (q0.C), where q0 still holds the previous iteration's result. The q0 value gets updated only in the second statement.
When the order of the statements is changed: In this case, a OR b is evaluated first and the latest value is used for calculating y.
module code (input a,b,c,
output reg y);
reg q0;
always @ (*)
begin
q0 = a|b ;
y = q0 & c;
end
endmodule
> Therefore, it is of paramount importance to run GLS on the netlist and match it against the specifications, to ensure there is no synthesis-simulation mismatch.
_Note: Mux function is written using a ternary operator. Ternary operator takes 3 operands with the format.
<Condition>?<True>:<False>
Verilog File
GTK Wave
Statistics
Realization of Logic
The realization uses a NAND gate with inputs i1 and sel, an inverter on i0, and an OR-AND-invert gate, to which the inputs are sel and the inverted i0. The output y is given by the expression y = sel'.i0 + sel.i1
GLS OUTPUT
Verilog File
During simulation, the logic acts as a latch; after synthesis, it acts as a mux.
GTK Wave
Synthesis Statistics
GLS Output
Realization of Logic
This confirms the functionality of the 2x1 mux after synthesis: when select is low, activity on input 0 is reflected on y, and when select is high, activity on input 1 is reflected on y. Since the RTL simulation behaved differently, there is a synthesis-simulation mismatch due to the missing sensitivity list.
Verilog File
When execution enters the always block, the blocking statements get evaluated in order. So d gets evaluated first (x.c), where x still holds the previous iteration's result of (a|b). The x value gets updated only in the second statement. The intended output expression is d = (a+b).c
Synthesis Statistics
GTK Wave
d = (a+b).c, if the inputs a,b = 0; then a+b = 0. The output d = 0. But, we observe the output d = 1 because it looks at the past value where a+b was 1.
GLS Output
Realization of Logic
For the same set of input values, the output d is 1 in the RTL simulation (it uses the stale value of a+b) and 0 in the gate level simulation of the synthesized netlist. Hence there is a synthesis-simulation mismatch due to blocking assignments.
The if statement is a conditional statement which uses boolean conditions to determine which blocks of Verilog code to execute. An if statement translates into a multiplexer. It is used for priority logic and is always used inside an always block. The variable being assigned should be declared as a register (reg).
Syntax for IF Statement
if<cond>
begin
.....
.....
end
else
begin
.....
.....
end
Syntax for IF- ELSE-IF Statement
if<cond1>
begin
.....
executes cb1
.....
end
else if<cond2>
begin
.....
executes cb2
.....
end
else if<cond3>
begin
.....
executes cb3
.....
end
else
begin
.....
executes cb4
.....
end
Hardware Implementation
Cautions with using IF Statements
Inferred latches can serve as a 'warning sign' that the logic design might not be implemented as intended. They represent a bad coding style, which happens because of incomplete if statements or crucial statements missing in the design. For example, if an else statement is missing in the code, the hardware has not been told what to do in that case, so it will infer a latch and try to retain the previous value. This type of design should be completely avoided unless it is intended to meet the design functionality (ex: Counter Design).
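The effect of a missing else can be sketched with two small modules. The names `incomplete_if_sketch` and `complete_if_sketch` and their ports are hypothetical and are not the workshop's lab files.

```verilog
// Incomplete if: there is no else branch, so y must hold its old
// value when en is 0. Synthesis infers a latch for y.
module incomplete_if_sketch (input en, input d, output reg y);
  always @ (*) begin
    if (en)
      y = d;
  end
endmodule

// Complete if: every path assigns y, so the logic is purely
// combinational (here y = en & d).
module complete_if_sketch (input en, input d, output reg y);
  always @ (*) begin
    if (en)
      y = d;
    else
      y = 1'b0;
  end
endmodule
```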
The hardware implementation is a Multiplexer. Similar to IF Statements, Case statements are also used inside always block and the variable should be a register variable.
Case Statements
reg y;
always @ (*)
begin
case(sel)
2'b00:begin
....
end
2'b01:begin
....
end
.
.
.
endcase
end
Caveats in CASE Statements
* Incomplete case statements are dangerous: a case structure that does not cover all values of the select may lead to inferred latches. To avoid inferred latches, code the case with a default condition.
reg y;
always @ (*)
begin
case(sel)
2'b00:begin
....
end
2'b01:begin
....
end
.
.
default:begin
....
end
endcase
end
* Partial assignments in case statements (not assigning a value to every output in every branch) will also create inferred latches. To avoid inferred latches, assign all the outputs in all the branches of the case statement.
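Both caveats can be addressed together, as in the sketch below: the case has a default branch, and every branch assigns both outputs. The module name, ports and the particular values assigned are assumptions chosen only for illustration.

```verilog
module case_full_assign_sketch (input [1:0] sel,
                                input i0, input i1, input i2,
                                output reg x, output reg y);
  always @ (*) begin
    case (sel)
      2'b00: begin
        x = i0;
        y = i1;
      end
      2'b01: begin
        x = i1;
        y = i0;   // leaving this line out would infer a latch on y
      end
      default: begin   // covers 2'b10 and 2'b11
        x = i2;
        y = i2;
      end
    endcase
  end
endmodule
```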
Verilog File
GTK Wave
Else case is missing so there will be a D latch.
The synthesized design has a D latch inferred due to the incomplete if structure (missing else statement).
Verilog File
GTK Wave
When i0 is high, the output follows i1. When both i0 and i2 are 0, the output latches to its previous value. The inferred latch is present due to the incomplete if structure.
Synthesis Statistics
Realization of Logic
Verilog File
When the select signal is 00, the output follows i0, and it follows i1 when the select value is 01. Since the output is undefined for the 10 and 11 values, the output latches to the previously available value.
Synthesis Statistics
Realization of Logic
The synthesized design has a D Latch inferred due to incomplete case structure (missing output definition for 2 of the select statements).
Verilog File
Synthesis Statistics
GTK Wave
When the select signal is 00, the output follows i0, and it follows i1 when the select value is 01. For the 10 and 11 values, the default branch sets the output to i2. The output will not latch and the design is a proper combinational circuit.
Realization of Logic
Verilog File
Synthesis Statistics
Realization of Logic
Understanding the Usage of For and Generate Statements:
FOR STATEMENTS | GENERATE STATEMENTS |
---|---|
These statements are used inside the always block | These statements are used outside the always block |
Used for evaluating expressions | Used for instantiating/replicating hardware |
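The two rows of the table can be illustrated side by side. Both modules below are sketches with assumed names and widths; they are not the workshop's lab files.

```verilog
// for loop inside an always block: evaluates an expression.
// Here it picks one bit of "in", which behaves as a 4:1 mux.
module mux4_for_sketch (input [3:0] in, input [1:0] sel, output reg y);
  integer k;
  always @ (*) begin
    y = 1'b0;
    for (k = 0; k < 4; k = k + 1) begin
      if (k == sel)
        y = in[k];
    end
  end
endmodule

// generate-for outside any always block: replicates hardware.
// Here it instantiates four AND gate primitives, one per bit.
module and4_generate_sketch (input [3:0] a, input [3:0] b, output [3:0] y);
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : gen_and
      and u_and (y[i], a[i], b[i]);
    end
  endgenerate
endmodule
```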
Verilog File
GTK Wave
Verilog File
GTK Wave
Synthesis Statistics
Realization of Logic
GLS Output
Verilog File
GTK Wave
Synthesis Statistics
Realization of Logic
GLS Output
Experiment on Ripple Carry Adder
Instantiating the full adder in a loop to replicate the hardware
Verilog File
GTK Wave
Synthesis Statistics
Realization of Logic - rca
Realization of Logic- fa
GLS Output
The observed waveforms in simulation and synthesis match and confirm the code functionality.
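The idea of instantiating a full adder in a loop can be sketched as follows. This is an illustrative reconstruction under assumed names (`fa_sketch`, `rca_sketch`) and an assumed 8-bit width; the workshop's own rca and fa files may be coded differently.

```verilog
// 1-bit full adder (module and port names are illustrative).
module fa_sketch (input a, input b, input cin, output sum, output cout);
  assign {cout, sum} = a + b + cin;
endmodule

// 8-bit ripple carry adder built by replicating the full adder.
module rca_sketch (input [7:0] num1, input [7:0] num2, output [8:0] sum);
  wire [8:0] carry;           // carry[i] feeds stage i
  assign carry[0] = 1'b0;     // no carry into the first stage

  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : gen_fa
      fa_sketch u_fa (.a(num1[i]), .b(num2[i]), .cin(carry[i]),
                      .sum(sum[i]), .cout(carry[i+1]));
    end
  endgenerate

  assign sum[8] = carry[8];   // final carry out is the top sum bit
endmodule
```

Each stage's carry out feeds the next stage's carry in, which is what makes the carry "ripple" through the chain.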
- Kunal Ghosh, Co-Founder (VLSI SYSTEM DESIGN - VSD)
- Shon Taware