Project SCH: Scheduling and RTL Design in Arx

This project is a compulsory part of the examination for the Implementation of Digital Signal Processing course at the University of Twente. The goals of this project are:

Preparation

This project assumes that you are familiar with:

Files and Directories

Go to your home directory and fetch the files for this project with:
get-module sch sch
Three subdirectories of sch will be created:

Arx Source Files

Change to directory arx. This directory contains three files:

The Second-Order IIR Filter

The topic of this exercise is the same second-order IIR filter that was used in the previous projects and has been reproduced below for your convenience:

Every computation is translated into a separate functional unit, every edge in the data-flow graph is a wire bundle and every delay element is a register. The hardware consumes and produces a signal sample every clock cycle. Study the code in sec_par.arx and check for yourself that the code indeed describes a 1-to-1 mapping of the data-flow graph given in the figure into hardware.

Now run make (type make in the shell). You will see that for both Arx source files, C++ and VHDL will be generated. The warnings are related to the fact that the filter coefficients do not fit the given fixed-point data formats without loss of precision.
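
The precision warnings can be illustrated with a minimal sketch in Python. It assumes a fixed-point format with a given number of fractional bits and rounding to the nearest representable value; the function name quantize, the example coefficient and the word lengths are hypothetical and are not taken from the Arx sources, and Arx may well use a different rounding mode.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits.

    Assumption: round-to-nearest; the actual tool may truncate instead.
    """
    scale = 1 << frac_bits
    return round(value * scale) / scale


# Hypothetical coefficient that is not a multiple of 2**-8.
coefficient = 0.1
stored = quantize(coefficient, 8)   # value that fits the format
error = stored - coefficient        # precision lost by the format

# A coefficient that is a multiple of 2**-8 is stored without loss.
exact = quantize(0.5, 8)
```

A coefficient triggers such a warning whenever the stored value differs from the value written in the source.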

Exercise SCH-1: Arx C++ Simulation

First perform the following: If everything went well, you should now see that the library sys contains many models including the files sec_par.h and sec_par.cpp that were generated by Arx. You can double-click them to see their contents and verify that they contain C++ code (which is different from the C variant used by CCSS to describe Prim models). CCSS is able to parse C++ files.

Double-click on model arx_sec_par. This is a Prim model that acts as a wrapper for the C++ code generated by Arx. Its content is rather trivial. The wrapper casts the data type i32 that is used by Arx but is not directly understood by CCSS, to and from data type int with which the wrapper communicates with the outside world. The wrapper also calls the methods run and reset at the appropriate place.

Now open model tb_sec_soc_par, which is the testbench for arx_sec_par. As in exercise TRA, the testbench feeds the filter with a superposition of a low-frequency and a high-frequency sine wave. As the filter is a high-pass filter, the low-frequency signal should be strongly attenuated in the result. Run the testbench and visually verify the expected behavior in Davis. Do not use an scf file; just use the default values for all parameters.
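
The expected behavior can be previewed outside CCSS with a small Python sketch. Everything specific in it is an assumption made for illustration: the coefficients, the two normalized frequencies and the helper names are hypothetical and are not the values used in the course testbench. The filter structure follows the labeling used later in this text (output yn = b0*xn + z2, with the feedback terms added rather than subtracted).

```python
import math


def biquad(xs, b0, b1, b2, a1, a2):
    """Second-order IIR filter; feedback terms are added (sign is in a1, a2)."""
    z1 = z2 = 0.0
    ys = []
    for x in xs:
        y = b0 * x + z2
        # Both delays are updated from the old z1, as in a clocked design.
        z2, z1 = b1 * x + a1 * y + z1, b2 * x + a2 * y
        ys.append(y)
    return ys


def sine(freq, n):
    """n samples of a unit sine; freq is in cycles per sample."""
    return [math.sin(2 * math.pi * freq * k) for k in range(n)]


def steady_state_peak(ys, skip=100):
    """Largest absolute sample after the start-up transient."""
    return max(abs(y) for y in ys[skip:])


# Hypothetical high-pass coefficients (double zero at DC, stable poles).
COEFFS = dict(b0=1.0, b1=-2.0, b2=1.0, a1=0.0, a2=-0.5)
N = 400

low, high = sine(0.01, N), sine(0.4, N)
stimulus = [l + h for l, h in zip(low, high)]   # superposition fed to the filter
output = biquad(stimulus, **COEFFS)

# The filter is linear, so each component can be measured separately.
low_gain = steady_state_peak(biquad(low, **COEFFS))
high_gain = steady_state_peak(biquad(high, **COEFFS))
```

With these assumed values, low_gain is far below 1 while high_gain is above 1, which is the qualitative picture to look for in Davis.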

Different data types are used in the model. List all conversions in the chain from sources to sinks and explain why these conversions are made. Note: the delay element in the chain is not necessary but is there to make the model consistent with the testbench that will perform co-simulation with VHDL.

What are the values of the first 5 samples arriving at the sink? You can use the Browse Data ... (under the right mouse button) in Davis to see the exact values.

Principles of Co-Simulation

As you should know, CCSS is basically a data-flow simulator that translates a data-flow graph into efficient execution code taking advantage of its knowledge on static consumption and production rates wherever possible. When these rates cannot be predicted at compile time, FIFO queues are used in accordance with the data-flow model.

The event-driven simulation model of a language such as VHDL is quite different. CCSS provides an interfacing mechanism for an external VHDL simulator such as Questasim. The idea is that the hardware modeled in VHDL is synchronous. At each clock cycle a new token is sent from the data-flow environment to each input of the hardware and a new token is collected from the hardware and sent into the data-flow environment. CCSS also supports schedules where communication only takes place on a specified subset of clock cycles, but this feature will not be used here.

The main advantage of co-simulation is that one can reuse the testbench developed for system-level design for the verification of the RT-level design, possibly after some minor adjustments. In the case of CCSS, all interfacing with the external simulator is automatically generated and the external VHDL code is automatically compiled.

Exercise SCH-2: Simulate and Compare the C++ and VHDL Generated by Arx

The file sec_par.vhd in directory vhdl contains the VHDL generated by Arx of the one-to-one (or fully parallel) implementation of the second-order IIR filter.

The goal of this exercise is to embed the VHDL code in the CCSS testbench tb_sec_soc_vhdl. Open this model. Do not check this design yet, as the submodel sec_par_std_if is missing. This model is actually tb_sec_soc_par with an additional branch to simulate VHDL alongside C++.

Perform the following steps to create the missing submodel:

In the case that you want to change any of these settings (not recommended) or you have made a mistake that you want to correct, you can reinvoke the popup by selecting the wrapper model and proceeding with Model -> Redefine Implementation ....

CCSS will deposit the files that Questasim needs for interfacing in subdirectory tmp/ccss/hdl/ of your home directory.

After you finish the procedure above, you will have the model sec_par_std_if in your library. Instantiate this model in the empty space reserved for it in the testbench. Now, you can check your design. Some VHDL will directly be compiled by Questasim. You can find the diagnostics in the Code Generation tab of the main CCSS window.

The CCSS model bit2bit takes care of the conversions between the fixed-point data types used in the testbench and the SystemC bit vectors of the hardware interface. It simply copies the signals bit by bit.
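
A bit-by-bit copy of this kind can be sketched in Python as follows. The sketch assumes that a signed fixed-point value is represented by its raw two's-complement integer (the value scaled by a power of two); the function names to_bits and from_bits are hypothetical and do not correspond to anything in CCSS.

```python
def to_bits(raw, width):
    """Two's-complement bits of a signed integer, least significant bit first."""
    if not -(1 << (width - 1)) <= raw < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    return [(raw >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits: interpret the last bit as the sign bit."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


# Round trip over every 8-bit value: nothing is lost by copying bits.
round_trip_ok = all(from_bits(to_bits(v, 8)) == v for v in range(-128, 128))
```

Because only bits are copied, the position of the binary point is a matter of interpretation on either side and the conversion itself never changes a value.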

There is an extra delay at the input side. This model has been inserted to allow the hardware model to execute a reset cycle. Otherwise, the hardware model would lose the first sample of the data stream.
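
The role of the extra delay can be illustrated with a toy Python model. It is only a sketch under strong assumptions: the hardware is reduced to a pass-through block whose first clock cycle is a reset cycle in which the input token is ignored, and the class and function names are hypothetical. It does not describe how CCSS or Questasim implement co-simulation internally.

```python
class ClockedPassThrough:
    """Toy synchronous block: the first cycle is spent on reset."""

    def __init__(self):
        self.in_reset = True

    def clock(self, token):
        """Consume one input token and produce one output token per cycle."""
        if self.in_reset:
            self.in_reset = False
            return 0          # reset value; the input token is discarded
        return token


def cosimulate(tokens, with_delay):
    """Feed one token per clock cycle, optionally preceded by a delay token."""
    hardware = ClockedPassThrough()
    stream = [0] + tokens if with_delay else tokens
    return [hardware.clock(token) for token in stream]


without_delay = cosimulate([1, 2, 3], with_delay=False)   # sample 1 is lost
with_delay = cosimulate([1, 2, 3], with_delay=True)       # all samples survive
```

In this toy model the initial token contributed by the delay is the one consumed during reset, so the first real sample reaches the hardware in its first operational cycle.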

Double-clicking the sec_par_std_if instantiation in the testbench will pop up a window with parameter settings. Change the value of debug to 1. This setting causes the graphical user interface (GUI) of Questasim to be launched prior to simulation. You can then trace waveforms, etc. in the way that you are used to. Be aware that CCSS generates a wrapper around the VHDL code generated by Arx. The Arx model is the submodel with instance name eut (entity under test); the level above is an automatically generated wrapper taking care of the right interfacing.

For an interactive co-simulation act as follows:

Trace relevant waveforms from Questasim and include them in your report. What are the values of the first 5 output samples that you see in Questasim? How do they compare to the samples of the C++ simulation? Explain possible differences.

If everything went well, the conclusion of this exercise should be that the C++ and VHDL generated from Arx behave exactly the same. This also means that it is not necessary to simulate the VHDL for each design made in Arx. The VHDL will serve primarily as input for synthesis. In practice, it is also wise to perform a post-synthesis simulation. This could be done in the same way as above with a CCSS testbench, but is outside the scope of this course.

If, for some reason, the simulation gets stuck (is in deadlock), terminate the Questasim process from the Linux prompt. First find out the process number by

ps -ef | grep vsim
and then kill the process by typing kill <process number>.

Exercise SCH-3: Simulating the Serial Implementation of the Second-Order IIR Filter

Go back to the arx directory and study the description in file sec_ser.arx. It contains a serial implementation of the second-order IIR filter using a single multiplier and a single adder. The design is explained below. The multiplications have been labeled as follows:
m1: xn * b0
m2: xn * b1
m3: yn * a1
m4: xn * b2
m5: yn * a2
And the additions as:
p1: m1 + z2
p2: m2 + z1
p3: m3 + p2
p4: m4 + m5
Then, an overlapped schedule using 5 clock cycles can be as follows (an @-sign indicates an operation of the previous iteration, a #-sign an operation of the next iteration):
time:   0   1   2   3   4   5 (=0)
   *:   m1  m2  m3  m4  m5  m1# ...
   +:   p4@ p1  p2  p3      p4
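
The schedule above can be checked mechanically with a short Python sketch. It assumes that every operation takes one clock cycle and that its result is available from the next cycle on; the dependency list is derived from the operation labels given earlier, with z2 taken to be the result of p3 and z1 the result of p4 of the previous iteration. The function names are hypothetical.

```python
PERIOD = 5  # iteration period in clock cycles

# Start times within one iteration; p4 runs in cycle 5, i.e. cycle 0 of the next.
START = {"m1": 0, "m2": 1, "m3": 2, "m4": 3, "m5": 4,
         "p1": 1, "p2": 2, "p3": 3, "p4": 5}

# (producer, consumer, iteration distance)
DEPS = [("m1", "p1", 0), ("m2", "p2", 0), ("m3", "p3", 0), ("p2", "p3", 0),
        ("m4", "p4", 0), ("m5", "p4", 0),
        ("p1", "m3", 0), ("p1", "m5", 0),    # yn is the result of p1
        ("p3", "p1", 1), ("p4", "p2", 1)]    # z2 and z1 cross iterations


def dependency_violations(start, deps, period):
    """Dependencies whose consumer starts before the producer has finished."""
    return [(src, dst, dist) for src, dst, dist in deps
            if start[dst] + dist * period < start[src] + 1]


def resource_conflicts(start, prefix, period):
    """Cycles (modulo the period) in which one unit is claimed more than once."""
    slots = {}
    for op, time in start.items():
        if op.startswith(prefix):
            slots.setdefault(time % period, []).append(op)
    return {slot: ops for slot, ops in slots.items() if len(ops) > 1}


violations = dependency_violations(START, DEPS, PERIOD)
multiplier_conflicts = resource_conflicts(START, "m", PERIOD)
adder_conflicts = resource_conflicts(START, "p", PERIOD)
```

All three results are empty for the schedule shown, which means that a single multiplier and a single adder suffice under the stated assumptions. The same check can be reused for other schedules by editing START and PERIOD.
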
Completing the design requires that the entire data path be specified, including registers, multiplexers, etc. Such a data path is shown below:

The register transfers below indicate for each computation the source and destination locations (C refers to the coefficient memory; output yn is stored in register R2, delay z1 in R4, z2 in R3; R1, R3 and R4 are used for the storage of other values as well):
m1: C, x  -> R1    p4: R1, R4 -> R4
m2: C, x  -> R1    p1: R1, R3 -> R2
m3: C, R2 -> R1    p2: R1, R4 -> R3
m4: C, x  -> R1    p3: R1, R3 -> R3
m5: C, R2 -> R4    
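
The register transfers can be replayed cycle by cycle in Python and compared with a direct evaluation of the filter. This is an idealized sketch: the coefficients are hypothetical, exact fractions are used instead of fixed-point arithmetic, the input is assumed to stay constant during the 5 cycles of an iteration, and input or output registers that the actual Arx code may contain are ignored. It therefore says nothing about the latency of the real designs.

```python
from fractions import Fraction


def parallel_filter(xs, b0, b1, b2, a1, a2):
    """Direct evaluation: one sample per step, following p1..p4."""
    z1 = z2 = 0
    ys = []
    for x in xs:
        y = b0 * x + z2                                   # p1
        z2, z1 = b1 * x + a1 * y + z1, b2 * x + a2 * y    # p3 (via p2), p4
        ys.append(y)
    return ys


def serial_filter(xs, b0, b1, b2, a1, a2):
    """Five clock cycles per sample; registers update at the end of a cycle."""
    r1 = r2 = r3 = r4 = 0
    ys = []
    for x in xs:
        r1, r4 = b0 * x, r1 + r4     # cycle 0: m1 -> R1, p4@ -> R4 (z1)
        r1, r2 = b1 * x, r1 + r3     # cycle 1: m2 -> R1, p1  -> R2 (yn)
        r1, r3 = a1 * r2, r1 + r4    # cycle 2: m3 -> R1, p2  -> R3
        r1, r3 = b2 * x, r1 + r3     # cycle 3: m4 -> R1, p3  -> R3 (z2)
        r4 = a2 * r2                 # cycle 4: m5 -> R4, adder idle
        ys.append(r2)                # R2 is stable from cycle 2 onwards
    return ys


# Hypothetical coefficients, chosen only to exercise every operation.
COEFFS = dict(b0=1, b1=-2, b2=1, a1=Fraction(1, 4), a2=Fraction(-1, 2))
samples = [1, 0, 0, 3, -2, 5, 5, -7, 0, 4]

reference = parallel_filter(samples, **COEFFS)
serial = serial_filter(samples, **COEFFS)
```

Each tuple assignment evaluates its right-hand side with the old register contents, which mimics the simultaneous update of registers at a clock edge. Under the stated assumptions the two sample sequences are identical.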

The C++ and VHDL for this model were already generated when make was called earlier. The model arx_sec_ser contains the Prim model that interfaces with the C++ coming from Arx. Open it. You will see that the run method of the C++ object is called 5 times in each invocation. This corresponds to the iteration period having value 5. As the design has been made in such a way that the output register only changes once in 5 clock cycles, it does not matter in which of the 5 cycles the value is read and written to the node's output.

Run the simulation and compare the output to the results of the parallel version of the filter. For easy comparison you can either read signals from two simulations in one Davis session or modify the CCSS testbench to run both versions of the filter in parallel.

If everything went well, you should see identical behavior except for a time shift. Explain this time shift.

Exercise SCH-4: RTL Synthesis

In this exercise, both the parallel and serial versions of the design will be synthesized. Which of the two designs do you expect to be larger? Motivate your answer before actually performing the synthesis.

Directory vhdl contains the generate-design script that you know from the System-on-Chip Design course. Use it to synthesize both the parallel and serial versions of the filter (do not forget to run it via srun). For each design, study the log file and pay special attention to the resource report. It mentions all adders and multipliers to be implemented by Synopsys including word lengths (in the reference report on the other hand, not all adders and multipliers are mentioned as some of them are directly expanded into gates).

For each design explain the information given in the resource report. For each resource, point out from which part of the Arx code it originates.

Now check the areas reported for both designs. Which of the designs is larger? Explain. Synthesize both VHDL descriptions of the filter for a clock period of 10 ns and analyze the results. Which of the two designs is larger? Is that according to expectation? Try to explain the results.

Exercise SCH-5: RTL Alternative(s)

Design a third version of the second-order IIR filter, which uses two multipliers and two adders. Make first a paper design, then create an Arx description in directory arx. Modify the makefile in that directory to include your new design.

When the Arx code compiles without errors, you can simulate it in CCSS. Use Library -> Add Existing Files ... to make the .h and .cpp file visible in CCSS. Then, create a Prim model that interfaces with C++ and finally create a testbench for your version of the filter. Simulate and try to make the design produce exactly the same output stream as the other two provided versions.

When ready with the design, synthesize the VHDL and discuss the performance figures (area, resources, critical path).

If you have time left, consider one or more other design alternatives. Try especially to reduce the area. Is it a good idea to use one or more multiply-accumulate blocks in the data path? Those of you who work alone should concentrate on a single design version with well-motivated design choices rather than spending time on multiple alternatives for which the design choices are poorly motivated.

Arx Points of Attention

Deliverables

Write a short report always motivating your choices and explaining the way you have reached your answers. Particular points of attention:

Grading


Last update on: Mon Feb 20 16:31:22 CET 2017 by Sabih Gerez.