FPGA Prototyping Techniques

This article discussed various types of prototyping techniques along with its key advantages and issues .

10583
 

In addition, it allows multi-user access and excellent hardware debug insight in the context of real software that can be executed at MHz speeds, resulting in very efficient hardware/software debug cycles. Standard software debuggers can be attached using JTAG adaptors or virtual connections.

Virtual prototyping. Virtual prototyping can enable software development as early as a fortnight after the specification is available. It does not allow detailed hardware debug, which is the initial strength of RTL simulation. Used initially for RTL development, IP integration and design verification, RTL simulation can extend to the complexity of sub-systems and is a sign-off criterion for gate-level simulation and timing sign-off. It allows the fastest turnaround time for new RTLs, offers excellent hardware debug but is typically too slow to execute substantial amounts of software. To better extend to sub-systems and the full SoC, verification acceleration moves the device under test into hardware and can allow enough speed up for bare-metal software development.

The different engine sweet spots as an overlay on the main user tasks are shown in Fig. 3.

Techniques

C61_Fig_3Emulation and RTL simulation are combined to achieve simulation acceleration. Emulation and TLMs with fast models from ARM are combined to accelerate operating system (OS) bring-up and software-driven testing. Emulation and FPGA based prototyping are combined to combine the speed of bring-up for new portions of the design in emulation with the speed of execution for stable portions of the design in FPGA based prototyping.

Hardware assisted verification
Starting at RTL level, the ultimate goal is to make the model execute faster than it could in a software simulator. Another issue is that, software simulation slows down significantly when it exceeds the physical memory of the computer, so there is a capacity issue that plays here as well.

There are two general ways to solve the problem, referred to as direct and indirect implementations.
In the case of direct implementation, while compiling a model into an FPGA, an actual implementation of that model in hardware is created. It may not be the same as the one used in an SoC but it is an implementation nonetheless. Mapping emulates the function of the intended hardware. This technique is generally referred to as emulators, because these are directly executing an implementation of the model.

In indirect implementation, mechanisms are devised that allow effects of concurrency to be evaluated even though the simulator is actually incapable of doing more than one thing at a time.

Simulation accelerators contain a large number of simple processors, each of which simulate a small portion of the design and then pass results between themselves. Each of these processors runs slower than the processor on desktop, but the accelerator may possess millions of these smaller processors and the net result is a significantly higher execution performance. These can deal with parallelism directly as all the processors are running in parallel. An example of this type of hardware-assisted solution is Palladium product line from Cadence.

The custom chip could also contain debug circuitry, visibility mechanisms and a host of other capabilities. Each chip is capable of emulating a small piece of design, and larger designs are handled by interconnecting many chips together, again with sophisticated interconnect capabilities. An example of this type of emulator is Veloce from Mentor Graphics.

Another way to implement an emulator is by using off-the-shelf components such as FPGAs. Here, you not only map the design into the FPGA but also implement visibility, debug and other such capabilities into the FPGA. As with the custom chip, multiple FPGAs can be put together to handle arbitrary design sizes. An example of this type of emulator is ZeBu from EVE.

In in-circuit emulation, an emulator or accelerator is connected into a real-world application. Most emulators can only muster a few MHz of clock speed, especially when full visibility is made available. So it is necessary to insert a speed bridge that can handle the difference in execution rates on each side of the bridge. This involves data buffering or manipulation of protocols to artificially slow down the real world to the rate that the emulator can handle.

The next major way these are used is standalone, wherein the entire model fits into the emulator or accelerator, along with a set of stimuli to exercise the model. These can run as fast as the emulator, stopping only when additional stimulus is required or when captured data has to be flushed out of the device.

If the design contains a processor, it is also likely that a version of the processor exists for the emulator. Emulator vendors provide special boards that make many popular processors available. But if parts of the design or testbench cannot be mapped into the emulator, it has to be coupled with a software execution environment. This is called co-simulation, as it inherently involves two simulation engines cooperating to solve the problem. The emulator can now only run as fast as the simulator, or actually even slower because the communication makes it even slower.

A more modern alternative is what is called co-emulation. The primary difference between the two is that communication is raised to the transaction level rather than being at the implementation level.


V.P. Sampath is an active member of IEEE and Institution of Engineers India. He is a regular contributor to national newspapers, IEEE-MAS section and has published international papers on VLSI and networks

SHARE YOUR THOUGHTS & COMMENTS

Please enter your comment!
Please enter your name here