Wednesday, March 22, 2023

Hybrid Emulation And The Challenges Ahead

V.P. Sampath, an active member of IEEE and the Institution of Engineers India Ltd, is currently working as a technical architect at AdeptChips, Bengaluru. He is a regular contributor to national newspapers and the IEEE-MAS section, and has published international papers on VLSI and networks.


Software at these lower levels is called hardware-dependent software. A model on which you can test aspects of hardware-dependent software, such as a board support package (BSP) or BIOS, needs high accuracy and may include a cycle-accurate model of the relevant hardware itself. Depending on how fast you need to run, this model may run in an RTL simulator, an emulator or an FPGA-based prototype. Software at the highest levels of the stack, such as apps and other user-space programs, needs the least accuracy and can therefore run at the highest speeds.

Virtual platforms and FPGA-based emulation

The task of a virtual platform is to provide just enough accuracy to support the level of software being run on it. This is largely achieved by modelling the behaviour of blocks and their inter-block communication at transaction level, which makes virtual platforms inherently faster than equivalent cycle-accurate representations. If you have the necessary models, or libraries of models (usually written in SystemC), you can create a virtual platform for any SoC. Such libraries of SystemC models are available as open source or as commercial packages such as the system-level library.
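To see why transaction-level modelling is faster, the Python sketch below (an illustration only, not SystemC) counts how many evaluation steps a simulator spends on the same memory read in two styles. The toy `Memory` class, the request/wait/response handshake and the `wait_states` parameter are hypothetical assumptions; real cycle-accurate models evaluate many more signals per cycle, so the gap in practice is usually larger.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical toy memory shared by both modelling styles."""
    data: dict = field(default_factory=dict)


def tlm_read(mem: Memory, addr: int) -> tuple:
    """Transaction-level read: one function call moves the whole transfer.
    Returns (value, evaluation_steps)."""
    return mem.data.get(addr, 0), 1


def cycle_read(mem: Memory, addr: int, wait_states: int = 2) -> tuple:
    """Cycle-level read over a hypothetical request/wait/response handshake.
    Every clock cycle is evaluated, even cycles where nothing useful happens.
    Returns (value, evaluation_steps)."""
    steps = 1                      # cycle 1: master drives address, asserts request
    for _ in range(wait_states):
        steps += 1                 # slave not ready yet, but the cycle is still simulated
    value = mem.data.get(addr, 0)
    steps += 1                     # final cycle: slave asserts ready and returns data
    return value, steps


def compare(n_reads: int, wait_states: int = 2) -> tuple:
    """Total evaluation steps for n_reads reads in each style."""
    mem = Memory({a: a * 2 for a in range(n_reads)})
    tlm_steps = cyc_steps = 0
    for addr in range(n_reads):
        v1, s1 = tlm_read(mem, addr)
        v2, s2 = cycle_read(mem, addr, wait_states)
        assert v1 == v2            # both styles must return the same data
        tlm_steps += s1
        cyc_steps += s2
    return tlm_steps, cyc_steps
```

With two wait states, `compare(1000)` returns `(1000, 4000)`: each cycle-level read costs four evaluation steps against one for the transaction-level call, while both return identical data.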

Many SoC designs are dominated by ARM IP, so you will need models of ARM cores and bus sub-systems. ARM supplies such models under the name Fast Models, and these are already in wide use in virtual platforms worldwide.


You have seen how RTL can be used to replace missing transaction-level models. Conversely, using the Standard Co-Emulation Modeling Interface (SCE-MI), you can create a hybrid emulation platform that substitutes a transaction-level model for RTL blocks that are not yet available. In some cases, you might not have access to the RTL for the CPU IP. In such cases, you can either use Fast Models or mount a hardware test chip into the emulator. However, the latest CPU IP often has a virtual model available well before a customer-usable test chip.

SCE-MI can be used to solve the following customer-verification problems:

Most emulators on the market today offer some proprietary APIs in addition to the SCE-MI 1.1 API.

This proliferation of APIs makes it very difficult to port software-based verification products to different emulators, which restricts the solutions available to customers. It also leads to low productivity and a poor return on investment for emulator customers who build their own solutions.

Emulation APIs that exist today are oriented towards gate-level rather than system-level verification.

If you cannot meet performance targets with the whole SoC in an FPGA-based prototype or an emulator, and if you do not need cycle-accurate behaviour within the CPU, you can replace the CPU with Fast Models running in a virtual platform linked via SCE-MI.
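The hybrid arrangement can be pictured as two sides exchanging transaction messages. The Python sketch below is a hypothetical stand-in: a thread plays the RTL behind a transactor in the emulator, a `VirtualCpu` class plays the Fast Models side, and two queues stand in for the SCE-MI channel between them. It does not use the real SCE-MI API; the message format and the register-file behaviour are assumptions made purely for illustration.

```python
import threading
from queue import Queue


def hardware_side(to_hw: Queue, from_hw: Queue) -> None:
    """Hypothetical stand-in for RTL in the emulator behind a transactor:
    a simple register file that answers read and write messages."""
    regs = {}
    while True:
        msg = to_hw.get()
        if msg is None:                      # shutdown sentinel
            break
        kind, addr, data = msg
        if kind == "write":
            regs[addr] = data
            from_hw.put(("ack", addr, data))
        elif kind == "read":
            from_hw.put(("rdata", addr, regs.get(addr, 0)))
        else:
            from_hw.put(("error", addr, 0))


class VirtualCpu:
    """Hypothetical virtual-platform side: issues untimed transactions and
    blocks until the hardware side responds."""

    def __init__(self, to_hw: Queue, from_hw: Queue):
        self.to_hw = to_hw
        self.from_hw = from_hw

    def write(self, addr: int, data: int) -> None:
        self.to_hw.put(("write", addr, data))
        kind, _, _ = self.from_hw.get()
        if kind != "ack":
            raise RuntimeError(f"write to {addr:#x} failed")

    def read(self, addr: int) -> int:
        self.to_hw.put(("read", addr, 0))
        kind, _, data = self.from_hw.get()
        if kind != "rdata":
            raise RuntimeError(f"read from {addr:#x} failed")
        return data


def demo() -> list:
    """Run a few transactions across the hypothetical link."""
    to_hw, from_hw = Queue(), Queue()
    hw = threading.Thread(target=hardware_side, args=(to_hw, from_hw))
    hw.start()
    try:
        cpu = VirtualCpu(to_hw, from_hw)
        cpu.write(0x10, 42)
        cpu.write(0x14, 7)
        return [cpu.read(0x10), cpu.read(0x14), cpu.read(0x18)]
    finally:
        to_hw.put(None)                      # tell the hardware side to stop
        hw.join()
```

`demo()` returns `[42, 7, 0]`. Each call blocks until the other side replies, which keeps the sketch simple; a real hybrid platform would typically also need to keep the two sides' notions of time synchronised.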

Virtual models of the CPU use instruction set simulation (ISS) rather than modelling the CPU's gate-level behaviour, and hence run much faster than the CPU's RTL implemented in an FPGA or an emulator.
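To make the idea of instruction set simulation concrete, here is a minimal ISS sketch in Python for a hypothetical four-register toy instruction set (`li`, `add`, `addi`, `bnez`); it is not a model of any ARM instruction set. Each instruction is executed as one step of a fetch-decode-execute loop, with no gates, wires or clock edges, which is where an ISS gets its speed.

```python
def run_iss(program: list, max_steps: int = 1000) -> tuple:
    """Execute (opcode, operands...) tuples on a hypothetical 4-register machine.
    Returns (final_registers, instructions_executed)."""
    regs = [0] * 4
    pc = 0
    executed = 0
    while pc < len(program) and executed < max_steps:
        op, *args = program[pc]              # fetch and decode
        if op == "li":                       # li rd, imm
            rd, imm = args
            regs[rd] = imm
        elif op == "add":                    # add rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "addi":                   # addi rd, rs, imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":                   # bnez rs, target
            rs, target = args
            if regs[rs] != 0:
                pc = target                  # branch taken
                executed += 1
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
        executed += 1
    return regs, executed


# Sum 5 + 4 + 3 + 2 + 1 into r0 using r1 as a down-counter.
SUM_PROGRAM = [
    ("li", 0, 0),        # 0: r0 = 0
    ("li", 1, 5),        # 1: r1 = 5
    ("add", 0, 0, 1),    # 2: r0 = r0 + r1
    ("addi", 1, 1, -1),  # 3: r1 = r1 - 1
    ("bnez", 1, 2),      # 4: loop back while r1 != 0
]
```

Running `run_iss(SUM_PROGRAM)` leaves 15 in `r0` after 17 executed instructions (two setup instructions plus five iterations of three).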

ARM Fast Models also represent device-specific functionality, such as cache coherency, that a more generic ISS would ignore, while still maintaining the performance advantage. As an example, a CPU IP sub-system might typically run at 2GHz in 28nm silicon. But how fast will it run on the various verification platforms?

There are many dependencies, but the CPU's RTL might reach only 20MHz when partitioned into an FPGA-based prototype, or just 2MHz in a traditional Big Box emulator; an FPGA-based emulator could fall somewhere between those two figures. The same processor's virtual platform might run at a rate equivalent to more than 1GHz, although this figure is slightly misleading because the models are untimed, so it is better expressed in operations per second.
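The practical impact of those rates is easiest to see as wall-clock time. The Python sketch below takes the figures quoted above (2GHz silicon, 20MHz FPGA prototype, 2MHz Big Box emulator, and 1GHz as a stand-in for the virtual platform's "more than 1GHz") and applies them to a hypothetical workload of 10^10 CPU cycles, roughly 5s of execution on the real silicon. Because the virtual-platform figure is an untimed equivalent rate and a lower bound, its result is only indicative.

```python
# Rates quoted in the text; the virtual-platform entry is an equivalent rate.
RATES_HZ = {
    "28nm silicon": 2_000_000_000,
    "FPGA-based prototype": 20_000_000,
    "Big Box emulator": 2_000_000,
    "virtual platform (equivalent)": 1_000_000_000,
}

# Hypothetical workload: 10^10 cycles, about 5 s at 2GHz.
WORKLOAD_CYCLES = 10_000_000_000


def wall_clock_seconds(cycles: int, rate_hz: int) -> float:
    """Time to execute `cycles` at a sustained `rate_hz`."""
    return cycles / rate_hz


def report() -> dict:
    """Wall-clock seconds and slowdown versus silicon for each platform."""
    silicon = RATES_HZ["28nm silicon"]
    return {
        name: (wall_clock_seconds(WORKLOAD_CYCLES, rate), silicon / rate)
        for name, rate in RATES_HZ.items()
    }


for name, (secs, slowdown) in report().items():
    print(f"{name:30s} {secs:8.0f} s  {slowdown:6.0f}x slower than silicon")
```

On these assumptions the FPGA prototype needs 500s (100x slower than silicon) and the emulator 5,000s, or about 83 minutes (1,000x slower), while the virtual platform finishes in about 10s.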

The CPU, or a CPU sub-system such as an ARM Cortex-A53 cluster, is usually delivered as pre-verified RTL, so fully re-verifying it to cycle-level accuracy may be a waste of time. All you need to do is model the CPU's interfaces with the rest of the SoC cycle-accurately, while modelling the CPU's internal activity with untimed transactions. As a result, the CPU, its software and the overall SoC prototype all run faster.
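The boundary between untimed transactions and a cycle-accurate interface is usually handled by a transactor. The Python sketch below expands one untimed burst write into per-cycle signal values for a hypothetical valid/ready bus; the `WriteTxn` type, the signal names and the 32-bit incrementing address scheme are assumptions for illustration, not a model of any ARM bus protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteTxn:
    """Untimed burst write, as a virtual CPU model might issue it."""
    addr: int
    data: tuple  # one 32-bit word per beat


def to_cycles(txn: WriteTxn, ready_pattern: list) -> list:
    """Expand a burst write into per-cycle signals on a hypothetical
    valid/ready bus. ready_pattern[i] is the slave's ready on cycle i;
    the last value repeats once the pattern is exhausted."""
    if not ready_pattern or not ready_pattern[-1]:
        raise ValueError("pattern must end with ready=1 or the burst never completes")
    cycles = []
    beat = 0
    cycle = 0
    while beat < len(txn.data):
        ready = ready_pattern[min(cycle, len(ready_pattern) - 1)]
        cycles.append({
            "valid": 1,
            "addr": txn.addr + 4 * beat,     # assumes 32-bit words, incrementing burst
            "wdata": txn.data[beat],
            "ready": ready,
        })
        if ready:                            # beat completes only when valid and ready are high
            beat += 1
        cycle += 1
    return cycles
```

For example, `to_cycles(WriteTxn(0x1000, (0xA, 0xB, 0xC)), [0, 1, 0, 1])` produces five cycles: the slave stalls the first and third beats for one cycle each, and each beat's address and data are held stable until its handshake completes.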

This use mode might also be valuable if the target FPGA hardware is running out of capacity, so that there is no room to implement all of the RTL at once. By selectively pushing parts of the design over to the virtual side, you can not only boost performance but also free up FPGA resources.
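Choosing which blocks to push over to the virtual side can be framed as a simple capacity problem. The sketch below uses a greedy heuristic (offload the largest blocks that have a virtual model until the rest fits); the block names, LUT counts and the heuristic itself are hypothetical, and a real decision would also weigh the interface bandwidth between the two sides and which blocks genuinely need cycle accuracy.

```python
def choose_offload(blocks: dict, capacity_luts: int) -> tuple:
    """blocks maps name -> (luts, has_virtual_model).
    Greedy heuristic: move the largest blocks that have a virtual model to
    the virtual side until the remaining RTL fits. Returns
    (offloaded_block_names, luts_remaining_in_fpga)."""
    total = sum(luts for luts, _ in blocks.values())
    candidates = sorted(
        (name for name, (_, has_model) in blocks.items() if has_model),
        key=lambda name: blocks[name][0],
        reverse=True,
    )
    offloaded = []
    for name in candidates:
        if total <= capacity_luts:
            break
        offloaded.append(name)
        total -= blocks[name][0]
    if total > capacity_luts:
        raise ValueError("design does not fit even after offloading every modelled block")
    return offloaded, total


# Hypothetical SoC: sizes are illustrative only.
EXAMPLE_BLOCKS = {
    "cpu_cluster": (900_000, True),
    "gpu": (1_200_000, False),
    "ddr_ctrl": (150_000, False),
    "video_codec": (400_000, True),
    "periph": (100_000, True),
}
```

With a 2,000,000-LUT budget, `choose_offload(EXAMPLE_BLOCKS, 2_000_000)` moves only `cpu_cluster` to the virtual side, leaving 1,850,000 LUTs of RTL in the FPGA.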

In SoC verification, directed tests are often used in conjunction with constrained random tests to provide the necessary functional coverage. The most sophisticated directed tests are written so that they can adapt the stimulus to reflect activity within the design under test. Such tests are often written in SystemC, so it is a short step to consider driving them from a virtual platform running specific test software.
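A directed test that adapts to the design's state might look like the following Python sketch. A hypothetical `FifoDut` class stands in for the design under test (in a hybrid setup it would be RTL reached through a transactor); the test reads the FIFO's fill level and steers its stimulus to hit the full, overflow, empty and underflow corner cases, checking data ordering against a scoreboard. Names and behaviour are assumptions for illustration.

```python
from collections import deque


class FifoDut:
    """Hypothetical stand-in for a FIFO design under test."""

    def __init__(self, depth: int):
        self.depth = depth
        self._q = deque()
        self.overflow_attempts = 0

    def level(self) -> int:
        return len(self._q)

    def push(self, value: int) -> bool:
        if len(self._q) == self.depth:
            self.overflow_attempts += 1
            return False                     # push rejected while full
        self._q.append(value)
        return True

    def pop(self):
        return self._q.popleft() if self._q else None


def check(cond: bool, msg: str) -> None:
    """Explicit check, so side-effecting calls are not stripped like asserts under -O."""
    if not cond:
        raise AssertionError(msg)


def adaptive_directed_test(dut: FifoDut, rounds: int = 2) -> list:
    """Steer stimulus from the DUT's observed fill level; return corners hit."""
    expected = deque()                       # scoreboard of values still in the FIFO
    corners = []
    next_value = 0
    for _ in range(rounds):
        # Fill based on the level the DUT reports, not a fixed count.
        while dut.level() < dut.depth:
            check(dut.push(next_value), "push rejected before full")
            expected.append(next_value)
            next_value += 1
        corners.append("full")
        # Probe the overflow corner only once the DUT actually reports full.
        check(not dut.push(-1), "push accepted while full")
        corners.append("overflow_rejected")
        # Drain, checking ordering against the scoreboard.
        while dut.level() > 0:
            check(dut.pop() == expected.popleft(), "data out of order")
        corners.append("empty")
        check(dut.pop() is None, "pop returned data while empty")
        corners.append("underflow_rejected")
    return corners
```

Because the fill loop is driven by `dut.level()`, the same test still reaches the full and empty boundaries if the FIFO depth changes, which a fixed-count directed test would not.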


