Post-silicon validation includes a number of activities, covering functional and timing behaviour as well as non-functional requirements, each with its own methodologies. This part introduces the concept of validation.
Validation includes different tasks such as functional correctness, adherence to power and performance constraints for target use-cases, tolerance of electrical noise margins, security assurance, robustness against physical stress or thermal glitches in the environment, and so on. Validation is acknowledged as a major bottleneck in system-on-chip (SoC) design methodology, accounting for an estimated 70 per cent of overall time and resources spent on SoC design.
Post-silicon validation alone takes more than 50 per cent of overall SoC design effort. Due to increasing SoC design complexity coupled with shrinking time-to-market constraints, it is not possible to detect all design flaws during pre-silicon validation. Given the diversity of critical applications of computing devices in the new era, along with the complexity of the devices themselves, validation is clearly a crucial and challenging problem.
Post-silicon validation makes use of a fabricated, pre-production silicon implementation of the target SoC design as the validation vehicle to run a variety of tests and software. The objective of post-silicon validation is to ensure that the silicon design works properly under actual operating conditions while executing real software, and identify and fix errors that may have been missed during pre-silicon validation.
Complexity of post-silicon validation arises from the physical nature of the validation target. It is much harder to control, observe and debug the execution of an actual silicon device than a computerised model. Post-silicon validation is also performed under a highly-aggressive schedule to ensure adherence to time-to-market requirements.
Post-silicon validation is done to capture escaped functional errors as well as electrical faults. Modern embedded computing devices are generally architected through an SoC design paradigm. An SoC architecture includes a number of pre-designed hardware blocks (potentially augmented with firmware and software) of well-defined functionality, often referred to as intellectual properties (IPs). These IPs communicate and coordinate with each other through a communication fabric or network-on-chip.
The idea of an SoC design is to quickly configure these pre-designed IPs for target use-cases of the device and connect them through standardised communication interfaces. This ensures rapid design turn-around time for new applications and market segments. An SoC can include one or more processor cores, digital signal processors (DSPs), multiple coprocessors, controllers, analogue-to-digital converters (ADCs) and digital-to-analogue converters (DACs), all connected through a communication fabric.
The strengths of pre-silicon simulation are accurate logic behaviour (about 98 per cent of logic bugs and 90 per cent of circuit bugs found), straightforward debugging and inexpensive bug fixing. Its limits are the lack of platform-level interactions and non-real-time execution.
The strength of post-silicon (platform) validation is that it runs on the actual target platform, catching the remaining bugs (about two per cent of logic bugs and ten per cent of circuit bugs). Its limits are difficult debugging and expensive bug fixing.
Focus areas of post-silicon validation are complementary to pre-silicon simulation, exploiting its benefits (many cycles, platform-level interactions): ISA and features, memory sub-system/hierarchy, platform power state transitions, I/O concurrency, I/O margin characterisation and core circuit bug hunting.
Functional bug hunting
1. ISA architecture/micro-architecture testing is based on biased random schemes: random generation of instructions, checking against an architectural simulation, and random power state transition injection. It is CPU-core intensive, delivering high throughput of random tests but typically low I/O stress.
2. Feature-oriented directed/random tests, including paging, TLB and virtualisation.
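As a toy illustration of a feature-oriented directed test (the class and scenario here are invented for exposition, not any vendor's actual flow), the sketch below models a single-entry TLB checked against a reference page table, exercising the classic stale-translation-after-remap scenario:

```python
# Toy directed test for paging: a trivial "MMU" with a one-entry TLB,
# checked against a reference page table (hypothetical model, for
# illustration only).
class ToyMMU:
    def __init__(self):
        self.page_table = {}        # virtual page -> physical page
        self.tlb = {}               # cached translations (capacity 1)

    def map(self, vpn, ppn):
        self.page_table[vpn] = ppn
        self.tlb.pop(vpn, None)     # invalidate any stale cached entry

    def translate(self, vpn):
        if vpn not in self.tlb:
            self.tlb = {vpn: self.page_table[vpn]}  # refill (capacity 1)
        return self.tlb[vpn]

# Directed sequence: remap a page and confirm the TLB does not serve
# a stale translation, a classic source of escaped bugs.
mmu = ToyMMU()
mmu.map(0x10, 0xAA)
assert mmu.translate(0x10) == 0xAA
mmu.map(0x10, 0xBB)                 # remap: must invalidate the TLB
assert mmu.translate(0x10) == 0xBB
```

A real directed test would issue actual loads and stores on silicon and compare observed translations against the architectural model, but the check structure is the same.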
Random instruction testing
1. Strengths: wide coverage of CPU/GPU cores
- Finds subtle micro-architectural bugs
- Stresses CPU pipeline boundary conditions
- Good at finding micro-code bugs
- High throughput core testing
2. Limitations
- Low I/O stress
- Needs to be complemented with memory sub-system tests
- Requires servers to generate instruction seeds
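The self-checking pattern behind biased random instruction testing can be sketched as follows (a toy four-register ISA invented for illustration; `gen_biased` and `run` are hypothetical names, not a real tool's API):

```python
import random

# Toy ISA: each instruction is (op, dst, src1, src2) over 4 registers.
OPS = ["add", "sub", "xor", "and"]

def gen_biased(n, weights, seed):
    """Generate n random instructions; `weights` biases opcode choice
    toward the instruction classes under stress."""
    rng = random.Random(seed)
    return [(rng.choices(OPS, weights)[0],
             rng.randrange(4), rng.randrange(4), rng.randrange(4))
            for _ in range(n)]

def run(program, regs):
    """Architectural reference model: computes the golden final state."""
    regs = list(regs)
    for op, d, s1, s2 in program:
        a, b = regs[s1], regs[s2]
        regs[d] = {"add": (a + b) & 0xFFFFFFFF,
                   "sub": (a - b) & 0xFFFFFFFF,
                   "xor": a ^ b,
                   "and": a & b}[op]
    return regs

# Self-checking flow: run the same seed on the device under test and on
# the reference model, then compare final architectural state. Here the
# "silicon" run is simulated by the same model, purely for illustration.
prog = gen_biased(1000, weights=[4, 2, 1, 1], seed=42)
golden = run(prog, [1, 2, 3, 4])
silicon = run(prog, [1, 2, 3, 4])   # stand-in for the device under test
assert golden == silicon
```

The seed makes every failing run reproducible, which is why seed generation (and the servers that produce seeds at scale) matters for debug.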
Memory sub-system validation
1. Random and directed/random memory test strategy
2. Memory channel intensive
3. Based on multi-core and multi-processor configurations
4. Targets standard symmetric multi-processor attributes such as cache coherency, consistency, synchronisation and memory ordering
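A common shape for such memory-ordering tests is a litmus test hammered many times across threads. The sketch below (illustrative only; Python threads stand in for hardware threads on silicon) runs the classic store-buffering pattern and tallies outcomes:

```python
import threading
from collections import Counter

# Store-buffering (SB) litmus test: two threads each store to one shared
# location, then load the other. Under sequential consistency the outcome
# (r0, r1) == (0, 0) is forbidden; observing it on silicon indicates weak
# memory ordering (expected on some ISAs) or, on a supposedly strongly
# ordered machine, a bug.
def sb_round():
    x = y = 0
    r = [None, None]
    def t0():
        nonlocal x, y
        x = 1          # store to x...
        r[0] = y       # ...then load y
    def t1():
        nonlocal x, y
        y = 1          # store to y...
        r[1] = x       # ...then load x
    a, b = threading.Thread(target=t0), threading.Thread(target=t1)
    a.start(); b.start(); a.join(); b.join()
    return tuple(r)

# Directed/random strategy: repeat the test many times, tally outcomes,
# and flag any outcome the memory model forbids.
hist = Counter(sb_round() for _ in range(200))
assert (0, 0) not in hist  # forbidden by any sequentially consistent run
```

On real silicon the same structure is used, but with the stores and loads compiled to the target ISA and run on separate cores to stress the coherency fabric.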
I/O concurrency validation
1. The strategy here is to load all platform buses simultaneously:
- QuickPath Interconnect (QPI)
- DDR3 memory channels
- PCI Express Gen2
- USB, SSD
- SATA, PATA
2. Use of directed/random and biased random test generators
3. Test cards used to provide determinism
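The "load everything at once" strategy can be sketched as a set of concurrent worker threads, one per bus. The channel names below are placeholders for the real platform links, and the byte-buffer writes merely stand in for DMA traffic:

```python
import threading
import queue
import time

# Hypothetical channel names standing in for real platform buses
# (memory channels, PCIe links, USB/SATA ports on the target).
CHANNELS = ["ddr3_ch0", "ddr3_ch1", "pcie_g2", "usb", "sata"]

def stress(channel, stop, log):
    """Keep one bus busy until told to stop, recording traffic counts."""
    count = 0
    buf = bytearray(4096)
    while not stop.is_set():
        buf[count % len(buf)] = count & 0xFF   # stand-in for DMA traffic
        count += 1
    log.put((channel, count))

stop = threading.Event()
log = queue.Queue()
workers = [threading.Thread(target=stress, args=(c, stop, log))
           for c in CHANNELS]
for w in workers:
    w.start()
time.sleep(0.1)      # concurrent load window across all buses at once
stop.set()
for w in workers:
    w.join()
results = dict(log.get() for _ in CHANNELS)
assert set(results) == set(CHANNELS)
```

The point of the simultaneous window is to expose arbitration, bandwidth-sharing and power-delivery corner cases that never occur when buses are exercised one at a time; test cards then pin down the exact traffic patterns to make failures deterministic.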
Compatibility and standards
1. Industry-standard operating systems (OSes), applications and peripherals are used to verify:
- Platform and component behavioural correctness
- Legacy compatibility of OS, applications and peripherals
2. Test configurations modelled on end-user systems
- Mobile and desktop client systems
- Blade and enterprise-level server systems
- Fully-integrated platforms: CPU, chipsets, BIOS, OSes and applications
3. Highly-stressful configurations, hunting for both functional and performance bugs
How circuit bugs appear
Some of the ways bugs appear:
1. Circuit bugs appear as DPMs (defects per million): not all dies behave the same way
2. Taxonomy (classification)