Power dissipation is a major concern when designing high-performance systems, and clock gating is one technique that can be used to reduce it.
The Internet of Things (IoT) connects smartphones, desktop systems, smart sensors, smart automobiles and other things embedded with electronics, software, sensors and actuators to the Internet. Rapid advances in software, semiconductors and communication have come together with hardware miniaturisation. This has pushed intelligence and connectivity into the smallest of things, such as sensors in clothes, as well as into huge systems such as automobiles, factories or even cities. Analysts estimate that there will be 50 billion Internet-connected devices by 2020, up from 25 billion in 2015. This predicted explosion of IoT devices affects a range of evolving and growing markets, as well as entirely new applications.
This level of integration of intelligence and connectivity requires highly reliable, highly secure, high-performance, low-cost and low-power system boards and integrated circuits. Despite the variety of target applications, all such devices demand low energy and power consumption, high reliability, connectivity, interoperability, security and privacy.
Further, while meeting these demands, time-to-market is a critical metric that determines whether an IoT product will capture market share and, thus, the greatest revenue opportunity. Hence, a design methodology that meets design goals while ensuring fast time-to-market is crucial.
The IoT encompasses technologies such as machine-to-machine (M2M), machine-to-infrastructure and machine-to-environment communication, the Internet of everything, and the Internet of intelligent things and intelligent systems, among others. The cost of IoT products plays a disproportionately important role in their market viability. Most IoT applications are driven by the need for cost-effective solutions, so IoT system developers generally do not target newly evolving protocols or the latest technology nodes.
Battery-powered devices have gained popularity in the last few years. They demand a smaller form factor, lower weight, longer battery life and high performance. Architects and designers have learned that there is no single solution to the problem. Power optimisation needs to be considered at all phases of the design life cycle, including architecture, algorithms, technology, register transfer level (RTL) design, backend activities and so on.
In high-performance systems, clock signals are a major source of power dissipation, because high-frequency systems deliver a large number of clock pulses. Consider a system that is fed clock pulses so that its output can change. In most cycles, the output does not change, so those pulses are wasted. They add to power dissipation rather than to circuit performance. This is why clock gating is needed to reduce power.
Power should be considered at every stage of the system, in both software and hardware design, whether it is a system board design or a chip design. In the integrated circuit (IC) design flow, low-power design techniques can be applied at several stages: requirement specification, architecture design, RTL development, synthesis and physical design.
At the architectural level, designers make decisions such as choosing the clock frequency, defining the micro-architecture of the design and partitioning the design with respect to clocks. In this phase, they do not have a lot of flexibility to reduce overall power consumption.
Designers use the following three architectural low-power techniques:
- Clock gating
- Architectural clock gating
- Dynamic frequency variation
Low-power techniques come at a cost in speed, area and performance, so they need to be adopted carefully based on the application or system requirements. In a system, a significant portion of the dynamic power is consumed by the clock distribution network. Clock buffers have the highest toggle rate in the system, so they can consume 50 per cent or more of the dynamic power. They also have high drive strength to minimise clock delay, which further adds to their power.
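To see why the clock network can dominate, the standard CMOS estimate P = alpha * C * V^2 * f can be evaluated for a clock net and for ordinary logic. The Python sketch below uses hypothetical capacitance, voltage, frequency and activity values; none of them come from this article. It only illustrates that a net toggling every cycle can outweigh much larger logic that toggles rarely.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Classic CMOS estimate P = alpha * C * V^2 * f, in watts.

    alpha is the activity factor: the fraction of clock cycles in which the
    node is charged and discharged (1.0 for a clock net).
    """
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical design point, chosen only for illustration.
VDD = 1.0      # supply voltage, V
FREQ = 100e6   # clock frequency, Hz

p_clock = dynamic_power(1.0, 40e-12, VDD, FREQ)   # clock network: toggles every cycle
p_logic = dynamic_power(0.1, 200e-12, VDD, FREQ)  # data logic: 10% activity, 5x the capacitance

clock_share = p_clock / (p_clock + p_logic)
print(f"clock network share of dynamic power: {clock_share:.0%}")  # 67%
```

Even though the assumed logic capacitance is five times that of the clock net, the clock accounts for about two-thirds of the dynamic power because its activity factor is ten times higher.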
In addition, flip-flops receiving the clock dissipate dynamic power even when they do not change state. Designers can turn off the clock to flip-flops (or other clocked elements) when they are not in use. This removes a significant portion of dynamic power consumption while preserving the state of the flip-flops.
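The claim that gating the clock preserves state can be checked with a small behavioural model. The Python sketch below is a cycle-level model, not RTL, and its function names are illustrative. It compares a register whose enable drives a feedback multiplexer against the same register with a gated clock. Both produce identical outputs, but the gated version receives clock edges only in enabled cycles.

```python
def enable_mux_flop(d_seq, en_seq, q0=0):
    """Register with a feedback mux: clocked every cycle, q <= en ? d : q."""
    q, out, edges = q0, [], 0
    for d, en in zip(d_seq, en_seq):
        edges += 1              # the clock reaches the flop in every cycle
        q = d if en else q      # the mux recirculates the old value
        out.append(q)
    return out, edges


def clock_gated_flop(d_seq, en_seq, q0=0):
    """Same register with clock gating: an edge arrives only when en is high."""
    q, out, edges = q0, [], 0
    for d, en in zip(d_seq, en_seq):
        if en:
            edges += 1          # gated clock pulses only in enabled cycles
            q = d
        out.append(q)           # with no edge, the flop simply holds its state
    return out, edges


d_seq = [5, 6, 7, 8, 9]
en_seq = [1, 0, 0, 1, 0]
mux_q, mux_edges = enable_mux_flop(d_seq, en_seq)
gated_q, gated_edges = clock_gated_flop(d_seq, en_seq)
print(mux_q == gated_q, mux_edges, gated_edges)  # True 5 2
```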
Clock gating methods
The following clock gating methods are in use:
- Gating based on inactive blocks/interfaces and protocol-defined states
- Architectural clock gating implementation
- Software-programmed entry and exit
- Hardware-driven entry and exit
- Mixed hardware and software
Consider a typical multi-bit flip-flop with an enable (EN), as shown in Fig. 2. The flip-flop is updated with new data when the enable is asserted; otherwise, it retains its old value.
The same circuit can be implemented with clock gating logic, as shown in Fig. 2. Clock gating reduces a significant amount of the dynamic power consumed by the circuit. It also saves additional power and area by removing the need for multiplexer logic at the inputs of the flip-flops. This implementation uses a latch that is transparent while the clock is low (a negative-level-sensitive latch), followed by an AND gate. The latch holds the enable stable while the clock is high, so the gated clock is free of glitches.
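The role of the latch can be illustrated with a waveform-level model sampled several times per clock period. The Python sketch below assumes a clock that is high for four timesteps and low for four. It also assumes an enable that changes in the middle of a clock-high phase, as happens when the enable is launched by logic clocked on the rising edge. A plain AND gate lets those changes through as truncated pulses. The latch-based version only passes full-width pulses.

```python
def gate_clock(clk, en, latch_based=True):
    """Return the gated clock waveform for per-timestep samples of clk and en."""
    gclk = []
    en_latched = 0
    for c, e in zip(clk, en):
        if latch_based:
            if c == 0:
                en_latched = e           # latch is transparent while the clock is low
            gclk.append(c & en_latched)  # latch holds its value while clk is high
        else:
            gclk.append(c & e)           # plain AND: enable changes pass straight through
    return gclk


def pulse_widths(wave):
    """Lengths (in timesteps) of each high pulse in a waveform."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Clock: 4 timesteps high, 4 low, three periods.
clk = ([1] * 4 + [0] * 4) * 3
# Enable rises and falls in the middle of clock-high phases.
en = [0] * 2 + [1] * 16 + [0] * 6

print(pulse_widths(gate_clock(clk, en, latch_based=False)))  # [2, 4, 2]
print(pulse_widths(gate_clock(clk, en, latch_based=True)))   # [4, 4]
```

The width-2 pulses from the plain AND gate are the glitches that could clock flip-flops unreliably. In the latch-based cell, an enable change simply takes effect from the next full clock pulse.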
Today, most technology libraries include integrated clock gating (ICG) cells. Implementation tools can insert these cells automatically to reduce a considerable amount of dynamic power consumption.
Architectural clock gating
Architectural clock gating in RTL is a configurable option in all versions of Cortex-M3.
In r1p1, it is controlled by several definitions, which are described in the documentation provided to licensees of the RTL.
In r2p0, it is controlled by a single CLKGATE_PRESENT parameter for each instantiation of Cortex-M3. This is described in the Cortex-M3 Integration and Implementation Manual, which is available to RTL licensees. Control of clock gating is inferred from circuit activity. To minimise power consumption in ASIC (chip) implementations by RTL licensees, ARM recommends using both architectural clock gating and leaf-cell clock gating inferred by synthesis tools.
For FPGA prototyping, clock gating can cause some difficulty and inconvenience. The benefit of clock gating is relatively small in FPGA prototypes, so it makes sense to de-configure it in this case.
Architectural clock gating is a context-aware technique that identifies functional idle periods in order to optimise power. While designing the architecture, designers can identify blocks whose clocks can be turned off under certain conditions. These conditions must either be detected automatically by the RTL or be programmed by software. The clocks for those blocks can then be turned off, reducing a significant amount of dynamic power.
Architectural clock gating is mainly based on foreseeing which part of the design will be inactive under what conditions, and gating that part of the design accordingly. It can be applied in two ways. One is to gate clocks based on the protocol or design of the IP during implementation. The other is to analyse design power after implementation and then optimise it with clock gating.
Programmable clock gating uses a user-programmable control register to determine entry into and exit from clock gating. This method is useful when entry and exit durations are long compared with software latency.
A disadvantage of this method is that, when exiting clock gating, too high a software latency may lead to functional errors. One modification is to use software for entry and exit, together with additional wake-up logic. A further modification is to provide a control register bit that enables clock gating (for safety purposes), with hardware logic handling entry and exit.
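The last variant can be sketched as a cycle-level model: software sets an enable bit once, and hardware handles entry and exit. In the Python sketch below, the class name, the `cg_enable` register bit and the idle-cycle threshold are all hypothetical. Entry happens after a run of idle cycles. Exit is immediate on any activity, so wake-up does not depend on software latency.

```python
class ClockGateController:
    """Cycle-level model: software enables gating, hardware handles entry and exit.

    cg_enable models a hypothetical control register bit; idle_threshold is a
    hypothetical number of idle cycles required before the clock is gated.
    """

    def __init__(self, idle_threshold):
        self.cg_enable = False
        self.idle_threshold = idle_threshold
        self.idle_count = 0
        self.gated = False

    def step(self, busy):
        """Advance one cycle and return True if the block clock runs this cycle."""
        if busy or not self.cg_enable:
            # Wake-up logic: activity (or clearing cg_enable) ungates immediately,
            # without waiting for software.
            self.idle_count = 0
            self.gated = False
        else:
            self.idle_count += 1
            if self.idle_count >= self.idle_threshold:
                self.gated = True   # hardware entry after enough idle cycles
        return not self.gated


ctrl = ClockGateController(idle_threshold=3)
ctrl.cg_enable = True                       # software opts in once
activity = [1, 0, 0, 0, 0, 0, 1, 0]
clock_runs = [ctrl.step(b) for b in activity]
print(clock_runs)  # [True, True, True, False, False, False, True, True]
```

Clearing `cg_enable` acts as the safety override: the controller then never gates, whatever the activity.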
Adaptive clock gating can be applied if the complete IP is modelled as a single finite state machine (FSM) with several states. These states can be divided into working and idle states, and designers can gate the clock to the complete IP while it is in an idle state.
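The FSM-based decision reduces to a lookup from the current state to a clock enable. In the Python sketch below, the state names and the choice of idle states are hypothetical. It also assumes that the FSM's own state register and the logic that detects wake-up events stay on an ungated clock; otherwise the IP could never leave an idle state.

```python
from enum import Enum, auto


class IpState(Enum):
    # Hypothetical states; a real IP has its own FSM encoding.
    IDLE = auto()
    SLEEP = auto()
    CONFIG = auto()
    ACTIVE = auto()


IDLE_STATES = {IpState.IDLE, IpState.SLEEP}   # states with no functional work


def ip_clock_enable(state):
    """Adaptive clock gating: clock the IP only in working states."""
    return state not in IDLE_STATES


trace = [IpState.IDLE, IpState.CONFIG, IpState.ACTIVE, IpState.IDLE, IpState.SLEEP]
gated = sum(1 for s in trace if not ip_clock_enable(s))
print(gated)  # 3
```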
Some IPs have no architecturally defined FSM. For example, this method has been used for a floating-point ALU with several pipelines. When there are no instructions on the bus, the IP is assumed to be idle and its clock is gated.
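Without an explicit FSM, the idle condition has to be inferred from activity. The Python sketch below models a pipelined unit with one valid bit per stage. It assumes, beyond the text above, that the clock must keep running until instructions already in flight have drained, not merely until the input bus goes quiet. The function name and pipeline depth are hypothetical.

```python
def pipeline_clock_enables(issue_trace, depth):
    """Per-cycle clock enables for a pipelined unit with no explicit FSM.

    issue_trace[i] is True when an instruction arrives on the input bus in cycle i.
    Assumption: the clock keeps running until in-flight instructions drain.
    """
    valid = [False] * depth          # one valid bit per pipeline stage
    enables = []
    for issued in issue_trace:
        enable = issued or any(valid)
        enables.append(enable)
        if enable:
            # The pipeline advances only on cycles that receive a clock edge.
            valid = [issued] + valid[:-1]
    return enables


print(pipeline_clock_enables([True, False, False, False, False, False], depth=3))
# [True, True, True, True, False, False]
```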
In most IPs based on standard protocols, the protocols themselves define the idle states for low-power modes. This method works well for IPs with a single clock domain. However, most IPs today have multiple clock domains, with scenarios in which one clock domain is inactive while another is active. In those cases, it is not easy to gate the clock using this method.
Dynamic frequency variation
Dynamic frequency variation is a technique in which the clock frequency of a particular block is dynamically increased to perform an operation. Once the operation is finished, the clock frequency is reduced to its original low value. For example, the application interface of an I2C controller normally operates at a low frequency. During a data transfer, its clock frequency can be increased dynamically to finish the transfer quickly. As a result, average dynamic power consumption is lower than if the interface ran at the high frequency all the time. More generally, a system can be made low power by considering low-power RTL design, gate-level optimisation, frequency islands, power gating, multi-supply voltage, multiple threshold voltage, dynamic voltage scaling and so on.
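At fixed voltage, capacitance and activity, dynamic power is proportional to clock frequency. The saving from frequency boosting therefore follows from the time-weighted average frequency. The Python sketch below uses hypothetical frequencies and a hypothetical 10 per cent boost duty. It compares against running at the high frequency continuously, and it does not model voltage scaling, which would save more.

```python
def dfs_power_ratio(f_low_hz, f_high_hz, duty_high):
    """Average dynamic power with frequency boosting, relative to always running at f_high.

    At fixed voltage, capacitance and activity, P = alpha * C * V^2 * f is
    proportional to f, so the ratio is just average frequency / f_high.
    duty_high is the fraction of time spent at f_high.
    """
    f_avg = duty_high * f_high_hz + (1.0 - duty_high) * f_low_hz
    return f_avg / f_high_hz


# Hypothetical numbers: 10 MHz normally, boosted to 100 MHz for 10% of the time.
ratio = dfs_power_ratio(10e6, 100e6, 0.1)
print(f"{ratio:.2f}")  # 0.19
```

Under these assumptions, the interface draws about a fifth of the dynamic power it would draw at a constant 100 MHz.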
Most IoT applications use existing protocols such as USB and I2C, and ultra-low-power fabrication processes. Given that most IoT devices are battery-powered, small power budgets are a strict design requirement. Designers must therefore implement low-power strategies at every level of abstraction, including inter-chip communication, SoC micro-architecture and technology node.
IP vendors are also updating their designs to meet the specific needs of the IoT market. Popular low-power approaches that have been in use for some time include clock gating, gate-level power optimisation, multi-VDD, multi-VT, power gating and adaptive voltage scaling. For specialised, IoT-optimised ultra-low-power fabrication processes, reducing dynamic power is a major design focus.
Clock gating reduces dynamic power consumption and can be applied in both the front-end and back-end design phases. It can broadly be implemented at register level or at architectural level. With recent advances in EDA tools, register-level clock gating is handled efficiently by the tools. Architectural clock gating, by contrast, is deployed based on conditions specific to the design and its usage.
M2M traffic patterns
M2M traffic in major IoT applications can mainly be modelled as three types: periodic update, event-driven and payload exchange.
Periodic update traffic occurs in non-real-time, time-triggered applications that report at regular intervals. Examples include smart meter readings for cooking gas, electricity and water.
Event-driven traffic occurs in real-time or non-real-time applications based on event monitoring. Examples include alarms and emergency alerts.
Payload exchange occurs in response to a preceding periodic update or event-driven message. This model shows that most IoT applications do not have continuous traffic, which allows designers to apply power-saving methods efficiently at the protocol or architectural level.
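How little of the time a link carries traffic can be estimated by laying the three traffic types onto a timeline. In the Python sketch below, every number is hypothetical and chosen only for illustration: the tick counts, the update period, the message lengths and the event times. Each periodic update and each event is followed by a payload exchange.

```python
def busy_timeline(length, period, update_len, payload_len, events, event_len):
    """Mark the ticks in which the link carries traffic (all values hypothetical)."""
    busy = [False] * length

    def mark(start, n):
        for t in range(start, min(start + n, length)):
            busy[t] = True

    for start in range(0, length, period):
        mark(start, update_len)                    # periodic update
        mark(start + update_len, payload_len)      # payload exchange in response
    for start in events:
        mark(start, event_len)                     # event-driven alert
        mark(start + event_len, payload_len)       # payload exchange in response
    return busy


timeline = busy_timeline(length=10_000, period=1_000, update_len=5,
                         payload_len=20, events=[2_500, 7_300], event_len=3)
idle_fraction = timeline.count(False) / len(timeline)
print(f"idle for {idle_fraction:.1%} of the time")  # idle for 97.0% of the time
```

With this sparse pattern, the link is idle for most of the time. Those idle ticks are what protocol- or architecture-level power saving can exploit.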
Most architecture- or protocol-level implementations based on standard connectivity protocols comprise active states (with and without traffic) and low-power states. It is common to implement power gating to save power during protocol-defined low-power states. During active states with minimal or no traffic, however, power gating cannot be used. The latency of switching power off and back on often exceeds the protocol timeouts for entry and exit. Therefore, to reduce power during the active state, clock gating is used at the architectural level, based on the traffic pattern.
For transfers carried out in bursts, it is possible to enter a low-power state when there is no traffic and save power. For periodic transfers, however, the device needs to remain in the active state. Even in the active state, in applications such as tracking or monitoring systems, traffic may not be present for the whole period.
Clock gating, if implemented prudently, can be used effectively to save power when there is no traffic. In this approach, gating conditions for specific blocks are defined based on the internal architecture, instead of relying on protocol-specific low-power states to trigger clock gating.
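The difference between gating only in protocol low-power states and gating individual blocks by their own idle conditions can be counted on a short activity trace. In the Python sketch below, the block names (`tx`, `rx`, `dma`), the link-state labels and the trace are all hypothetical. It simply counts block-cycles in which the clock could be gated under each policy.

```python
# Hypothetical protocol low-power states (USB 2.0 naming: L1 = sleep, L2 = suspend).
LOW_POWER_STATES = {"L1", "L2"}


def gated_block_cycles(trace, per_block):
    """Count block-cycles with the clock gated.

    trace: one (link_state, {block_name: busy}) entry per cycle.
    per_block=False: gate all blocks only in protocol low-power states.
    per_block=True: also gate each idle block while the link is active.
    """
    gated = 0
    for link_state, busy in trace:
        if link_state in LOW_POWER_STATES:
            gated += len(busy)                               # whole sub-system idle
        elif per_block:
            gated += sum(1 for b in busy.values() if not b)  # only the idle blocks
    return gated


# Hypothetical internal blocks of a connectivity IP.
trace = [
    ("L1", {"tx": False, "rx": False, "dma": False}),
    ("ACTIVE", {"tx": True, "rx": False, "dma": True}),
    ("ACTIVE", {"tx": False, "rx": False, "dma": False}),
    ("ACTIVE", {"tx": False, "rx": True, "dma": True}),
]
print(gated_block_cycles(trace, per_block=False))  # 3
print(gated_block_cycles(trace, per_block=True))   # 8
```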
Consider the USB 2.0 protocol as an example. It comprises active states (with and without traffic) and low-power modes such as L1 (sleep) and L2 (suspend). During a low-power mode, the USB sub-system (controller and physical layer) can use physical layer low-power techniques or clock gating, depending on the entry and exit durations. When the USB sub-system is in the active state, the physical layer consumes 20 to 50 times as much power as the controller.
Traditionally, efforts have focused on reducing physical layer active power. However, there is a fundamental limit beyond which physical layer power cannot easily be reduced during active traffic.
Efforts have also been made to reduce physical layer power when the controller/physical layer sub-system is not functionally used in the system, and during suspend. L2 has long entry and exit durations. During L2, the power consumed by the sub-system can be very low, thanks to controller power gating and physical layer low-power techniques. However, many use cases cannot make use of USB L2, and operating system support for it has been slow.
In practice, the transition time between active and suspend is often more than 100 milliseconds. USB applications that require much shorter transition times than L2 provides use L1 instead. During L1, physical layer power can be reduced by more than 99 per cent through an appropriate choice of active/idle transition times. Controller sleep power, however, is not significantly reduced by traditional clock gating.
A controller must wake up quickly from sleep, so its power-saving capability is limited; in particular, it cannot be power gated. In practice, this means a controller's sleep power is 10 to 20 times the physical layer sleep power. Clock gating handles power saving both when there is no traffic and during low-power modes.
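The ratios quoted above can be combined into a rough sleep-power budget for L1. The Python sketch below picks values inside the quoted ranges purely for illustration: 30x, 99.9 per cent and 15x, with controller active power normalised to 1.0. With these values, the controller's share of L1 sleep power works out to r / (1 + r), where r is the controller-to-PHY sleep ratio. That share is about 91 to 95 per cent across the quoted 10 to 20 times range.

```python
def l1_sleep_breakdown(ctrl_active, phy_to_ctrl_active, phy_l1_reduction, ctrl_to_phy_sleep):
    """Rough sleep-power split for a USB sub-system in L1 (all inputs are assumptions)."""
    phy_active = phy_to_ctrl_active * ctrl_active       # PHY is 20-50x controller when active
    phy_sleep = phy_active * (1.0 - phy_l1_reduction)   # L1 cuts PHY power by more than 99%
    ctrl_sleep = ctrl_to_phy_sleep * phy_sleep          # controller sleep is 10-20x PHY sleep
    return phy_sleep, ctrl_sleep, ctrl_sleep / (phy_sleep + ctrl_sleep)


# Illustrative values inside the quoted ranges; controller active power normalised to 1.0.
phy_sleep, ctrl_sleep, ctrl_share = l1_sleep_breakdown(1.0, 30.0, 0.999, 15.0)
print(f"controller share of L1 sleep power: {ctrl_share:.0%}")  # 94%
```

Because the controller dominates sleep power under these assumptions, it is the natural target for architectural clock gating.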
Clock gating methods for M2M traffic patterns
1. EDA tool-inserted clock gating (TICG), at register level
2. Architectural clock gating
Currently, EDA tools provide clock gating options during synthesis. This is an efficient way to insert register-level clock gating cells with minimal effort. The tool uses the RTL coding style to detect synchronous load-enable registers: flip-flops that share the same clock and synchronous control signals. It then inserts clock gating cells that use the shared synchronous control signal as the enable condition for the corresponding clock.
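The grouping step can be sketched as a simple netlist pass. Flops are bucketed by their (clock, enable) pair, and a bank gets one gating cell if it has an enable and is wide enough to be worth it. In the Python sketch below, the register names and the `min_bitwidth` threshold of 3 are hypothetical, and real tools apply further criteria. Flops without a synchronous load enable, such as synchroniser stages, stay ungated.

```python
from collections import defaultdict


def plan_clock_gating(registers, min_bitwidth=3):
    """Group flops by (clock, enable) and pick the banks that get a gating cell.

    registers: (name, clock, enable) tuples; enable is None when the flop has
    no synchronous load enable, so it cannot be gated this way.
    min_bitwidth: hypothetical minimum bank width worth one gating cell.
    """
    banks = defaultdict(list)
    for name, clock, enable in registers:
        banks[(clock, enable)].append(name)

    gated, ungated = {}, []
    for (clock, enable), names in banks.items():
        if enable is not None and len(names) >= min_bitwidth:
            gated[(clock, enable)] = sorted(names)   # one gating cell drives this bank
        else:
            ungated.extend(names)                    # left on the free-running clock
    return gated, sorted(ungated)


registers = [
    ("q[0]", "clk", "clk_en"), ("q[1]", "clk", "clk_en"), ("q[2]", "clk", "clk_en"),
    ("sync_ff1", "clk", None), ("sync_ff2", "clk", None),  # e.g. CDC synchroniser
    ("flag", "clk", "flag_en"),                           # bank too narrow
]
gated, ungated = plan_clock_gating(registers)
print(gated)    # {('clk', 'clk_en'): ['q[0]', 'q[1]', 'q[2]']}
print(ungated)  # ['flag', 'sync_ff1', 'sync_ff2']
```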
Synopsys Design Compiler (DC), for example, inserts clock gating automatically. If the RTL is coded as below, DC identifies clk_en as the synchronous load enable of the flops and uses it as one of the enable conditions for gating their clock.
```verilog
always @(posedge clk or negedge reset)
  if (reset == 1'b0)   // asynchronous active-low reset
    q <= 32'd0;
  else if (clk_en)     // synchronous load enable
    q <= d;            // otherwise q keeps its value
```
This method can be used for all existing IPs. It provides the greatest power saving for IPs with a large register count.
First, synthesise the IP and analyse its register count: a large register count indicates a large potential for reducing dynamic power. If the design contains synchronous load enables, analyse the gated and ungated register reports for the IP. Where possible, check the ungated registers to see whether the RTL coding style can be changed so that the tool can gate them. Manually estimate the power saved in the IP by the registers the tool gates. Then compare it with the saving that would be obtained if all ungated registers were converted to gated registers.
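The manual estimate in the last two steps can be approximated from a gated/ungated register report plus per-bank enable activity, for example from simulation. The Python sketch below assumes that every flop draws the same clock-pin power per edge and ignores gating-cell overhead. The report data is hypothetical. It compares the current saving with the saving if every bank were gated.

```python
def clock_power_saving(banks):
    """Estimated fraction of register clock power removed by clock gating.

    banks: (bit_count, enable_duty, gated) tuples. enable_duty is the fraction of
    cycles in which the bank loads new data. A gated bank is clocked only in
    those cycles; an ungated bank is clocked every cycle. Every flop is assumed
    to draw the same clock power per edge, and gating-cell overhead is ignored.
    """
    baseline = sum(bits for bits, _, _ in banks)
    after = sum(bits * (duty if gated else 1.0) for bits, duty, gated in banks)
    return 1.0 - after / baseline


# Hypothetical report: (bits, enable duty, gated by the tool?)
banks = [(32, 0.1, True), (16, 0.2, False), (8, 1.0, False)]
current = clock_power_saving(banks)
potential = clock_power_saving([(b, d, True) for b, d, _ in banks])
print(f"current {current:.1%}, if all gated {potential:.1%}")
# current 51.4%, if all gated 74.3%
```

The 8-bit bank with an enable duty of 1.0 gains nothing from gating, which is why the remaining ungated registers are worth ranking by activity before changing the RTL.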
In most cases, the effort involved in converting ungated registers to gated registers using this method is high. In some cases, it is not easy to modify the coding style. Some registers, such as clock-domain-crossing (CDC) synchronisers, may not have synchronous load enables at all. Therefore, the power saving from TICG alone is not fully optimal, and further power reduction is possible in all cases.
V.P. Sampath is a senior member of IEEE and a member of the Institution of Engineers (India), working in an FPGA design house. He has published international papers on VLSI and networks.