Data centres play a major role in expanding the enterprise computing capabilities. These are a centralised repository for storage, management and dissemination of data and information. Today’s central processing units (CPUs) are powerful and well-suited for high-level decision-making and control tasks, but their computing power cannot be scaled economically to keep pace with ever-increasing data centre requirements.
Hardware acceleration is a solution which may be implemented using a variety of technologies, including application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs) and network processing units (NPUs). FPGAs are well known for their versatility. Many of these implement communication and control functions wholly, or in part, as soft intellectual property (IP). So in data centres FPGAs can play a role in cognitive computing, high-performance computing and the Internet of Things. These can serve as coprocessors to accelerate CPU workloads. This is an approach used in supercomputing and high-performance computing (HPC), done usually by teaming CPUs with Nvidia GPUs or Intel’s x86-based Xeon Phi coprocessors.
Cloud service systems are based upon processors. Software is driving the change, which extends beyond the rise of machine learning and training neural models. A whole new set of workloads is running in the cloud today, including virtualising networks, storage processing, compression and encryption. FPGAs slot into the standard server, drawing 25W from its PCI Express (PCIe) Gen 3 slot.
Fig. 1: Accelerator architecture
An integrated platform is one in which FPGA developers can upload their design in an IP library and data centre users can download the IP cores they need. This decoupling will enable widespread adoption of heterogeneous processing in data centres. Data centre operators can optimise the infrastructures by hyperscale, where FPGAs offer more sophisticated processing architectures that bring HPC-style parallel processing of workloads.
Fig. 2: Accelerator attach
Accelerating the data centre applications has become a requirement due to power and space constraints. Accelerators are needed to increase performance at a low total cost of ownership for targeted workloads.
An accelerator is a separate architectural substructure on the same chip or on a different die. It is architected with a different set of objectives than the base processor, where objectives are based on the needs of a special class of applications.
Accelerators are tuned to provide a higher performance at lower cost, or at lower power, or with less development effort than with the general-purpose base hardware. Depending on the domain, accelerators often bring greater than ten times advantage in performance, cost or power consumption over a general-purpose processor.
Examples of accelerators include floating-point coprocessors, GPUs to accelerate the rendering of a vertex-based 3D model into a 2D viewing plane, and accelerators for the motion estimation step of a video codec. Applications such as big data analytics, search, machine learning, network functions virtualisation, wireless 4G/5G, in-memory database processing, video analytics and network processing benefit from acceleration engines, which need to move data seamlessly among the various system components.
HyperTransport, front-side bus (FSB) and QuickPath Interconnect (QPI) are not commercially successful because these have a proprietary ‘in socket’ attached, server mechanical fit challenges, need to give up processor socket, proprietary specification, licensing issues and always-lagging 6-12 month Cadence.
Fig. 3: ‘In socket’ coherent accelerators
IBM’s Coherent Accelerator Processor Interface (CAPI) interface resolves all these issues. This OpenPower-based solution is well documented and already commercialised. It has proprietary ‘in socket’ attached, and processors use all the available sockets. It is based on PCIe, hence there are no mechanical challenges. It is free, simple, OpenPower-licensed. Also, it is in sync with Power and Cadence’ PCIe interface IP.
IBM’s next-generation Power9 processor will have 24 cores, double that of today’s Power8 chips. Design is optimised for two-socket scale-out servers (hence the name Power9 SO), and includes on-chip acceleration for compression and encryption.
Fig. 4: Power8 modules with CAPI PCIe cards
Power9 will be available in two basic designs: a 24 SMT4 processor and a 12 SMT8 processor. The 24 SMT4 processor will be optimised for the Linux ecosystem and will target web service companies such as Google, which need to run across several thousand machines. It will feature four threads.
The 12 SMT8 processor will be optimised for the Power ecosystem and will target larger systems designed for running big data or artificial intelligence (AI) applications. It will feature eight threads.
Both designs will come in two models: The scale-out model will come with two CPU sockets on the motherboard, while the scale-up model will come with multiple CPU sockets.
Fig. 5: Power processor roadmap
Power9 processor will be a 14nm high-performance FinFET product. It will directly attach to DDR4 RAM, connect to peripherals and Nvidia GPUs via PCIe Gen 4 and NVLink 2.0, and transfer accelerator data at 25Gbps speeds. Power9 processor will have multiple connectors to attach FPGAs, GPUs and ASICs. It will be well suited for AI, cognitive computing, analytics, visual computing and hyper-scale web serving. IBM will exclusively use Power9 in its servers and make it available to other hardware companies by licensing the design to them. Power9 processors can be attached directly to various accelerators rather than communicating via the PCIe bus.