Which Chip is the Fastest?

Dilin Anand is executive editor at EFY. He holds a B.Tech in ECE from the University of Calicut and an MBA from Christ University, Bengaluru.


AMD’s EPYC and Ryzen are built with blocks called CPU Complexes (CCXs), each consisting of four ‘Zen’ cores sharing an L3 cache. Interestingly, while each core has its own 2MB slice of that L3 cache, it can also access the other 6MB of the CCX’s cache quickly.

There are two CCX blocks in each die, while one complete EPYC chip is a multi-chip module (MCM) made of four such dies. The EPYC 7601 has 32 cores running at a maximum frequency of 3.2GHz, 64MB of L3 cache and TDP of 180W.
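The core and cache counts above can be checked with some quick arithmetic. This sketch assumes four cores per CCX, which is consistent with the 2MB-per-core and 2MB-plus-6MB figures mentioned earlier:

```python
# Totals implied by the EPYC MCM layout described above:
# 4 cores per CCX, 2MB of L3 per core, 2 CCXs per die, 4 dies per package.
cores_per_ccx = 4
l3_slice_mb = 2
ccx_per_die = 2
dies_per_package = 4

total_cores = cores_per_ccx * ccx_per_die * dies_per_package
total_l3_mb = cores_per_ccx * l3_slice_mb * ccx_per_die * dies_per_package

print(total_cores, total_l3_mb)  # 32 64 -- matching the EPYC 7601's specs
```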

In the Skylake-SP Xeon Scalable Processor family, Intel added 768kB of L2 cache per core (taking it to 1MB), changed the way the L3 cache works and added a second 512-bit AVX-512 unit. Intel AVX-512 enables twice the number of floating-point operations per second (FLOPS) per clock cycle compared with the previous generation.

Anandtech notes that “even in the best-case scenario, some of the performance advantage will be negated by the significantly lower clock speeds (base and turbo) that Intel’s AVX-512 units run at due to the sheer power demands of pushing so many FLOPS.”

The Intel Xeon Platinum 8180 has 28 cores and 56 threads that can be pushed up to 3.80GHz, 38.5MB of L3 cache and a TDP of 205W.
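The doubling that AVX-512 brings can be seen in a theoretical peak-FLOPS calculation. The 2.5GHz figure below is an assumed AVX-512 clock for illustration only; as the Anandtech quote notes, AVX-512 code runs at reduced base and turbo clocks:

```python
def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak throughput, ignoring memory and thermal limits."""
    return cores * ghz * flops_per_cycle

# One 512-bit FMA unit does 8 double-precision FLOPs x 2 (multiply + add)
# = 16 per cycle; Skylake-SP's second AVX-512 unit doubles that to 32.
one_unit  = peak_gflops(28, 2.5, 16)  # 2.5GHz is an assumed AVX-512 clock
two_units = peak_gflops(28, 2.5, 32)

print(two_units / one_unit)  # 2.0 -- the doubling referred to above
```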

Intel Xeon Platinum

Memory on workstations

While the consumer CPUs mentioned earlier can accept up to 128GB of DRAM, AMD’s EPYC ecosystem can handle up to 2TB per socket when using 128GB LRDIMMs. On the other hand, the Xeon W-2123 processor can support up to 512GB of DDR4 memory, while Xeon-SP processors start at 768GB of DRAM and can be extended up to 1.5TB through OEMs.

When you need massive parallelism

For researchers working on machine learning or deep learning, there are some specialist chips available like Pascal-architecture-based Tesla P100 or Vega-architecture-based Radeon Instinct MI25. While Tesla P100 delivers 9.3 teraflops for PCIe-based servers and 10.6 teraflops for NVLink-optimised servers in single-precision performance, Instinct MI25 delivers 12.3 teraflops. The P100 also delivers 4.7 teraflops of double-precision performance, while the MI25 does 768 gigaflops. (These figures have been taken from the respective vendors’ websites.)
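The gap between the two cards shows up most clearly in their double- to single-precision throughput ratios, computed here from the vendor figures quoted above. A low FP64:FP32 ratio suggests a part aimed at deep-learning inference and training rather than traditional HPC:

```python
# Ratios of double- to single-precision throughput, from the vendor
# figures quoted above (PCIe variant for the P100).
p100_fp64_ratio = 4.7 / 9.3      # Tesla P100: roughly 1:2
mi25_fp64_ratio = 0.768 / 12.3   # Radeon Instinct MI25: roughly 1:16

print(p100_fp64_ratio, mi25_fp64_ratio)
```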

What’s next?

There are three interesting things to watch out for. The most obvious one is how Ice Lake, successor to the 8th-generation Intel Core processor family, performs when it becomes available in devices, probably in 2019. These processors will utilise Intel’s 10nm+ process technology. Note that Kaby Lake is Intel’s third Core product produced using a 14nm lithography process, specifically the second-generation version known as 14 PLUS (or 14+).

Next is how AMD’s Zen family evolves further, since Jim Keller, who had led the Zen architecture design since 2012, left for Tesla two years ago. Finally, it remains to be seen how ARM-based solutions like the Falkor-core-based Centriq 2400 from Qualcomm rise to the challenge of x86 processors in the server market.




