Untether AI has introduced its second-generation at-memory compute architecture, speedAI, to accelerate AI inference workloads and improve efficiency by up to 100x. Compared with the company's first-generation architecture, speedAI offers 80 times more throughput to accelerate neural network processing. The speedAI architecture supports 512 processing elements, each directly attached to dedicated SRAM; this at-memory computation is more efficient than the traditional von Neumann architecture, in which data must shuttle between separate memory and compute units. The second-generation architecture combines over 1,400 optimized RISC-V processors with custom instructions, an energy-efficient dataflow, and the adoption of a new FP8 datatype, which quadruples efficiency compared with the previous-generation runAI device.
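The article does not detail Untether AI's FP8 format, but the trade-off an 8-bit float makes can be sketched with a toy E4M3-style quantizer. The format parameters below (4 exponent bits, 3 mantissa bits, bias 7, no NaN handling) are illustrative assumptions, not Untether AI's specification:

```python
# Hypothetical sketch of an E4M3-style FP8 format: enumerate all
# representable magnitudes, then round a value to the nearest one.
# This only illustrates why an 8-bit float trades precision for
# density and throughput versus FP16/FP32.

def fp8_e4m3_values(bias=7):
    """All non-negative finite values of a simple E4M3 format (no NaN/Inf handling)."""
    vals = []
    for exp in range(16):        # 4 exponent bits
        for man in range(8):     # 3 mantissa bits
            if exp == 0:         # subnormal values
                v = (man / 8) * 2 ** (1 - bias)
            else:                # normal values with implicit leading 1
                v = (1 + man / 8) * 2 ** (exp - bias)
            vals.append(v)
    return sorted(set(vals))

def quantize_fp8(x):
    """Round x to the nearest representable FP8 magnitude, keeping the sign."""
    vals = fp8_e4m3_values()
    sign = -1.0 if x < 0 else 1.0
    return sign * min(vals, key=lambda v: abs(v - abs(x)))

print(quantize_fp8(0.1))   # 0.1015625 -- the nearest representable value
```

Only 256 distinct bit patterns exist, so values like 0.1 land on a nearby grid point; for inference, that coarser grid is usually an acceptable price for halving memory traffic relative to FP16.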
“The merits of at-memory compute have been proven with the first-generation runAI device, and the second-generation speedAI architecture enhances the energy efficiency, throughput, accuracy, and scalability of our offering,” said Arun Iyengar, CEO of Untether AI. “speedAI devices offer capabilities that are unmatched by any other inference offering in the marketplace.”
The growing demand for neural network applications requires a high level of accuracy to ensure quality results. This calls for a new approach to AI acceleration, one that combines flexibility and performance with energy efficiency and accuracy, which Untether AI provides with its speedAI devices. The runAI devices, introduced in 2020, achieved an energy efficiency of 8 TOPS/W for the INT8 datatype; the speedAI architecture raises this to 30 TFlops/W with FP8. speedAI devices are also designed to scale to large models: a multi-level memory architecture provides 238 MB of SRAM dedicated to the processing elements, along with 32 GB of external DRAM and high-speed PCI-Express Gen5 interfaces.
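As a quick sanity check on the "quadruples its efficiency" claim, the ratio of the two quoted figures can be computed directly. Note that the two generations are measured with different datatypes (INT8 versus FP8), so this is only an approximate comparison:

```python
# Ratio of the quoted energy-efficiency figures: the first-generation
# runAI device (2020) at 8 trillion INT8 operations per watt versus
# the speedAI architecture at 30 TFlops/W with FP8.
runai_eff = 8.0     # first-generation figure quoted above
speedai_eff = 30.0  # second-generation figure quoted above
print(speedai_eff / runai_eff)  # 3.75, roughly the claimed 4x
```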