
Processor Optimised for Large-Scale AI Inference

By integrating compute, memory, and acceleration, the chip allows LLMs to run efficiently, supporting the next generation of AI applications at scale.

Microsoft Maia 200 chipset.

As AI chatbots and assistants scale to ever larger user bases, the cost of running inference, the process of serving AI responses, has become a major challenge. Speed, efficiency, and stability are now as important as raw training power, especially for cloud providers managing continuous, high-volume workloads. This has increased demand for AI chips optimised specifically for inference at scale.


Microsoft has introduced the Maia 200, its second-generation in-house AI processor, designed for large-scale inference. Maia 200 builds on the 2023 Maia 100, offering a significant boost in performance while supporting current and future AI models. 

The Maia 200 integrates over 100 billion transistors and delivers more than 10 petaflops at 4-bit precision and roughly 5 petaflops at 8-bit precision. Large amounts of fast SRAM reduce latency for repeated queries, ensuring responsiveness even under high user traffic. The chip is optimised for real-world AI workloads rather than training benchmarks, allowing the company to run large models efficiently and continuously.
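The gap between the 4-bit and 8-bit figures reflects the standard precision trade-off in inference: halving the width of each value roughly doubles achievable throughput and halves storage, at some cost in numerical accuracy. Maia 200's FP4 and FP8 are floating-point formats, but the effect can be shown with a minimal integer-quantization sketch in Python; the code is purely illustrative and does not represent the chip's actual number formats.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization to a given bit width (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(x).max() / qmax        # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                      # dequantized approximation

rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float32)

for bits in (8, 4):
    err = np.abs(weights - quantize(weights, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}, {bits / 32:.0%} of FP32 storage")
```

Running the sketch shows the 4-bit version using half the storage of the 8-bit one while introducing a larger, though often tolerable, approximation error.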

Manufactured using TSMC’s 3 nanometre process, Maia 200 also features high-bandwidth memory. The company claims the chip provides 3x the FP4 performance of Amazon’s Trainium 3 and stronger FP8 performance than Google’s latest TPU. Maia 200 is part of the company’s broader strategy to reduce reliance on NVIDIA GPUs while maintaining competitive performance and cost control.
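High-bandwidth memory is central to inference economics because token generation in large models is typically memory-bound: every generated token requires streaming the model’s weights from memory, so per-sequence throughput is capped at roughly bandwidth divided by weight bytes. The back-of-envelope estimate below makes that relationship concrete; every number in it (model size, precision, bandwidth) is an assumption for illustration, not a published Maia 200 specification.

```python
# Roofline-style estimate of decode throughput for a memory-bound LLM:
#   tokens/s per sequence ~ memory bandwidth / bytes of weights read per token
# All figures below are illustrative assumptions, not Maia 200 specs.

params = 70e9           # assumed 70-billion-parameter model
bytes_per_param = 0.5   # 4-bit weights occupy half a byte each
bandwidth = 4e12        # assumed 4 TB/s of HBM bandwidth

tokens_per_sec = bandwidth / (params * bytes_per_param)
print(f"~{tokens_per_sec:.0f} tokens/s per sequence (upper bound)")
```

Under these assumptions the ceiling is roughly 114 tokens per second per sequence, which is why vendors pair low-precision formats with the fastest memory available.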


Key features of the chip include:

  • Up to 180 TOPS of combined AI compute via CPU, GPU, and NPU
  • Hybrid CPU architecture with high-performance, efficiency, and low-power cores
  • Large on-chip SRAM for low-latency inference (see the sketch after this list)
  • Industrial-grade design for continuous operation at scale
  • 3 nanometre process technology with high-bandwidth memory
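The SRAM point is essentially a hardware caching claim: responses assembled from data already held in fast on-chip memory avoid slower trips to external memory. As a software analogy only, the toy Python sketch below memoizes a hypothetical inference call, run_model, so that only the first occurrence of a repeated query pays full latency; it says nothing about how Maia 200’s SRAM actually behaves.

```python
import time
from functools import lru_cache

# Toy analogy: memoization plays the role of fast on-chip memory, so only
# the first occurrence of a repeated query pays the full latency cost.
# run_model() is a hypothetical stand-in for an expensive inference call.

@lru_cache(maxsize=1024)
def run_model(prompt: str) -> str:
    time.sleep(0.1)                       # simulate slow, uncached inference
    return f"response to: {prompt}"

for prompt in ["hello", "hello", "hello"]:
    start = time.perf_counter()
    run_model(prompt)
    print(f"{prompt!r}: {time.perf_counter() - start:.4f} s")
```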

The chip is also paired with open-source AI software tools for efficient development, and the platform is designed to meet the growing need for fast, reliable, and cost-effective AI inference.

Saba Aafreen
Saba Aafreen is a Tech Journalist at EFY who blends on-ground industrial experience with a growing focus on AI-driven technologies in the evolving electronics industry.

