
New ASIC Chip Embeds AI Models Directly Into Hardware 

New inference hardware claims up to 10x faster AI response times with drastically lower power and cost by embedding models directly into custom silicon rather than relying on GPUs.

A new AI accelerator architecture from Taalas challenges the conventional GPU-driven approach to artificial intelligence inference by converting entire AI models into dedicated silicon. The chip architecture hardwires model weights and parameters directly into ASICs, enabling ultra-low latency and significantly lower energy consumption for large language model (LLM) inference workloads. 

Unlike GPUs and even many AI accelerators that remain programmable, the new approach creates model-specific chips optimized for a single neural network. The result is dramatically faster inference, especially for high-volume workloads where the same AI model serves millions of queries repeatedly. According to disclosed benchmarks, the first-generation chip running a compressed Llama 3.1 8B model reportedly achieves 14,000–17,000 tokens per second, far beyond most currently deployed inference hardware. 
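To put the claimed throughput in perspective, a quick back-of-the-envelope calculation helps. The token rates below are the article's figures; the per-token latency and the 500-token reply example are derived here purely for illustration:

```python
# Back-of-the-envelope: what 14,000-17,000 tokens/s means per token.
# Token rates are the article's claimed benchmarks; everything else is derived.
for tokens_per_sec in (14_000, 17_000):
    ms_per_token = 1000 / tokens_per_sec
    print(f"{tokens_per_sec:,} tok/s -> {ms_per_token:.3f} ms per token")

# Illustrative: generating a 500-token response at the lower claimed rate.
reply_ms = 500 / 14_000 * 1000
print(f"500-token reply: ~{reply_ms:.0f} ms")
```

At the claimed rates, each token takes well under a tenth of a millisecond, which is what underpins the "ultra-low latency" framing.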

The key features are:

  • Hardwires AI models directly into ASIC silicon
  • Delivers 14,000–17,000 tokens per second
  • Eliminates dependence on external HBM memory
  • Operates at significantly lower rack-level power
  • Supports chip revisions in roughly two months

The architecture removes dependence on high-bandwidth memory (HBM), one of the costliest and most supply-constrained elements in modern AI systems. Instead, model parameters are etched directly into silicon, reducing memory bottlenecks and enabling lower rack-level power consumption. The company claims AI racks based on the architecture consume roughly 12–15kW, compared to 120–600kW for GPU-based AI racks. 
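Taken at face value, the claimed figures imply roughly an 8x to 50x rack-power reduction. A minimal sketch of that arithmetic (the kilowatt ranges are the article's claims; the ratios are computed here):

```python
# Rack-level power comparison using the article's claimed ranges.
asic_rack_kw = (12, 15)     # claimed for the model-specific ASIC racks
gpu_rack_kw = (120, 600)    # cited range for GPU-based AI racks

# Best case: lowest ASIC draw vs. highest GPU draw; worst case: the reverse.
best_case = gpu_rack_kw[1] / asic_rack_kw[0]    # 600 / 12
worst_case = gpu_rack_kw[0] / asic_rack_kw[1]   # 120 / 15
print(f"claimed power reduction: ~{worst_case:.0f}x to ~{best_case:.0f}x")
```

The wide spread reflects the wide range cited for GPU racks, not measured data; these are vendor claims.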

The system also avoids liquid cooling and advanced packaging requirements, potentially allowing deployment in standard server infrastructure. Analysts suggest this could significantly reduce infrastructure costs for cloud providers and enterprise AI deployments. 

However, the tradeoff is flexibility. Since each chip is tied to a specific AI model, updates require new silicon revisions. To address this, the startup says only two metal layers need modification during redesign, allowing updated chips to be produced within roughly two months instead of years. 

The company plans to scale the architecture beyond smaller LLMs toward larger reasoning models and multi-chip AI systems later this year. Industry observers see the development as part of a broader trend toward highly specialized AI hardware, similar to how TPUs transformed cloud AI acceleration.

Akanksha Gaur
Akanksha Sondhi Gaur is a journalist at EFY. She holds a German patent and brings seven years of combined industrial and academic experience. Passionate about electronics, she has written numerous research papers showcasing her expertise and insight.
