A chip built for AI models is already running AI workloads. Could it change how AI services handle speed, cost, and demand?

OpenAI and Broadcom have introduced Jalapeño, OpenAI’s first custom AI inference processor. The chip is designed specifically for running large language models (LLMs) and is the first product in a multi-generation computing platform the two companies are developing to improve AI performance, reliability, and accessibility.
The announcement marks a major step in OpenAI’s effort to build more of the infrastructure behind its AI models and services. Broadcom executives Hock Tan and Charlie Kawwas formally delivered the processor to OpenAI CEO Sam Altman and President Greg Brockman.
Unlike general-purpose AI accelerators, Jalapeño was designed from the ground up for LLM inference. OpenAI developed the architecture based on its experience operating products such as ChatGPT, Codex, and its API services. Broadcom and Celestica supported the project with chip implementation, networking, system integration, manufacturing, and large-scale deployment capabilities.
Engineering samples are already running machine learning workloads, including GPT-5.3-Codex-Spark, at target production frequencies and power levels. While performance testing is still underway, OpenAI said early results indicate that the processor delivers significantly higher performance per watt than current leading AI accelerators.
The chip architecture focuses on reducing data movement and balancing compute, memory, and networking resources to keep hardware utilization high. Broadcom’s networking technologies, including its Tomahawk networking silicon, are integrated into the platform to support large-scale deployments.
The processor went from initial design to tape-out in just nine months, which OpenAI and Broadcom claim is one of the fastest ASIC development cycles in advanced semiconductor design. The companies also said that OpenAI’s own AI models assisted with parts of the chip design and optimization process.
OpenAI says the ultimate goal is to make AI services faster, more dependable, and less expensive. Improvements in inference infrastructure could translate into quicker ChatGPT responses, lower API costs, better support for AI agents, and more reliable access during periods of high demand.






