August 17, 2025
Google has introduced its seventh-generation Tensor Processing Unit (TPU), codenamed Ironwood, a major advancement in its custom AI hardware. Google developed Ironwood to support its cutting-edge Gemini models, which rely on simulated reasoning capabilities known as “thinking.” The company says the hardware will enable more robust “agentic AI” functions and usher in what it calls the “age of inference.”
The company maintains that its Gemini models’ capabilities depend directly on its infrastructure: custom AI hardware accelerates inference and extends context windows. Google describes Ironwood as its most advanced and scalable TPU yet, built to let AI systems operate autonomously, gathering data, producing results, and acting on behalf of users. That, in short, is Google’s vision of agentic AI.
Ironwood provides a significant boost in throughput over previous generations. The chips are deployed in large, liquid-cooled clusters of up to 9,216 units, and an enhanced Inter-Chip Interconnect (ICI) gives them direct communication paths for fast, efficient data transfer across the entire system.
The powerful design serves purposes beyond Google’s own operations. Developers running demanding AI workloads in the cloud will also be able to leverage Ironwood, in two distinct configurations: a 256-chip server and a full 9,216-chip cluster.
In its largest configuration, an Ironwood pod reaches a peak of 42.5 exaflops of inference compute. Google claims each Ironwood chip achieves a peak throughput of 4,614 TFLOPS, a major advancement over previous generations. The company has also increased memory capacity, equipping each chip with 192GB, a sixfold increase over the previous-generation Trillium TPU. Memory bandwidth rises dramatically as well, reaching 7.2 Tbps, a 4.5x improvement.
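The pod and per-chip figures quoted above are internally consistent; a quick back-of-the-envelope sketch (using the article’s numbers, not official spec sheets) checks the arithmetic:

```python
# Sanity-check the Ironwood scaling figures quoted above (values from the article).
chips_per_pod = 9_216
tflops_per_chip = 4_614  # claimed peak throughput per Ironwood chip

pod_exaflops = chips_per_pod * tflops_per_chip / 1_000_000  # TFLOPS -> exaflops
print(f"Pod peak: {pod_exaflops:.1f} exaflops")  # ~42.5, matching Google's claim

ironwood_mem_gb = 192
trillium_mem_gb = ironwood_mem_gb / 6      # "sixfold increase" implies ~32GB on Trillium
ironwood_bw_tbps = 7.2
trillium_bw_tbps = ironwood_bw_tbps / 4.5  # "4.5x improvement" implies ~1.6 Tbps
print(f"Implied Trillium specs: {trillium_mem_gb:.0f}GB memory, "
      f"{trillium_bw_tbps:.1f} Tbps bandwidth")
```

The implied Trillium figures here are derived from the stated ratios, not independently confirmed.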
Google benchmarks Ironwood at FP8 precision, which makes direct comparisons with other AI hardware difficult given differing measurement methodologies. The company’s claim that Ironwood “pods” are 24 times faster than comparable segments of the world’s most powerful supercomputers warrants scrutiny, since some of those systems do not support FP8 natively. Google also excludes its TPU v6 (Trillium) hardware from direct performance comparisons, saying only that Ironwood achieves double the performance per watt. According to the company, Ironwood succeeds the TPU v5p, while Trillium followed the less powerful TPU v5e. Trillium delivered roughly 918 TFLOPS at FP8 precision.
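From those figures, the per-chip generational gap can be estimated. This is a rough comparison based on the article’s numbers, not an official benchmark:

```python
# Rough per-chip FP8 throughput comparison, Ironwood vs. Trillium (TPU v6),
# using the figures quoted in the article.
ironwood_tflops = 4_614  # Ironwood claimed peak FP8 throughput per chip
trillium_tflops = 918    # Trillium reported FP8 throughput per chip

speedup = ironwood_tflops / trillium_tflops
print(f"Ironwood vs. Trillium per-chip FP8 throughput: ~{speedup:.1f}x")  # ~5.0x
```

A roughly 5x per-chip throughput gain alongside a claimed 2x performance-per-watt gain would imply each Ironwood chip also draws considerably more power than a Trillium chip, which is consistent with the liquid-cooled deployments described above.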
Benchmarking caveats aside, Ironwood stands out as a breakthrough for Google’s AI stack. It delivers substantial speed and efficiency gains over earlier TPUs while building on infrastructure that has already enabled rapid progress in large language models and simulated reasoning. Google’s market-leading Gemini 2.5 model still runs on older TPU hardware; Ironwood’s faster inference and improved efficiency could enable major AI advances next year, marking the start of the “age of inference” and more powerful agentic AI.




