At Google Cloud Next, Google unveiled its eighth-generation AI accelerators: the TPU v8t “Sunfish” for training and the TPU v8i “Zebrafish” for inference, alongside the new Virgo data center fabric. Tailored for the agentic AI era, these chips are optimized for large mixture-of-experts (MoE) model training and low-latency, cost-efficient token serving. While sharing the same host platform and interconnect fabric, the v8t and v8i differ in HBM capacity, on-chip SRAM, network topology and hardware specialization.
A v8t superpod supports 9,600 chips with 2 PB HBM and delivers 121 EFLOPS of FP4 compute, nearly triple the performance of the prior Ironwood generation. The v8i scales to 1,152 chips with 288 GB HBM and 384 MB on-chip SRAM, offering 80% better inference cost-efficiency than Ironwood. Virgo fabric interconnects over 134,000 v8t chips, providing 47 Pb/s non-blocking bandwidth with 4× higher per-accelerator throughput and 40% lower latency.
Fundamental TPU Architecture vs GPU
TPUs are custom ASICs characterized by large matrix multiply units (MXUs), software-managed SRAM and ahead-of-time compilation. Unlike GPUs' dynamic scheduling across many small cores, TPUs use deterministic dataflow through systolic arrays, eliminating cache jitter and warp-scheduling overhead for higher FLOPS utilization on dense matrix workloads. However, TPUs struggle with dynamic shapes, irregular sparsity and graph-structured workloads, and their software ecosystem is narrower, dominated by JAX and XLA.
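The lockstep-dataflow contrast can be made concrete with a toy sketch in plain Python (an output-stationary simulation for illustration, not real TPU microcode): every cell performs exactly one multiply-accumulate per cycle, so the schedule is fully determined at compile time, with nothing to arbitrate at runtime.

```python
def systolic_matmul(A, B):
    """Output-stationary systolic-array sketch of C = A @ B.

    Each output cell (i, j) accumulates exactly one partial product
    per lockstep 'cycle' t -- a fixed, deterministic schedule with
    no dynamic scheduling, which is why dense matmul utilization is
    high but skipping work (e.g. zeros) is hard.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for t in range(k):            # one lockstep cycle per reduction step
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][t] * B[t][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```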
The structural difference in sparsity support clearly distinguishes TPUs from GPUs. NVIDIA Tensor Cores natively support 2:4 structured sparsity via instruction-level compression. In contrast, TPU systolic arrays operate in rigid lockstep, so skipping zeros would require pipeline stalls or extra decompression hardware. AWS Trainium2 takes a middle path, using dedicated sparse decompressors to preserve array throughput.
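To show what 2:4 structured sparsity means in practice, here is a minimal magnitude-pruning sketch in Python: within every group of four weights, the two smallest-magnitude entries are zeroed, producing the fixed pattern the Tensor Core hardware can compress. (This is an illustration of the pattern, not NVIDIA's pruning tooling.)

```python
def prune_2_4(weights):
    """Enforce 2:4 structured sparsity: in each group of 4 weights,
    keep the 2 largest-magnitude values and zero the other 2."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

w = [0.9, -0.1, 0.05, -0.8, 0.2, 0.7, -0.3, 0.01]
print(prune_2_4(w))
# [0.9, 0.0, 0.0, -0.8, 0.0, 0.7, -0.3, 0.0]
```

Because exactly two of every four values are zero, the hardware only needs a 2-bit index per kept weight, which is what makes instruction-level compression cheap.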
TPUs integrate SparseCores to handle the irregular gather-scatter operations behind embedding tables and MoE routing. These specialized cores excel at sorting, permutation and data rearrangement, covering recommendation workloads and expert token dispatch that the standard MXUs cannot process efficiently.
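As a rough illustration of the gather-scatter pattern such cores accelerate, here is a hypothetical MoE token-dispatch sketch in plain Python; the real hardware performs this grouping with dedicated sort/permute units rather than a loop.

```python
def dispatch_tokens(tokens, expert_ids, num_experts):
    """Group tokens by the expert each was routed to.

    This bucketing is the core of MoE all-to-all dispatch: a data
    rearrangement with irregular, data-dependent destinations --
    exactly the access pattern a rigid systolic MXU handles poorly.
    """
    buckets = [[] for _ in range(num_experts)]
    for tok, expert in zip(tokens, expert_ids):
        buckets[expert].append(tok)
    return buckets

print(dispatch_tokens(["t0", "t1", "t2", "t3"], [1, 0, 1, 2], 3))
# [['t1'], ['t0', 't2'], ['t3']]
```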
TPU v8t “Sunfish”: Training Accelerator
The v8t training chip pairs 216 GB of HBM3e with 128 MB of on-chip SRAM. Native FP4 precision doubles per-cycle throughput, pushing single-chip compute to 12.6 PFLOPS. It retains a 3D torus interconnect, now with 19.2 Tb/s of ICI bandwidth, well suited to ring-based collective communication in large-scale training.
SparseCores are inherited to accelerate MoE's irregular all-to-all traffic. Two upgrades break large-scale bottlenecks: TPUDirect RDMA and TPUDirect Storage bypass the host CPU for direct TPU memory access, delivering 10× higher I/O throughput. The v8t also adopts Google's Arm-based Axion CPUs as host processors, isolating host jitter and stabilizing preprocessing for synchronized multi-chip training.
TPU v8i “Zebrafish”: Inference Accelerator
Built for memory-bandwidth-bound inference workloads, v8i prioritizes low-latency token generation. It carries 384 MB of SRAM, triple Ironwood's, to keep the KV cache on-chip and cut repeated HBM reads. With two TensorCores and 288 GB of HBM3e, it reaches 10.1 PFLOPS of FP4 compute and overlaps short-batch inference tasks for higher sustained utilization.
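Why 384 MB of SRAM matters for KV caching can be shown with back-of-the-envelope arithmetic; the model dimensions below (32 layers, 8 KV heads, head dimension 128, 1-byte FP8 cache entries) are hypothetical, not v8i specifics.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem):
    """KV-cache footprint of one token: K and V each store
    kv_heads * head_dim values in every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, FP8 (1 byte).
per_token = kv_bytes_per_token(32, 8, 128, 1)
sram_bytes = 384 * 1024 * 1024          # v8i on-chip SRAM

print(per_token)                        # 65536 bytes (64 KiB) per token
print(sram_bytes // per_token)          # 6144 tokens fit entirely in SRAM
```

For such a model, several thousand tokens of context can be served without touching HBM at all, which is exactly the repeated-read traffic the large SRAM is meant to absorb.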
In place of SparseCores, a dedicated Collectives Acceleration Engine (CAE) cuts on-chip synchronization latency by up to 5×, optimizing the frequent small-batch collectives of inference. The v8i also abandons the 3D torus for the Dragonfly-based Boardfly topology, reducing the maximum chip-to-chip hop count from 16 to 7 and halving MoE all-to-all latency.
Virgo & Jupiter Fabric Hierarchy
Virgo serves as the intra-data-center scale-out fabric, adopting a two-layer non-blocking architecture to eliminate oversubscription for east-west AI traffic. Powered by MEMS optical switches, it enables millisecond-level fault rerouting and maintains 97% goodput for v8t superpods. Combined with Jupiter — Google’s long-distance cross-data-center fabric — the layered interconnect system supports over one million TPU chips in a single logical cluster with 1.7 ZFLOPS total FP4 compute.
Performance, TCO and Market Position
High goodput and stable Model FLOPs Utilization (MFU) give TPUs a compelling cost advantage: at 40% MFU, TPU training costs run 62% below NVIDIA GB300. On raw hardware, v8t dense FP4 performance sits between GB200 and GB300, but Google dominates at cluster scale with a 9,600-chip single pod, far exceeding NVIDIA's 72-GPU NVLink domain.
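The MFU-driven cost reasoning reduces to simple arithmetic, sketched below; the per-chip figures come from this article's v8t numbers, while the 1e24-FLOP training run is a hypothetical workload, not a quoted benchmark.

```python
def training_hours(total_flops, peak_flops_per_chip, mfu, chips):
    """Wall-clock hours for a training run.

    MFU (Model FLOPs Utilization) = achieved FLOP/s / peak FLOP/s,
    so sustained cluster throughput is peak * MFU * chips.
    """
    return total_flops / (peak_flops_per_chip * mfu * chips) / 3600

# Hypothetical run: 1e24 total FLOPs on a 9,600-chip v8t pod
# (12.6 PFLOPS FP4 per chip, per the figures above) at 40% MFU.
print(round(training_hours(1e24, 12.6e15, 0.40, 9600), 1))  # 5.7 hours
```

Since cost scales as chip-hours × price, a higher and more stable MFU shortens the run in direct proportion, which is where the claimed cost edge over GB300 comes from.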
Looking ahead, NVIDIA's Vera Rubin, Rubin Ultra and Kyber platforms will narrow the gap with TPU through 2026–2027. TPU's weaknesses remain smaller per-chip HBM, the absence of hardware sparsity and limited ecosystem compatibility. Nonetheless, Google retains strengths in massive clustering, deterministic latency and cost efficiency for MoE workloads.
Google is expanding both TPU and NVIDIA GPU infrastructure. Meta plans a multi-billion-dollar TPU adoption deal starting in 2027. As a dual-chip generation optimized for the agentic era, TPU v8 secures Google’s competitiveness against NVIDIA Grace-Blackwell for frontier large-scale AI deployment.
Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com/www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!