Google Announces TPU v8t Sunfish and TPU v8i Zebrafish

May 11, 2026
At Google Cloud Next, Google unveiled its eighth-generation AI accelerators: the TPU v8t “Sunfish” for training and TPU v8i “Zebrafish” for inference, alongside the new Virgo data center fabric. Tailored for the agentic AI era, these chips are optimized for large mixture-of-experts (MoE) model training and low-latency token serving with cost-efficient pricing. While sharing the same host platform and interconnect fabric, v8t and v8i differ in memory, SRAM, topology and hardware specialization.


A v8t superpod supports 9,600 chips with 2 PB HBM and delivers 121 EFLOPS of FP4 compute, nearly triple the performance of the prior Ironwood generation. The v8i scales to 1,152 chips with 288 GB HBM and 384 MB on-chip SRAM, offering 80% better inference cost-efficiency than Ironwood. Virgo fabric interconnects over 134,000 v8t chips, providing 47 Pb/s non-blocking bandwidth with 4× higher per-accelerator throughput and 40% lower latency.
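As a back-of-the-envelope check, the pod-level figures follow directly from the per-chip specs quoted elsewhere in the article:

```python
# Sanity-check the v8t superpod figures from the announced per-chip specs.
CHIPS_PER_POD = 9_600
FP4_PER_CHIP_PFLOPS = 12.6     # quoted single-chip FP4 compute
HBM_PER_CHIP_GB = 216          # quoted HBM3e capacity per chip

pod_eflops = CHIPS_PER_POD * FP4_PER_CHIP_PFLOPS / 1_000   # PFLOPS -> EFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_PER_CHIP_GB / 1_000_000   # GB -> PB

print(f"{pod_eflops:.0f} EFLOPS FP4")   # ~121 EFLOPS, matching the claim
print(f"{pod_hbm_pb:.2f} PB HBM")       # ~2.07 PB, i.e. the quoted "2 PB"
```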

Fundamental TPU Architecture vs GPU


TPUs are custom ASICs characterized by large matrix multiply units (MXUs), software-managed SRAM, and ahead-of-time compilation. Unlike GPUs, which dynamically schedule work across many small cores, TPUs use deterministic dataflow through systolic arrays, eliminating cache jitter and warp-scheduling overhead and achieving higher FLOPS utilization on dense matrix workloads. However, TPUs struggle with dynamic shapes, irregular sparsity, and complex graph workloads, and their software ecosystem is narrower, dominated by JAX and XLA.
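To illustrate the dataflow contrast, here is a toy Python sketch (not TPU code) of a systolic-style matrix multiply in which every multiply-accumulate runs on a fixed, compile-time-known schedule:

```python
# Toy sketch of a systolic-style matrix multiply: every multiply-accumulate
# happens on a fixed schedule, with no caches or dynamic scheduling involved.
def systolic_matmul(A, B):
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    # Each "cycle" t streams one reduction step through the whole array,
    # so the operation order is fully deterministic (no warp scheduling).
    for t in range(k):
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][t] * B[t][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A real MXU pipelines these steps across a 2D grid of MAC units; the point is that the schedule is fixed ahead of time rather than decided by a hardware scheduler, which is what removes jitter but also makes dynamic shapes awkward.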

The structural difference in sparsity support clearly distinguishes TPUs and GPUs. NVIDIA Tensor Cores natively support 2:4 structured sparsity via instruction-level compression. In contrast, TPU systolic arrays operate in rigid lockstep, making zero-skipping inefficient without pipeline stalls or extra decompression hardware. AWS Trainium2 adopts a middle ground with dedicated sparse decompressors to retain array throughput.
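A minimal sketch of the 2:4 pruning pattern that Tensor Cores exploit (illustrative Python, not NVIDIA's implementation):

```python
# Sketch of 2:4 structured sparsity pruning: in every group of four weights,
# keep the two largest-magnitude values and zero the rest, so hardware can
# store just the two survivors plus short position indices per group.
def prune_2_of_4(weights):
    pruned = []
    for g in range(0, len(weights), 4):
        group = weights[g:g + 4]
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]))[-2:]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

print(prune_2_of_4([0.9, -0.1, 0.05, -1.2, 0.3, 0.0, 0.7, 0.2]))
# [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, 0.7, 0.0]
```

Because the zeros land in a fixed pattern, a Tensor Core can skip them at instruction level; a lockstep systolic array has no such per-element escape hatch, which is the asymmetry the paragraph above describes.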

TPUs integrate SparseCores to handle irregular gather-scatter tasks for embedding tables and MoE routing. These specialized cores excel at sorting, permutation and data rearrangement, covering recommendation workloads and expert token dispatching that standard MXUs cannot process efficiently.
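The kind of sort-and-permute work SparseCores accelerate can be sketched as token-to-expert dispatch (a hypothetical `dispatch_tokens` helper, not Google's API):

```python
# Sketch of the sort/permute step in MoE token dispatch: group tokens by the
# expert the router assigned them to, so each expert sees a dense batch.
def dispatch_tokens(tokens, expert_ids, num_experts):
    # Stable sort by expert id -- the permutation a SparseCore-style unit
    # accelerates in hardware; order within each expert is preserved.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    buckets = [[] for _ in range(num_experts)]
    for i in order:
        buckets[expert_ids[i]].append(tokens[i])
    return buckets

print(dispatch_tokens(["t0", "t1", "t2", "t3", "t4"], [2, 0, 2, 1, 0], 3))
# [['t1', 't4'], ['t3'], ['t0', 't2']]
```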

TPU v8t “Sunfish”: Training Accelerator


The v8t training chip is equipped with 216 GB of HBM3e memory and 128 MB of SRAM. Native FP4 precision doubles per-cycle throughput, pushing single-chip compute to 12.6 PFLOPS. It retains a 3D torus interconnect, with ICI bandwidth upgraded to 19.2 Tb/s, ideal for ring-based collective communication in large-scale training.
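Ring-based collectives map naturally onto a torus because each chip talks only to its neighbors. A minimal simulation of ring all-reduce (illustrative, not the XLA implementation):

```python
# Ring all-reduce over n chips, each holding n chunks of a gradient vector:
# n-1 reduce-scatter steps, then n-1 all-gather steps. Per-chip traffic stays
# constant as the ring grows, which is why torus links suit this pattern.
def ring_all_reduce(data):
    n = len(data)
    # Reduce-scatter: after n-1 steps, chip i holds the full sum of chunk (i+1)%n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, idx, val in sends:           # synchronous step: snapshot, then apply
            data[(i + 1) % n][idx] += val
    # All-gather: circulate the completed chunks for another n-1 steps.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, idx, val in sends:
            data[(i + 1) % n][idx] = val
    return data

print(ring_all_reduce([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
# every chip ends with the elementwise sum: [[6, 6, 6], [6, 6, 6], [6, 6, 6]]
```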

The inherited SparseCores optimize the irregular all-to-all data movement of MoE models. Two critical upgrades break large-scale bottlenecks: TPUDirect RDMA and TPUDirect Storage bypass the host CPU to enable direct TPU memory access, delivering 10× faster I/O throughput. Additionally, v8t adopts Google's Arm-based Axion CPUs as host processors, isolating host jitter and stabilizing preprocessing for synchronized multi-chip training.


TPU v8i “Zebrafish”: Inference Accelerator


Built for memory-bandwidth-bound inference workloads, v8i prioritizes low-latency token generation. It features 384 MB of SRAM (triple that of Ironwood) to keep the KV cache on-chip and reduce repeated HBM reads. With two TensorCores and 288 GB of HBM3e, it achieves 10.1 PFLOPS of FP4 compute, overlapping short-batch inference tasks for higher sustained utilization.
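To see why 384 MB of on-chip SRAM matters for serving, here is a rough KV-cache sizing sketch; the model dimensions below are assumptions for illustration, not a real serving configuration:

```python
# Rough KV-cache sizing under assumed model dimensions (illustrative only).
SRAM_MB = 384            # v8i on-chip SRAM from the announcement
layers = 48              # assumed decoder layers
kv_heads = 8             # assumed grouped-query KV heads
head_dim = 128           # assumed head dimension
bytes_per_value = 1      # assumed FP8 KV cache

bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
tokens_in_sram = SRAM_MB * 1024 * 1024 // bytes_per_token
print(bytes_per_token, tokens_in_sram)   # 98304 bytes/token -> 4096 tokens
```

Under these assumptions, roughly 4K tokens of context fit entirely in SRAM, which is exactly the regime where avoiding repeated HBM reads during decode pays off.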

Replacing the SparseCores, a dedicated Collectives Acceleration Engine (CAE) cuts on-chip synchronization latency by up to 5×, optimizing the frequent small-batch collective operations typical of serving. The v8i also abandons the 3D torus in favor of the Dragonfly-based Boardfly topology, reducing the maximum chip-to-chip hop count from 16 to 7 and lowering MoE all-to-all latency by 50%.

Virgo & Jupiter Fabric Hierarchy


Virgo serves as the intra-data-center scale-out fabric, adopting a two-layer non-blocking architecture to eliminate oversubscription for east-west AI traffic. Powered by MEMS optical switches, it enables millisecond-level fault rerouting and maintains 97% goodput for v8t superpods. Combined with Jupiter — Google’s long-distance cross-data-center fabric — the layered interconnect system supports over one million TPU chips in a single logical cluster with 1.7 ZFLOPS total FP4 compute.

Performance, TCO and Market Position


High goodput and stable Model FLOPs Utilization (MFU) give TPUs compelling cost advantages: at 40% MFU, TPU training costs are 62% lower than on NVIDIA GB300. In raw hardware terms, v8t dense FP4 performance sits between GB200 and GB300, while Google dominates large-scale clustering with a 9,600-chip single pod, far exceeding NVIDIA's 72-GPU NVLink domain.
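The cost claim decomposes into peak compute, MFU, and price. A small sketch with a hypothetical hourly rate (the article does not state actual pricing):

```python
# Sustained throughput = peak compute x MFU; cost efficiency then depends on
# the (unstated) hourly price. The $10/h rate below is purely hypothetical.
def cost_per_sustained_pflops_hour(price_per_hour, peak_pflops, mfu):
    return price_per_hour / (peak_pflops * mfu)

v8t_sustained = 12.6 * 0.40                 # quoted peak at 40% MFU -> ~5.04 PFLOPS
print(cost_per_sustained_pflops_hour(10.0, 12.6, 0.40))  # $/PFLOPS-hour at $10/h
```

Holding MFU fixed, relative training cost scales with price divided by sustained throughput, which is why goodput and MFU stability matter as much as peak FLOPS in TCO comparisons.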

Looking ahead, NVIDIA’s Vera Rubin, Rubin Ultra and Kyber will narrow TPU’s performance gap from 2026 to 2027. TPU’s weaknesses include smaller per-chip HBM, absent hardware sparsity and limited ecosystem compatibility. Nonetheless, Google maintains strengths in massive clustering, deterministic latency and cost efficiency for MoE workloads.

Google is expanding both TPU and NVIDIA GPU infrastructure. Meta plans a multi-billion-dollar TPU adoption deal starting in 2027. As a dual-chip generation optimized for the agentic era, TPU v8 secures Google’s competitiveness against NVIDIA Grace-Blackwell for frontier large-scale AI deployment.

Beijing Qianxing Jietong Technology Co., Ltd.
Sandy Yang/Global Strategy Director
WhatsApp / WeChat: +86 13426366826
Email: yangyd@qianxingdata.com
Website: www.qianxingdata.com / www.storagesserver.com
Business Focus:
ICT Product Distribution/System Integration & Services/Infrastructure Solutions
With 20+ years of IT distribution experience, we partner with leading global brands to deliver reliable products and professional services.
“Using Technology to Build an Intelligent World.” Your Trusted ICT Product Service Provider!