Google’s AI accelerator is the Tensor Processing Unit (TPU).

Comparison

Memory

| TPU version | On-chip memory (MiB) | Memory | Memory bandwidth |
|---|---|---|---|
| TPUv1 | 28 | 8 GiB DDR3 | 34 GB/s |
| TPUv2 | 32 | 16 GiB HBM | 600 GB/s |
| TPUv3 | 32 (VMEM) + 5 (spMEM) | 32 GiB HBM | 900 GB/s |
| TPUv4 | 128 (CMEM) + 32 (VMEM) + 10 (spMEM) | 32 GiB HBM | 1200 GB/s |
| TPUv5e | 48 | 16 GB HBM | 819 GB/s |
| TPUv5p | 112 | 95 GB HBM | 2765 GB/s |
| TPUv6e | | 32 GB | 1640 GB/s |

Performance

| TPU version | Clock speed (MHz) | TOPS | TDP (W) | TOPS/W |
|---|---|---|---|---|
| TPUv1 | 700 | 23 | 75 | 0.31 |
| TPUv2 | 700 | 45 | 280 | 0.16 |
| TPUv3 | 940 | 123 | 220 | 0.56 |
| TPUv4 | 1050 | 275 | 170 | 1.62 |
| TPUv5e | | 197 (bf16), 393 (int8) | | |
| TPUv5p | 1750 | 459 (bf16), 918 (int8) | | |
| TPUv6e | | 918 (bf16), 1836 (int8) | | |
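The TOPS/W column is simply peak throughput divided by TDP. A minimal sketch reproducing the listed efficiency figures from the TOPS and TDP columns (values taken from the table above, variable names are illustrative):

```python
# Peak throughput (TOPS) and TDP (W) for TPUv1 through TPUv4,
# as listed in the performance table.
chips = {
    "TPUv1": (23, 75),
    "TPUv2": (45, 280),
    "TPUv3": (123, 220),
    "TPUv4": (275, 170),
}

# Efficiency is the ratio of the two, rounded to two decimals
# to match the table's TOPS/W column.
for name, (tops, tdp) in chips.items():
    print(f"{name}: {tops / tdp:.2f} TOPS/W")
```

This reproduces 0.31, 0.16, 0.56, and 1.62 TOPS/W for the four generations with published TDP figures.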

TPUv1

Manufactured in 2015 on a 28 nm process with an unstated die size.

TPUv2

Manufactured in 2017 on a 16 nm process with an unstated die size.

TPUv3

Manufactured in 2018 on a 16 nm process with an unstated die size.

TPUv4

Manufactured in 2021 on a 7 nm process with an unstated die size.

TPUv5e

Manufactured in 2023 on an unstated process with an unstated die size.

TPUv5p

Manufactured in 2023 on an unstated process with an unstated die size.

TPUv6e

Manufactured in 2024 on an unstated process with an unstated die size. These TPUs, also known as Trillium TPUs, were claimed by Google to deliver a 4.7× performance increase relative to TPUv5e, achieved through larger matrix multiplication units and a higher clock speed. HBM capacity and bandwidth have also doubled. A pod can contain up to 256 Trillium units.
