Google’s AI accelerator is the Tensor Processing Unit (TPU).

Comparison

Memory

| TPU version | On-chip memory (MiB) | Memory | Memory bandwidth |
|---|---|---|---|
| TPUv1 | 28 | 8 GiB DDR3 | 34 GB/s |
| TPUv2 | 32 | 16 GiB HBM | 600 GB/s |
| TPUv3 | 32 (VMEM) + 5 (spMEM) | 32 GiB HBM | 900 GB/s |
| TPUv4 | 128 (CMEM) + 32 (VMEM) + 10 (spMEM) | 32 GiB HBM | 1200 GB/s |
| TPUv5e | 48 | 16 GB HBM | 819 GB/s |
| TPUv5p | 112 | 95 GB HBM | 2765 GB/s |
| TPUv6e | | 32 GB | 1640 GB/s |

Performance

| TPU version | Clock speed (MHz) | TOPS | TDP (W) | TOPS/W |
|---|---|---|---|---|
| TPUv1 | 700 | 23 | 75 | 0.31 |
| TPUv2 | 700 | 45 | 280 | 0.16 |
| TPUv3 | 940 | 123 | 220 | 0.56 |
| TPUv4 | 1050 | 275 | 170 | 1.62 |
| TPUv5e | | 197 (bf16), 393 (int8) | | |
| TPUv5p | 1750 | 459 (bf16), 918 (int8) | | |
| TPUv6e | | 918 (bf16), 1836 (int8) | | |
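The TOPS/W column is simply peak throughput divided by TDP. A minimal sketch reproducing the listed efficiency figures from the TOPS and TDP columns (values taken from the table above, variable names are illustrative):

```python
# Peak throughput (TOPS) and TDP (W) for TPUv1 through TPUv4,
# as listed in the performance table.
chips = {
    "TPUv1": (23, 75),
    "TPUv2": (45, 280),
    "TPUv3": (123, 220),
    "TPUv4": (275, 170),
}

# Efficiency is the ratio of the two, rounded to two decimals
# to match the table's TOPS/W column.
for name, (tops, tdp) in chips.items():
    print(f"{name}: {tops / tdp:.2f} TOPS/W")
```

This reproduces 0.31, 0.16, 0.56, and 1.62 TOPS/W for the four generations with published TDP figures.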

TPUv1

Manufactured in 2015 on a 28 nm process with an unstated die size.

TPUv2

Manufactured in 2017 on a 16 nm process with an unstated die size.

TPUv3

Manufactured in 2018 on a 16 nm process with an unstated die size.

TPUv4

Manufactured in 2021 on a 7 nm process with an unstated die size.

TPUv5e

Manufactured in 2023 on an unstated process with an unstated die size.

TPUv5p

Manufactured in 2023 on an unstated process with an unstated die size.

TPUv6e

Manufactured in 2024 on an unstated process with an unstated die size. These TPUs, also known as Trillium TPUs, were claimed by Google to deliver a 4.7× performance increase relative to TPUv5e, achieved through larger matrix multiplication units and a higher clock speed. HBM capacity and bandwidth have also doubled. A pod can contain up to 256 Trillium units.
