optical_io

by Andy Sun (achoenix@gmail.com)

Last update: 2025/10/04

License & Citation


NVIDIA System

Design principles

GPU & System comparison

GPU comparison

| Specification | H100 | B100 | B200 | B300 | Rubin | Rubin Ultra |
|---|---|---|---|---|---|---|
| Architecture | Hopper | Blackwell | Blackwell | Blackwell Ultra | Rubin | Rubin Ultra |
| Process Node | TSMC 4N | TSMC 4NP | TSMC 4NP | TSMC 4NP | Not specified | Not specified |
| Chip Packaging | Monolithic | 2 GB100 dies with NV-HBI | 2 GB100 dies with NV-HBI | 2 GB100 dies with NV-HBI | 2 Rubin dies with NV-HBI | 4 Rubin dies with NV-HBI |
| FP4 Tensor | Not supported | 7 PFLOPS dense, 14 PFLOPS sparse | 10 PFLOPS dense, 20 PFLOPS sparse | 15 PFLOPS dense, 30 PFLOPS sparse | 50 PFLOPS | 100 PFLOPS |
| INT8 Tensor | 0.4 P-OPS | 3.5 P-OPS dense, 7 P-OPS sparse | 5 P-OPS dense, 9 P-OPS sparse | 7.5 P-OPS dense, 15 P-OPS sparse | Not specified | Not specified |
| FP16 Tensor | 0.2 PFLOPS | 1.85 PFLOPS dense, 3.5 PFLOPS sparse | 2.5 PFLOPS dense, 4.5 PFLOPS sparse | 3.75 PFLOPS dense, 7.5 PFLOPS sparse | Not specified | Not specified |
| FP64 | 34 TFLOPS | 30 TFLOPS | 40 TFLOPS | 68 TFLOPS | Not specified | Not specified |
| Memory | 80 GB HBM3 | 192 GB HBM3e | 192 GB HBM3e | 288 GB HBM3e | 288 GB HBM4 | 1 TB HBM4e |
| Memory Bandwidth | 3.2 TB/s | Up to 8 TB/s (4 TB/s per die) | Up to 8 TB/s (4 TB/s per die) | Up to 8 TB/s (4 TB/s per die) | 8 TB/s per die | 8 TB/s per die |
| Power (TDP) | 700 W | 700 W | 1,000 W | Not specified | Not specified | Not specified |
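The per-die scaling implied by the packaging row can be checked with a little arithmetic. A short Python sketch using only numbers from the table (treating the Rubin FP4 figures as dense, which matches the rack-level "FP4 Dense Inference" figures later on this page):

```python
# Cross-check the FP4 tensor figures against the die counts above.
# Format: (package FP4 dense PFLOPS, dies per package) -- from the table.
fp4 = {
    "B100": (7, 2),
    "B200": (10, 2),
    "B300": (15, 2),
    "Rubin": (50, 2),
    "Rubin Ultra": (100, 4),
}

per_die = {gpu: pflops / dies for gpu, (pflops, dies) in fp4.items()}

# Rubin Ultra doubles package throughput purely by doubling die count:
# the per-die FP4 rate is 25 PFLOPS for both Rubin and Rubin Ultra.
assert per_die["Rubin"] == per_die["Rubin Ultra"] == 25.0

# Sparse FP4 is listed as exactly 2x dense for every Blackwell part.
for dense, sparse in [(7, 14), (10, 20), (15, 30)]:
    assert sparse == 2 * dense
```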

Note:

GPU rack system comparison

| Specification | H100 NVL8 | Blackwell NVL72 | Blackwell Ultra NVL72 | Rubin NVL144 | Rubin Ultra NVL576 |
|---|---|---|---|---|---|
| Number of Trays | N/A | 36 | 36 | 36 | 36 |
| GPU | H100 | B100/B200 | B300 | Rubin | Rubin Ultra |
| Number of GPUs | 8 | 72 (144 dies) | 72 (144 dies) | 144 dies | 576 dies |
| CPU | Grace | Grace | Grace | Vera | Vera |
| Number of CPUs | N/A | 18/36 | 36 | 36? | 36? |
| CPU-GPU Bandwidth | ? TB/s | ? TB/s | ? TB/s | 1.8 TB/s | 1.8 TB/s |
| Memory Capacity per GPU | 94 GB HBM3 | 192 GB HBM3e | 288 GB HBM3e | 520 GB HBM4 | 1 TB HBM4e |
| Total Memory Capacity | 752 GB | 13.824 TB | 20.736 TB | 75 TB | 365 TB (inconsistent with 1 TB × 144) |
| Memory Bandwidth per GPU | 3.9 TB/s | 8 TB/s | 8 TB/s | 13 TB/s | ? TB/s |
| Total Memory Bandwidth | 31.2 TB/s | 576 TB/s | 576 TB/s | ? TB/s | 4,500 TB/s |
| Scale-Up Interconnect | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 6 | NVLink 7 |
| Scale-Up Interconnect BW per GPU | 900 GB/s (18 × 4 × 100 Gb/s) | 1.8 TB/s (18 × 4 × 200 Gb/s) | 1.8 TB/s | 3.6 TB/s (18 × 4 × 400 Gb/s?) | ? TB/s |
| Total Scale-Up Bandwidth | 7.2 TB/s | 129.6 TB/s | 129.6 TB/s | 260 TB/s | 1.5 PB/s |
| Server NIC | Not specified | Not specified | ConnectX-8 (800 Gb/s) | ConnectX-9 (1.6 Tb/s) | ConnectX-9 (1.6 Tb/s) |
| Scale-Out Interconnect | Not specified | InfiniBand/Ethernet | InfiniBand/Ethernet | InfiniBand/Ethernet | InfiniBand/Ethernet |
| Total Scale-Out Bandwidth | ? TB/s | ? TB/s | 14.4 TB/s (144 × 800 Gb/s) | 28.8 TB/s (144 × 1.6 Tb/s) | 115.2 TB/s (576 × 1.6 Tb/s?) |
| Total Rack System Power | 60 kW | 72–120 kW | ? | ? | 600 kW |
| FP4 Dense Inference | ? PFLOPS | ? PFLOPS | 1.1 EFLOPS | 3.6 EFLOPS | 15 EFLOPS |
| FP8 Training | ? PFLOPS | ? PFLOPS | 0.36 EFLOPS | 1.2 EFLOPS | 5 EFLOPS |
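Most of the aggregate numbers in this table follow from the per-GPU figures by simple unit arithmetic (lane Gb/s ÷ 8 → GB/s, × device count → rack total). A small Python sketch reproducing a few of them; all inputs are taken from the table itself:

```python
def lanes_to_tb_per_s(links: int, lanes: int, gbps_per_lane: int) -> float:
    """Aggregate bandwidth in TB/s from link/lane counts (Gb/s per lane)."""
    return links * lanes * gbps_per_lane / 8 / 1000  # bits -> bytes -> TB

# NVLink 4 (H100): 18 links x 4 lanes x 100 Gb/s = 900 GB/s per GPU.
assert lanes_to_tb_per_s(18, 4, 100) == 0.9
# NVLink 5 (Blackwell): 18 x 4 x 200 Gb/s = 1.8 TB/s per GPU.
assert lanes_to_tb_per_s(18, 4, 200) == 1.8
# Total scale-up for an NVL72 rack: 72 GPUs x 1.8 TB/s = 129.6 TB/s.
assert round(72 * 1.8, 1) == 129.6

# Total HBM capacity: 72 x 192 GB = 13.824 TB (Blackwell NVL72),
# 72 x 288 GB = 20.736 TB (Blackwell Ultra NVL72).
assert 72 * 192 / 1000 == 13.824
assert 72 * 288 / 1000 == 20.736

# Scale-out for Blackwell Ultra NVL72: 144 ConnectX-8 ports x 800 Gb/s.
assert 144 * 800 / 8 / 1000 == 14.4
```

The same helper reproduces the Rubin per-GPU figure: 18 × 4 × 400 Gb/s gives 3.6 TB/s, matching the NVLink 6 entry.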

Note:

GB200 NVL72 system

The NVL72 system comes in two different designs:

| System design | MGX | HGX | Note |
|---|---|---|---|
| # of CPUs per server | 2 | 1 | |
| # of GPUs per server | 4 | 4 | |
| Fully connected HBM within rack | Yes | No | |
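The "18/36" CPU count in the rack table above can be reconciled with these two designs if one assumes 18 compute trays (servers) per NVL72 rack; that tray breakdown is an assumption, since the rack table lists 36 trays without separating compute and switch trays. A sketch of the arithmetic:

```python
# Reconcile the rack table's "Number of CPUs: 18/36" with the MGX/HGX
# designs. ASSUMPTION: 18 compute trays (servers) per NVL72 rack; the
# compute/switch tray breakdown is not stated in this page's tables.
COMPUTE_TRAYS = 18

designs = {
    "MGX": {"cpus_per_server": 2, "gpus_per_server": 4},
    "HGX": {"cpus_per_server": 1, "gpus_per_server": 4},
}

totals = {
    name: (COMPUTE_TRAYS * d["cpus_per_server"],
           COMPUTE_TRAYS * d["gpus_per_server"])
    for name, d in designs.items()
}

# HGX -> 18 CPUs, MGX -> 36 CPUs: the "18/36" entry in the rack table.
assert totals["HGX"][0] == 18 and totals["MGX"][0] == 36
# Both designs reach the same 72 GPUs per rack.
assert totals["MGX"][1] == totals["HGX"][1] == 72
```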

NVIDIA CPO Switch

GTC 2025 Keynote

Note

Quantum 3450-LD Switch

Feature

Available in 2025H2

Spectrum SN6810

Feature

Spectrum SN6800

Feature

Information

GTC 2025

Reference