Last update: 2025/10/04
Specification | H100 | B100 | B200 | B300 | Rubin | Rubin Ultra |
---|---|---|---|---|---|---|
Architecture | Hopper | Blackwell | Blackwell | Blackwell Ultra | Rubin | Rubin Ultra |
Process Node | TSMC 4N | TSMC 4NP | TSMC 4NP | TSMC 4NP | Not specified | Not specified |
Chip Packaging | Monolithic | 2 GB100 dies with NV-HBI | 2 GB100 dies with NV-HBI | 2 GB100 dies with NV-HBI | 2 Rubin dies with NV-HBI | 4 Rubin dies with NV-HBI |
FP4 Tensor | Not supported | 7 PFLOPS (dense) / 14 PFLOPS (sparse) | 10 PFLOPS (dense) / 20 PFLOPS (sparse) | 15 PFLOPS (dense) / 30 PFLOPS (sparse) | 50 PFLOPS | 100 PFLOPS |
INT8 Tensor | 0.4 P-OPS | 3.5 P-OPS (dense) / 7 P-OPS (sparse) | 5 P-OPS (dense) / 9 P-OPS (sparse) | 7.5 P-OPS (dense) / 15 P-OPS (sparse) | Not specified | Not specified |
FP16 Tensor | 0.2 PFLOPS | 1.85 PFLOPS (dense) / 3.5 PFLOPS (sparse) | 2.5 PFLOPS (dense) / 4.5 PFLOPS (sparse) | 3.75 PFLOPS (dense) / 7.5 PFLOPS (sparse) | Not specified | Not specified |
FP64 | 34 TFLOPS | 30 TFLOPS | 40 TFLOPS | 68 TFLOPS | Not specified | Not specified |
Memory | 80 GB HBM3 | 192 GB HBM3e | 192 GB HBM3e | 288 GB HBM3e | 288 GB HBM4 | 1 TB HBM4e |
Memory Bandwidth | 3.2 TB/s | Up to 8 TB/s (4 TB/s per die) | Up to 8 TB/s (4 TB/s per die) | Up to 8 TB/s (4 TB/s per die) | 8 TB/s per die | 8 TB/s per die |
Power (TDP) | 700W | 700W | 1,000W | Not specified | Not specified | Not specified |
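A few of the per-package figures above follow from simple arithmetic. The sketch below (Python; the 4 TB/s-per-die figure and the 2:4 sparsity ratio are taken from the table, not independently sourced) checks the dual-die bandwidth total and the FP4 dense/sparse relationship:

```python
# Dual-die Blackwell packages: 2 dies x 4 TB/s per die = 8 TB/s per package.
dies, per_die_tbps = 2, 4
assert dies * per_die_tbps == 8

# In the FP4 row, sparse throughput is 2x dense (2:4 structured sparsity).
fp4_dense = {"B100": 7, "B200": 10, "B300": 15}    # PFLOPS, dense
fp4_sparse = {"B100": 14, "B200": 20, "B300": 30}  # PFLOPS, sparse
assert all(fp4_sparse[g] == 2 * d for g, d in fp4_dense.items())
```

Note that the 2x ratio holds exactly only for the FP4 row; the INT8 and FP16 columns above quote sparse figures that are slightly below 2x dense.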
Specification | H100 NVL8 | Blackwell NVL72 | Blackwell Ultra NVL72 | Rubin NVL144 | Rubin Ultra NVL576 |
---|---|---|---|---|---|
Number of Trays | N/A | 36 | 36 | 36 | 36 |
GPU | H100 | B100/B200 | B300 | Rubin | Rubin Ultra |
Number of GPUs | 8 | 72 (144 dies) | 72 (144 dies) | 144 dies | 576 dies |
CPU | Grace | Grace | Grace | Vera | Vera |
Number of CPUs | N/A | 18/36 | 36 | 36? | 36? |
CPU-GPU Bandwidth | ? TB/s | ? TB/s | ? TB/s | 1.8 TB/s | 1.8 TB/s |
Memory Capacity per GPU | 94 GB HBM3 | 192 GB HBM3e | 288 GB HBM3e | 520 GB HBM4 | 1 TB HBM4e |
Total Memory Capacity | 752 GB | 13.824 TB | 20.736 TB | 75 TB | 365 TB (inconsistent with 144 × 1 TB = 144 TB) |
Memory Bandwidth per GPU | 3.9 TB/s | 8 TB/s | 8 TB/s | 13 TB/s | ? TB/s |
Total Memory Bandwidth | 31.2 TB/s | 576 TB/s | 576 TB/s | ? TB/s | 4500 TB/s |
Scale-Up Interconnect | NVLink 4 | NVLink 5 | NVLink 5 | NVLink 6 | NVLink 7 |
Scale-Up Interconnect BW per GPU | 900 GB/s (18 × 4 × 100 Gb/s) | 1.8 TB/s (18 × 4 × 200 Gb/s) | 1.8 TB/s | 3.6 TB/s (18 × 4 × 400 Gb/s?) | ? TB/s |
Total Scale-Up Bandwidth | 7.2 TB/s | 129.6 TB/s | 129.6 TB/s | 260 TB/s | 1.5 PB/s |
Server NIC | ? | ? | ConnectX-8 (800 Gb/s) | ConnectX-9 (1.6 Tb/s) | ConnectX-9 (1.6 Tb/s) |
Scale-Out Interconnect | Not specified | InfiniBand/Ethernet | InfiniBand/Ethernet | InfiniBand/Ethernet | InfiniBand/Ethernet |
Total Scale-Out Bandwidth | ? TB/s | ? TB/s | 14.4 TB/s (144 × 800 Gb/s) | 28.8 TB/s (144 × 1.6 Tb/s) | 115.2 TB/s (576 × 1.6 Tb/s?) |
Total Rack System Power | 60 kW | 72-120 kW | ? | ? | 600 kW |
FP4 Dense Inference | N/A (no FP4 support) | ? PFLOPS | 1.1 EFLOPS | 3.6 EFLOPS | 15 EFLOPS |
FP8 Training | ? PFLOPS | ? PFLOPS | 0.36 EFLOPS | 1.2 EFLOPS | 5 EFLOPS |
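Most rack-level totals in this table are simple products of the per-GPU figures and the GPU count. A quick sketch to verify them (Python; values copied from the rows above, the one-NIC-per-die layout is an inference from the 144× scale-out figures):

```python
GPUS = 72  # GPU packages per NVL72 rack (144 dies)

# Total HBM capacity (GB), matching the table's 13.824 TB / 20.736 TB
assert GPUS * 192 == 13_824  # Blackwell NVL72
assert GPUS * 288 == 20_736  # Blackwell Ultra NVL72

# Total HBM bandwidth and total NVLink scale-up bandwidth (TB/s)
assert GPUS * 8 == 576
assert round(GPUS * 1.8, 1) == 129.6

# Scale-out: 144 NICs x 800 Gb/s = 14.4 TB/s (Blackwell Ultra NVL72)
assert 144 * 800 / 8 / 1000 == 14.4

# FP4 dense: 72 packages x 15 PFLOPS (B300) = 1,080 PFLOPS ~ 1.1 EFLOPS
assert GPUS * 15 == 1080
```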
Note: the NVL72 system comes in different rack designs:
System design | MGX | HGX | Note |
---|---|---|---|
# of CPU/server | 2 | 1 | |
# of GPU/server | 4 | 4 | |
Fully connected HBM within rack | Yes | No | |
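The CPU counts in the NVL72 table above can be cross-checked against these per-server layouts (a sketch; the 18-compute-server figure is an inference from 72 GPUs at 4 GPUs per server, not an NVIDIA-stated number):

```python
# 72 GPUs / 4 GPUs per server = 18 compute servers per rack.
servers = 72 // 4
assert servers == 18

# HGX (1 CPU/server) -> 18 CPUs; MGX (2 CPUs/server) -> 36 CPUs,
# which matches the "18/36" CPU count listed for Blackwell NVL72.
assert (servers * 1, servers * 2) == (18, 36)
```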