NVIDIA puts the Vera Rubin platform into production with seven new chips

The VR200 GPU pairs 288 GB of HBM4 with 50 PFLOPS of FP4 compute, and NVIDIA claims a 3.3x per-chip throughput jump over Blackwell Ultra B300.

What's new

At GTC 2026 on March 16, NVIDIA announced that the Vera Rubin platform is in full production, anchored on the VR200 GPU and the new Arm-based Vera CPU. The VR200 is built on TSMC 3 nm, packs roughly 336 billion transistors across two reticle-sized compute dies, and ships with 288 GB of HBM4 at 22 TB/s of memory bandwidth, delivering 50 PFLOPS of dense FP4 compute. NVIDIA describes that as a 3.3x per-chip throughput uplift over Blackwell Ultra B300. The platform also includes six other parts now in production: the Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the Groq 3 LPU inference accelerator that NVIDIA inherited through its Groq acquisition. The reference rack-scale configuration is the Vera Rubin NVL72, which couples 72 VR200 GPUs to 36 Vera CPUs in a single liquid-cooled rack. Partner systems are scheduled for the second half of 2026. Source: NVIDIA Newsroom, "NVIDIA Vera Rubin Opens Agentic AI Frontier."
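
To put the per-GPU figures in rack terms, the short Python sketch below scales them linearly to the 72 GPUs of an NVL72 and computes the ratio of peak FP4 compute to memory bandwidth on a single VR200. The rack totals are plain arithmetic on the numbers quoted above, not figures NVIDIA has published in this form, and linear scaling is an assumption that ignores interconnect and software overheads. The function names are illustrative.

```python
# Per-GPU figures quoted in the article (NVIDIA first-party numbers).
HBM4_GB_PER_GPU = 288          # GB of HBM4 per VR200
BANDWIDTH_TBPS_PER_GPU = 22    # TB/s of memory bandwidth per VR200
FP4_PFLOPS_PER_GPU = 50        # PFLOPS of dense FP4 compute per VR200
GPUS_PER_RACK = 72             # VR200 GPUs in one Vera Rubin NVL72


def rack_totals(gpus=GPUS_PER_RACK):
    """Naive linear scaling of the per-GPU figures to a full rack.

    Assumption: no losses from interconnect or software; this is an
    upper bound derived from the quoted specs, not a measured result.
    """
    return {
        "hbm4_tb": gpus * HBM4_GB_PER_GPU / 1000,        # decimal TB
        "bandwidth_tbps": gpus * BANDWIDTH_TBPS_PER_GPU,  # TB/s
        "fp4_eflops": gpus * FP4_PFLOPS_PER_GPU / 1000,   # EFLOPS
    }


def flops_per_byte():
    """Peak FP4 FLOPs available per byte read from HBM4 on one GPU."""
    return (FP4_PFLOPS_PER_GPU * 1e15) / (BANDWIDTH_TBPS_PER_GPU * 1e12)


if __name__ == "__main__":
    print(rack_totals())
    print(round(flops_per_byte()))
```

Under these assumptions a rack aggregates about 20.7 TB of HBM4 and 3.6 EFLOPS of peak FP4 compute, and a single VR200 has on the order of 2,300 peak FLOPs available per byte of memory traffic. Workloads that do far fewer operations per byte will be limited by bandwidth rather than by compute.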

Why it matters

Vera Rubin is the first NVIDIA generation built around HBM4 memory, with NVLink 6 as the rack-level fabric, and at 288 GB it roughly doubles the per-package memory ceiling of the Hopper era (141 GB on the H200). The 22 TB/s bandwidth target is a meaningful step up from the GB300, and it directly eases the memory-bandwidth bottleneck in the long-context attention and KV-cache-bound regimes that increasingly dominate frontier inference. The platform-level claim is more ambitious: NVIDIA is positioning the seven-chip integration (GPU, CPU, two switches, DPU, NIC, and LPU) as a single rack-scale product. That mirrors the systems-first pattern Google has shipped with TPU pods and that AMD is converging on with the Helios reference design. It also folds the inherited Groq LPU into the inference path of the same fabric, the first time NVIDIA has shipped a non-GPU accelerator inside one of its own racks.
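
A rough sketch of why memory bandwidth matters in the KV-cache-bound case: during decoding, each generated token has to stream the cached keys and values from HBM at least once, so bandwidth divided by bytes read sets a ceiling on decode steps per second. The Python below uses the standard KV-cache size formula and the 22 TB/s figure from the article. The model shape (80 layers, 8 KV heads, head dimension 128, 2-byte values, 128,000-token context) is hypothetical and chosen only for illustration, and the bound ignores weight reads, compute, and batching.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV-cache size for one sequence: keys and values for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def max_decode_steps_per_second(bytes_read_per_step, bandwidth_tbps=22):
    """Bandwidth-only ceiling, assuming each step streams its bytes once.

    bandwidth_tbps defaults to the 22 TB/s VR200 figure quoted in the
    article; real throughput will be lower than this bound.
    """
    return bandwidth_tbps * 1e12 / bytes_read_per_step


if __name__ == "__main__":
    # Hypothetical model shape, not taken from the article.
    cache = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=128_000)
    print(f"KV cache: {cache / 1e9:.1f} GB")
    print(f"Ceiling: {max_decode_steps_per_second(cache):.0f} steps/s")
```

For this hypothetical shape the cache is about 42 GB per sequence, which fits several times over in 288 GB, and the bandwidth-only ceiling is a little over 500 decode steps per second for that one sequence. Halving the bandwidth would halve the ceiling, which is the sense in which higher HBM bandwidth loosens this regime.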

Caveats

Throughput claims are NVIDIA first-party, measured on its own software stack, with no MLPerf or third-party reproduction yet. The 22 TB/s VR200 bandwidth target was reached only after late silicon and HBM4 supplier revisions; earlier briefings advertised the figure under different SKU assumptions, and the public roadmap has shifted as a result. General availability is "second half of 2026" without a specific date, and partner pricing has not been disclosed. Power delivery, cooling, and per-rack thermal envelopes for the NVL72 form factor have not been published in detail, and customers building out at scale will need to plan for VR200 racks stepping into a higher power class than GB300. Source: NVIDIA Newsroom, March 16, 2026.