NVIDIA unveils RTX Spark, a Grace-plus-Blackwell superchip with 128 GB unified memory
NVIDIA's RTX Spark pairs a 20-core Grace CPU with a 6,144-core Blackwell GPU over NVLink-C2C, claiming 1 petaflop of AI compute for on-device agents.
What's new
At GTC Taipei on May 31, 2026, NVIDIA unveiled RTX Spark, a consumer superchip that combines a CPU and GPU on a single package for local AI work on Windows PCs. The GPU is a Blackwell RTX part with 6,144 CUDA cores and fifth-generation Tensor Cores that support FP4 precision. It connects over NVIDIA's NVLink-C2C chip-to-chip interconnect to a 20-core NVIDIA Grace CPU, whose custom design NVIDIA says was developed with MediaTek for power efficiency. The package carries up to 128 GB of unified memory shared between the CPU and GPU, and NVIDIA puts peak AI compute at 1 petaflop.

NVIDIA frames the capacity and bandwidth as enough to run a 120-billion-parameter language model with up to 1 million tokens of context locally, edit 12K 4:2:2 video through the Blackwell decoder, render 90 GB 3D scenes with OptiX and DLSS, and play games at 1440p above 100 frames per second.

The chip ships inside laptops as slim as 14 millimeters and as light as three pounds, plus compact desktops, from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI this fall, with Acer and GIGABYTE to follow. NVIDIA paired the silicon with a software effort alongside Microsoft, including new Windows security primitives and an NVIDIA OpenShell runtime for running agents on device.

Source: NVIDIA Newsroom, "NVIDIA and Microsoft Reinvent Windows PCs for the Age of Personal AI," May 31, 2026.
Why it matters
RTX Spark moves the Grace-plus-Blackwell pattern that NVIDIA built for the data center down into a laptop-class part, with unified memory as the central bet. A single 128 GB pool that both the CPU and GPU address lets a portable machine hold a 120-billion-parameter model and a long context window in memory; memory capacity is the constraint that usually pushes that work to a cloud GPU. The NVLink-C2C link between Grace and Blackwell is the same coherence approach NVIDIA uses in its server superchips, now in a consumer envelope, and the MediaTek collaboration on the Arm CPU signals that NVIDIA is willing to design the host processor rather than buy it. The pitch targets the emerging market for on-device agents, where keeping model state and personal data local is the selling point, and positions RTX Spark against Apple silicon, Intel, and Qualcomm for the same workloads.
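The capacity claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative and not anything NVIDIA published: it counts model weights only, assumes decimal gigabytes, and ignores the KV cache, activations, and whatever the operating system holds in the shared pool. The function name and the list of precisions are assumptions made for the example.

```python
GB = 1e9  # decimal gigabytes; assumed to match the marketing "128 GB" figure


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in decimal GB."""
    return params * bits_per_weight / 8 / GB


POOL_GB = 128    # top unified-memory configuration cited in the article
PARAMS = 120e9   # 120-billion-parameter model from NVIDIA's claim

# Compare three common weight precisions (an illustrative choice).
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    weights = weight_footprint_gb(PARAMS, bits)
    headroom = POOL_GB - weights  # negative means the weights do not fit
    print(f"{name}: weights ~{weights:.0f} GB, headroom {headroom:.0f} GB")
```

Under these assumptions, FP16 weights would not fit in 128 GB, FP8 would leave about 8 GB, and FP4 would leave about 68 GB for context and everything else. That suggests the 120-billion-parameter claim leans on low-precision formats such as the FP4 that the Tensor Cores support, though NVIDIA has not said which precision it used.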
Caveats
NVIDIA's 1 petaflop figure is a peak number, and the press release does not state the precision basis or whether sparsity is assumed, so sustained throughput at usable precision is unknown. The release does not publish memory bandwidth; reporting from the event put it at up to 300 GB/s, which is far below the HBM bandwidth of data center parts and would bound large-model token rates. The 120-billion-parameter and 1-million-token claims are NVIDIA's, with no independent benchmark of interactive speed at that scale. The 128 GB figure is the top configuration, and NVIDIA has not detailed the SKU range or pricing. Hardware ships in the fall, so timelines and final specs could change.
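To see why the bandwidth figure matters, a rough bandwidth-bound estimate helps. The sketch below assumes a dense model in which every weight is read from memory once per generated token, takes the 300 GB/s figure from event reporting (not from NVIDIA's release), and ignores KV-cache traffic and compute limits. It gives an upper bound on single-stream decode speed, not a benchmark; the function and parameter names are hypothetical.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float,
                            active_params: float,
                            bits_per_weight: float) -> float:
    """Upper bound on tokens/s when decoding is limited by memory bandwidth.

    Assumes each generated token requires reading every active weight once.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 300   # reported figure, unconfirmed by NVIDIA
PARAMS = 120e9         # dense 120-billion-parameter model (assumption)

# Compare two low-precision weight formats (an illustrative choice).
for name, bits in [("FP8", 8), ("FP4", 4)]:
    rate = max_decode_tokens_per_s(BANDWIDTH_GB_S, PARAMS, bits)
    print(f"{name}: at most ~{rate:.1f} tokens/s")
```

With these inputs the ceiling is about 2.5 tokens per second at FP8 and about 5 at FP4. A mixture-of-experts model that activates only a fraction of its parameters per token would read fewer bytes and could run faster, which is one reason the headline claim cannot be judged without knowing the model architecture and precision NVIDIA had in mind.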