Accelerated Computing
FIELD GUIDE
§ 01 / FEATURE

The quiet revolution
in how machines think.

For decades the CPU sat at the center of computing. A second processor, born for graphics, has rewritten that center of gravity. This is the field guide.

Read the explainer →
Latest news
~10³×
speedup over CPU on dense FP16 matrix workloads
20 PFLOPS
FP4 sparse, in a flagship 2025 accelerator card
$115B
datacenter revenue at the top vendor, fiscal 2025
§ 03
Demo / CPU vs GPU

One core, in a hurry.
Many cores, in unison.

Both processors are computing the same 2048×2048 matrix multiplication. The CPU walks the grid one tile at a time. The GPU lights every tile at once. Press play.

CPU
8 cores · 4.2 GHz
0%
tiles processed: 0 / 256
GPU
16,896 cores · 1.8 GHz
0%
tiles processed: 0 / 256
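
The tiling behind the demo can be sketched in a few lines. This is a minimal illustration, not the demo's actual code: it assumes square tiles, so 256 tiles over a 2048×2048 output implies a 16×16 grid of 128×128 tiles, and the function names `tile_grid` and `matmul_tiled` are hypothetical. The loop visits tiles one at a time, as the CPU lane does; because no output tile depends on another, a GPU is free to work on many of them at once.

```python
import math


def tile_grid(n, tiles):
    """Return (tiles_per_side, tile_edge) for an n x n output cut into square tiles.

    Assumes the tile count is a perfect square that divides n evenly.
    """
    per_side = math.isqrt(tiles)
    if per_side * per_side != tiles or n % per_side != 0:
        raise ValueError("tiles must be a perfect square whose root divides n")
    return per_side, n // per_side


def matmul_tiled(a, b, tile):
    """Multiply square matrices (lists of lists) one output tile at a time.

    Returns the product and the number of output tiles processed.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    tiles_done = 0
    for i0 in range(0, n, tile):          # tile row
        for j0 in range(0, n, tile):      # tile column
            # Each output tile reads a row band of A and a column band of B.
            # It never reads another output tile, so tiles are independent.
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
            tiles_done += 1
    return c, tiles_done


if __name__ == "__main__":
    per_side, edge = tile_grid(2048, 256)
    print(f"{per_side} x {per_side} grid of {edge} x {edge} tiles")

    # A tiny 4 x 4 case keeps the pure-Python loop fast.
    a = [[i + j for j in range(4)] for i in range(4)]
    b = [[i * j for j in range(4)] for i in range(4)]
    product, done = matmul_tiled(a, b, tile=2)
    print(f"tiles processed: {done} / 4")
```

The sequential loop is the point of contrast: a real GPU kernel would map each tile to a group of threads rather than iterate, and would stage the input bands in fast on-chip memory.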
§ 02
The field guide

Seven sections. Pick where to start.

Each section is its own deep dive. The explainer is the gentlest entry point; the hardware taxonomy and benchmarks chart are the most opinionated.

§ 02

What is accelerated computing?

Four questions, answered honestly: what it is, why a CPU is not enough, whether it is just GPUs, and when it stops being worth it.

Read the explainer
§ 04

Anatomy of an accelerator

An interactive diagram of host CPU, PCIe, the on-device scheduler, streaming multiprocessors, HBM, and L2 cache.

See the diagram
§ 05

Where it shows up

Six fields that look different now than they did a decade ago: AI training, simulation, graphics, genomics, finance, and edge inference.

Browse the use cases
§ 06

GPU vs TPU vs NPU vs FPGA

A side-by-side taxonomy of the four accelerator families, ranked on flexibility, throughput, and power.

Compare the hardware
§ 07

The lines that diverge

Eighteen years of FP16 throughput on CPUs and GPUs. The CPU has roughly tripled. The accelerator has improved by three orders of magnitude.

See the chart
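
The gap in this card is easier to feel as a yearly rate. The sketch below is a back-of-envelope calculation, not data from the chart: it assumes smooth compounding over the eighteen years, reads "roughly tripled" as exactly 3× and "three orders of magnitude" as exactly 1000×, and uses a hypothetical helper named `annual_growth`.

```python
def annual_growth(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years) - 1.0


YEARS = 18  # span quoted in the card

cpu_rate = annual_growth(3, YEARS)      # "roughly tripled", taken as 3x
gpu_rate = annual_growth(1000, YEARS)   # "three orders of magnitude", taken as 1000x

if __name__ == "__main__":
    print(f"CPU:         {cpu_rate:.1%} per year")
    print(f"Accelerator: {gpu_rate:.1%} per year")
```

Under those assumptions the CPU line compounds at roughly 6% a year and the accelerator line at roughly 47% a year.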
§ 08

Twenty-five years, in seven beats

From programmable shaders to NPUs in every laptop and phone: the history of how the second processor took center stage.

Read the timeline
§ 10

The terms

The vocabulary an engineer uses to talk about accelerated systems, kept short and operational. CUDA, HBM, MFU, tensor cores, systolic arrays, and the rest.

Open the glossary
§ 09
The wire

News this week

All news →
Hardware

Intel details Crescent Island data center GPU with 480 GB of LPDDR5X

Intel's Xe3P inference GPU carries up to 480 GB of LPDDR5X in a 350 W air-cooled PCIe card and adds native FP4 and MXFP4 support, with sampling in H2 2026.

Intel · 2 days ago · 4 min
Edge

NVIDIA unveils RTX Spark, a Grace-plus-Blackwell superchip with 128 GB unified memory

NVIDIA's RTX Spark pairs a 20-core Grace CPU with a 6,144-core Blackwell GPU over NVLink-C2C, claiming 1 petaflop of AI compute for on-device agents.

NVIDIA · 3 days ago · 4 min
Hardware

Phoronix benchmarks NVIDIA Vera CPU against Intel and AMD x86 flagships

First independent Vera review shows a 1.5x geomean lead over a 128-core x86 processor and a 1.6x gain over Grace, with 1.2 TB/s of LPDDR5X bandwidth.

NVIDIA · 1 week ago · 4 min
Hardware

AMD launches Instinct MI350P PCIe with 144 GB HBM3E

The MI350P drops AMD's MI350-class compute into a dual-slot PCIe card aimed at on-prem inference, claiming up to 4,600 TFLOPS at MXFP4.

AMD · 4 weeks ago · 4 min
Hardware

AMD previews Instinct MI430X with 200+ FP64 TFLOPS

AMD projects its upcoming HPC accelerator at over 200 FP64 TFLOPS, more than six times the figure it cites for NVIDIA Rubin.

AMD · 4 weeks ago · 4 min
Hardware

Tenstorrent ships Galaxy Blackhole, its 32-chip AI server

Each 6U Galaxy node packs 32 Blackhole accelerators, 1 TB of GDDR6, and 23 PFLOPS of FP8, listing at $110,000 per system.

Tenstorrent · 1 month ago · 4 min