AMD launches Instinct MI350P PCIe with 144GB HBM3E

The MI350P puts AMD's MI350-class compute on a dual-slot PCIe card aimed at on-prem inference, with AMD claiming up to 4,600 TFLOPS at MXFP4.

What's new

On May 7, 2026, AMD announced the Instinct MI350P PCIe GPU in a blog post titled "AMD Instinct MI350P PCIe GPUs: Run Enterprise AI on Your Existing Infrastructure," authored by Suresh Andani and AMD News. The card carries 144 GB of HBM3E with up to 4 TB/s of memory bandwidth on a dual-slot PCIe form factor designed to drop into standard air-cooled enterprise servers. AMD cites estimated peak throughput of 2,299 TFLOPS at base precision and up to 4,600 TFLOPS at MXFP4. Native support extends to FP8, MXFP8, MXFP6, MXFP4, INT8, and BF16, with sparsity acceleration for the 8- and 16-bit precisions. Systems can host up to eight MI350P cards. AMD names Dell Technologies, HPE, Cisco, Lenovo, Supermicro, and Gigabyte as launch OEM partners, and lists Red Hat, Broadcom's VMware, Akamai, Nutanix, Uniphore, Kamiwaza, and Seekr among software partners. Source: AMD blog, "AMD Instinct MI350P PCIe GPUs: Run Enterprise AI on Your Existing Infrastructure," May 7, 2026.
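The per-card figures above can be scaled to a fully populated system with simple arithmetic. The Python sketch below is illustrative only: it uses the numbers AMD cites (144 GB, up to 4 TB/s, up to 4,600 TFLOPS at MXFP4, up to eight cards), and the function names are hypothetical. The summed bandwidth is per-card HBM bandwidth added together and says nothing about card-to-card links. The ridge point is the standard roofline ratio of peak compute to memory bandwidth, meaning the FLOPs a kernel must perform per byte read before the estimated peak, rather than memory bandwidth, becomes the limit.

```python
# Figures as cited in AMD's announcement. The TFLOPS value is an AMD
# peak estimate, and both bandwidth and TFLOPS are "up to" numbers.
HBM_GB_PER_CARD = 144
MEM_BW_TBPS_PER_CARD = 4.0
PEAK_MXFP4_TFLOPS = 4600.0
MAX_CARDS_PER_SYSTEM = 8


def system_totals(cards=MAX_CARDS_PER_SYSTEM):
    """Sum per-card HBM capacity and HBM bandwidth across a system.

    This is a plain sum of local memory resources; it does not model
    any interconnect between cards.
    """
    return {
        "hbm_gb": cards * HBM_GB_PER_CARD,
        "mem_bw_tbps": cards * MEM_BW_TBPS_PER_CARD,
    }


def ridge_point_flops_per_byte(peak_tflops=PEAK_MXFP4_TFLOPS,
                               mem_bw_tbps=MEM_BW_TBPS_PER_CARD):
    """Roofline ridge point: TFLOPS divided by TB/s gives FLOPs per byte."""
    return peak_tflops / mem_bw_tbps


if __name__ == "__main__":
    print(system_totals())
    print(ridge_point_flops_per_byte())
```

With the cited numbers, an eight-card system totals 1,152 GB of HBM3E, and the MXFP4 ridge point works out to 1,150 FLOPs per byte. Workloads with lower arithmetic intensity than that, such as small-batch token generation, would be limited by memory bandwidth rather than by the peak compute figure.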

Why it matters

The MI350P is AMD's first PCIe-form-factor Instinct card since the MI210 in 2022, and it brings MI350-class compute and 144 GB of HBM3E into server chassis that cannot accept the OAM modules used by MI300-series and MI355X parts. The combination makes the card a candidate for on-prem inference of medium-sized models without standing up a new liquid-cooled GPU pod or moving workloads to a public cloud. Native MXFP4 and MXFP6 support tracks the broader industry shift toward sub-byte numerics to lower inference cost, putting AMD on closer footing with NVIDIA's Blackwell-class FP4 throughput claims. The eight-card-per-server configuration and dual-slot air-cooled envelope are explicitly aimed at enterprises that want to expand capacity within an existing power and cooling footprint rather than redesign a data center. Putting 144 GB of HBM3E on a PCIe card also closes a gap that had pushed many enterprise inference deployments toward NVIDIA H200 NVL.
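To make "medium-sized models" concrete, here is a rough Python sketch of how weight storage scales with numeric format against the card's 144 GB. It is not from AMD's post. It assumes the OCP Microscaling (MX) layout of one shared 8-bit scale per block of 32 elements, which adds 0.25 bits per element. The 70-billion-parameter model, the 20% headroom reserved for KV cache and activations, and the function names are all hypothetical, and GB is taken as 10^9 bytes.

```python
# Effective storage bits per weight. MX formats add one shared 8-bit
# scale per 32-element block (assumption: OCP MX block layout).
MX_SCALE_OVERHEAD_BITS = 8 / 32
BITS_PER_ELEMENT = {
    "BF16": 16.0,
    "FP8": 8.0,
    "MXFP8": 8.0 + MX_SCALE_OVERHEAD_BITS,
    "MXFP6": 6.0 + MX_SCALE_OVERHEAD_BITS,
    "MXFP4": 4.0 + MX_SCALE_OVERHEAD_BITS,
}

HBM_GB_PER_CARD = 144  # capacity cited in AMD's announcement


def weight_footprint_gb(n_params, fmt):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_ELEMENT[fmt] / 8 / 1e9


def fits_on_card(n_params, fmt, hbm_gb=HBM_GB_PER_CARD, reserve_fraction=0.2):
    """True if the weights fit after reserving headroom.

    reserve_fraction is a hypothetical allowance for KV cache and
    activations; real headroom depends on batch size and context length.
    """
    budget_gb = hbm_gb * (1.0 - reserve_fraction)
    return weight_footprint_gb(n_params, fmt) <= budget_gb


if __name__ == "__main__":
    n_params = 70e9  # hypothetical 70B-parameter model
    for fmt in BITS_PER_ELEMENT:
        size = weight_footprint_gb(n_params, fmt)
        print(f"{fmt}: {size:.1f} GB, fits on one card: {fits_on_card(n_params, fmt)}")
```

Under these assumptions, the hypothetical 70B model's weights take about 140 GB in BF16, which leaves almost no headroom on a single card, and about 37 GB in MXFP4. That difference is the practical argument for sub-byte formats on a 144 GB part, although accuracy at lower precision has to be validated per model.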

Caveats

The 4,600 TFLOPS MXFP4 figure is an AMD peak estimate, not a measured result on shipping silicon: the post attaches footnote GD-247a, which describes the numbers as "preliminary performance estimates based on AMD engineering projections or early measurements as of April 2026." The headline FP8 and BF16 figures rely on sparsity, which not all production inference paths exploit. The announcement does not disclose a general availability date, list pricing, board power, PCIe generation, or peer-to-peer connectivity options. Independent benchmarks, supported-OS coverage, and software-stack maturity for production inference workloads remain to be verified. Source: AMD, May 7, 2026.