MI355X DLC (Image credit: AMD)

  • AMD highlighted the MI350 series at Hot Chips 2025, showing node-to-rack scalability
  • The MI355X DLC rack packs 128 GPUs, 36TB of HBM3e, and 2.6 exaflops of FP4 compute
  • Nvidia’s Vera Rubin system, coming next year, is a maximum-scale beast

AMD[1] used the recent Hot Chips 2025 event to talk more about the CDNA 4 architecture that powers its new Instinct MI350 series, and to show how its accelerators scale from node to rack.

The MI350 series platforms combine 5th Gen Epyc CPUs, MI350 GPUs, and AMD Pollara NICs in OCP-standard designs with UEC-supported networking. GPU-to-GPU bandwidth is delivered over Infinity Fabric at up to 1075GB/s.

At the top end of the lineup is the MI355X DLC ‘Orv3’ rack, a 2OU system with 128 GPUs, 36TB of HBM3e memory, and peak throughput of 2.6 exaflops at FP4 precision (there’s also a 96-GPU EIA version with 27TB of HBM3e).

MI355X DLC (Image credit: AMD)
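
Those rack-level figures line up with the MI355X’s published 288GB of HBM3e per GPU. A quick back-of-the-envelope check (derived numbers, assuming a binary TB-to-GB conversion, not an official AMD breakdown):

```python
# Sanity-check the rack figures against per-GPU numbers (derived, not official).
GPUS_DLC, GPUS_EIA = 128, 96

hbm_per_gpu_tb = 36 / GPUS_DLC            # 0.28125 TB ~= 288 GB of HBM3e per GPU
eia_total_tb = hbm_per_gpu_tb * GPUS_EIA  # 27.0 TB -- matches the EIA version
fp4_per_gpu_pflops = 2600 / GPUS_DLC      # ~20.3 petaflops FP4 per GPU at peak

print(f"{hbm_per_gpu_tb * 1024:.0f} GB HBM3e/GPU, {eia_total_tb:.0f} TB EIA total, "
      f"{fp4_per_gpu_pflops:.1f} PF FP4/GPU")
```

Both rack variants divide out to the same 288GB per GPU, which suggests the 36TB and 27TB totals are straight multiples of the per-GPU memory spec.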

Here comes Vera Rubin

At the node level, AMD presented flexible designs for both air-cooled and liquid-cooled systems.

An MI350X platform with eight GPUs achieves 73.8 petaflops at FP8, while the liquid-cooled MI355X platform reaches 80.5 petaflops at FP8 in a denser form factor.
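
Dividing by the eight GPUs in each node shows the per-accelerator gap between the two platforms (a rough derivation from the quoted platform figures, not an official per-GPU spec):

```python
# Per-GPU FP8 throughput implied by the eight-GPU platform figures.
mi350x_node_pf, mi355x_node_pf, gpus = 73.8, 80.5, 8

mi350x_per_gpu = mi350x_node_pf / gpus  # ~9.2 petaflops FP8 per air-cooled GPU
mi355x_per_gpu = mi355x_node_pf / gpus  # ~10.1 petaflops FP8 per liquid-cooled GPU
uplift = mi355x_node_pf / mi350x_node_pf - 1

print(f"MI350X: {mi350x_per_gpu:.2f} PF/GPU, MI355X: {mi355x_per_gpu:.2f} PF/GPU, "
      f"uplift: {uplift:.0%}")  # roughly a 9% per-GPU gain for the DLC part
```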

AMD also confirmed its roadmap. The chip giant debuted the MI325X in 2024, the MI350 family arrived earlier in 2025, and the Instinct MI400 is set to make an appearance in 2026.

The MI400 will offer up to 40 petaflops of FP4 compute, 20 petaflops of FP8, 432GB of HBM4 memory, 19.6TB/s of memory bandwidth, and 300GB/s of scale-out bandwidth per GPU.

AMD says that the performance curve from MI300 to MI400 shows accelerated gains, rather than incremental steps.
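
Putting the MI355X numbers next to those MI400 targets makes the claim easy to eyeball. A rough sketch (the ~20 and ~10 petaflop baselines are the per-GPU values derived above, and the 8TB/s HBM3e bandwidth is AMD’s published MI355X figure, not one quoted in this article):

```python
# Gen-over-gen ratios, MI355X -> MI400, using figures quoted in the article plus
# the MI355X's published 8 TB/s HBM3e bandwidth (assumed baseline).
mi355x = {"fp4_pf": 20.1, "fp8_pf": 10.1, "hbm_gb": 288, "bw_tbs": 8.0}
mi400  = {"fp4_pf": 40.0, "fp8_pf": 20.0, "hbm_gb": 432, "bw_tbs": 19.6}

for key in mi355x:
    print(f"{key}: {mi400[key] / mi355x[key]:.2f}x")
# Compute roughly doubles, memory grows 1.5x, and bandwidth jumps ~2.45x in a
# single generation -- in line with AMD's "accelerated gains" framing.
```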

The elephant in the room is, naturally enough, Nvidia[2], which is planning its Rubin architecture for 2026–27. The Vera Rubin NVL144 system, penciled in for the second half of next year, will, according to slides shared by Nvidia, be rated for 3.6 exaflops of FP4 inference and 1.2 exaflops of FP8 training. It features 13TB/s of HBM4 bandwidth and 75TB of fast memory, which Nvidia bills as a 1.6x gain over its predecessor.
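
Dividing those rack numbers by the GPU count gives a rough per-GPU picture. Note the assumption here that the ‘144’ in NVL144 counts individual GPUs, as the name suggests; these are derived values, not Nvidia’s stated per-GPU specs:

```python
# Per-GPU figures implied by the quoted Vera Rubin NVL144 rack numbers (derived).
gpus = 144  # assumption: the "144" in NVL144 is the GPU count

fp4_inference_pf = 3600 / gpus  # 25.0 petaflops of FP4 inference per GPU
fp8_training_pf = 1200 / gpus   # ~8.3 petaflops of FP8 training per GPU

print(f"{fp4_inference_pf:.1f} PF FP4 and {fp8_training_pf:.1f} PF FP8 per GPU")
```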

Nvidia is integrating 88 custom Arm[3] CPU cores with 176 threads, connected by 1.8TB/s of NVLink-C2C, alongside 260TB/s of NVLink6 and 28.8TB/s of CX9 interconnect.

In late 2027, Nvidia has the Rubin Ultra NVL576 system planned. This will be rated for 15 exaflops FP4 inference and 5 exaflops FP8 training, with 4.6PB/s of HBM4e bandwidth, 365TB of fast memory, and interconnect speeds of 1.5PB/s with NVLink7 and 115TB/s using CX9.

At full scale, the Rubin Ultra system will include 576 GPUs, 2,304 HBM4e stacks totaling 150TB of memory, and 1,300 trillion transistors, supported by 12,672 Vera CPU cores, 576 ConnectX-9 NICs, 72 BlueField DPUs, and 144 NVLink switches carrying that aggregate 1.5PB/s (1,500TB/s) of NVLink7 bandwidth.
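
Those component counts are internally consistent, as a quick division of the article’s own numbers shows (derived ratios, not specs Nvidia has stated per GPU):

```python
# Cross-checking Nvidia's Rubin Ultra NVL576 component counts against each other.
gpus, stacks, hbm_tb = 576, 2304, 150
vera_cores, cores_per_cpu = 12672, 88  # 88 cores per Vera CPU, per the NVL144 spec

print(f"{stacks / gpus:.0f} HBM4e stacks per GPU")                # 4
print(f"{hbm_tb * 1024 / gpus:.0f} GB of HBM4e per GPU")          # ~267 (binary TB)
print(f"{vera_cores / cores_per_cpu:.0f} Vera CPUs in the rack")  # 144
```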

It’s a tightly integrated, monolithic beast aimed at maximum scale.

While it’s fun to compare the AMD and Nvidia numbers, it’s obviously not exactly fair. AMD’s MI355X DLC rack is a product being detailed in 2025, while Nvidia’s Rubin systems remain roadmap designs for 2026 and 2027. Even so, it’s an interesting glimpse into how each company is framing the next wave of AI infrastructure.

Nvidia Vera Rubin NVL144 (Image credit: Nvidia)


References

  1. AMD (www.techradar.com)
  2. Nvidia (www.techradar.com)
  3. Arm (www.techradar.com)
