KAIST's Memory Systems Laboratory and TERA's Interconnection and Packaging group have jointly set forth an ambitious roadmap for the evolution of High Bandwidth Memory (HBM), covering five generations to come, from HBM4 through HBM8. Unveiled during a collaborative briefing and subsequently reported by tech outlets, the roadmap paints a future in which memory stacks deliver immense bandwidth and capacity, redefining the possibilities for Artificial Intelligence GPUs and data center accelerators. HBM4, set for a 2026 debut, will provide about 2 TB/s of bandwidth per stack, with an 8 Gbps pin rate and a 2,048-bit memory interface. Die stacks are expected to reach as high as 16 layers, supporting 36 to 48 GB per package under a manageable 75 W power ceiling. On the product front, NVIDIA's Rubin series and AMD's Instinct MI500 are expected to adopt HBM4, with the former doubling to as many as 16 memory stacks in certain variants and the latter potentially packing up to 432 GB of memory in a single device.
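Those headline figures follow directly from the interface arithmetic: per-stack bandwidth is the pin rate multiplied by the interface width. A minimal sketch of that calculation, assuming the roadmap's round numbers use a binary GB-to-TB conversion (the helper function below is illustrative, not part of any published specification):

```python
def stack_bandwidth_tbs(pin_rate_gbps: float, interface_bits: int) -> float:
    """Per-stack bandwidth: pin rate x interface width, converted from
    gigabits per second to terabytes per second (binary TB assumed here,
    which matches the roadmap's round headline figures)."""
    gigabytes_per_s = pin_rate_gbps * interface_bits / 8  # Gb/s -> GB/s
    return gigabytes_per_s / 1024                         # GB/s -> TB/s

# HBM4 as reported: 8 Gbps pins on a 2,048-bit interface
print(stack_bandwidth_tbs(8, 2048))  # 2.0 TB/s per stack
```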
Progressing to 2029, the HBM5 era will retain the 8 Gbps pin speed but double the memory interface width to 4,096 bits, resulting in 4 TB/s of throughput per stack and 80 GB of capacity per package using 16-high stacks of 40 Gb dies. Power consumption rises to 100 W per stack as demands escalate. NVIDIA's planned Feynman accelerator is poised to be an early beneficiary of HBM5, combining as much as 400-500 GB of HBM in a multi-die configuration; total accelerator power exceeds 4,400 W, highlighting intensifying performance and cooling challenges as the roadmap unfolds. HBM6, forecast for 2032, will double the pin speed to 16 Gbps over the same 4,096-bit interface, propelling bandwidth to 8 TB/s and density to 120 GB per stack with 20 stacked dies and a per-stack power draw of 120 W. Advances such as immersion cooling and direct copper-to-copper interconnects are expected to become necessary at these scales.
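The capacity figures work the same way: per-package capacity is the stack height multiplied by the per-die density. A quick check against the HBM5 numbers quoted above (again, the helper is purely illustrative):

```python
def package_capacity_gb(dies_per_stack: int, die_density_gbit: int) -> float:
    """Per-package capacity: stacked DRAM die count x per-die density,
    converted from gigabits to gigabytes."""
    return dies_per_stack * die_density_gbit / 8

# HBM5 as reported: 16-high stacks of 40 Gb dies
print(package_capacity_gb(16, 40))  # 80.0 GB per package
```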
Farther out, HBM7, slated for 2035, marks a dramatic leap: 24 Gbps pin speed, a massive 8,192-bit interface, and up to 24 TB/s per stack, with individual stacks housing as much as 192 GB at 160 W. NVIDIA is said to be preparing next-generation accelerators with power requirements around 15,360 W to exploit this towering performance tier. Each generational step not only ratchets up aggregate memory throughput but also raises the bar for packaging complexity and power dissipation, fueling an arms race in server and Artificial Intelligence infrastructure design, with clear implications for next-generation cooling, die stacking, and interconnection technologies. The roadmap underscores just how crucial memory evolution will be in sustaining the accelerating compute appetites of Artificial Intelligence and high-performance computing over the next decade.
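Running the same bandwidth arithmetic across every generation quoted above makes that cadence explicit: per-stack throughput doubles from HBM4 to HBM5 to HBM6 and then triples at HBM7 (the figures below simply restate the reported pin rates and interface widths; the summary script is illustrative):

```python
# (pin rate in Gbps, interface width in bits) as quoted in the roadmap
generations = {
    "HBM4": (8, 2048),
    "HBM5": (8, 4096),
    "HBM6": (16, 4096),
    "HBM7": (24, 8192),
}

for name, (gbps, bits) in generations.items():
    tbs = gbps * bits / 8 / 1024  # Gb/s -> GB/s -> TB/s (binary)
    print(f"{name}: {tbs:g} TB/s per stack")
# Prints 2, 4, 8, and 24 TB/s per stack for HBM4 through HBM7
```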