Next-gen HBM eyes embedded GPU cores for Artificial Intelligence

Companies including Meta and NVIDIA are reported to be evaluating custom HBM that embeds GPU cores into the base die to push compute closer to memory for Artificial Intelligence workloads.

Tech companies are exploring a shift in high-bandwidth memory design that would embed GPU cores directly into the base die of next-generation HBM stacks. Reports name Meta and NVIDIA as evaluating these so-called custom HBM architectures, with SK Hynix and Samsung involved in early discussions. HBM stacks multiple DRAM dies atop a base die that handles external I/O, and HBM4 is expected to reach mass production next year with an onboard controller to improve bandwidth and efficiency. Placing compute inside the memory aims to reduce data movement and cut power usage by shortening the path between compute and memory for Artificial Intelligence workloads.

The approach promises performance and energy-efficiency gains for Artificial Intelligence processing but faces significant technical challenges. Sources cite limited die area in Through-Silicon Via (TSV) based stacks, power delivery constraints, and the difficulty of cooling compute-heavy GPU logic embedded in the base die. Kim Joung-ho, a professor in the School of Electrical Engineering at KAIST, said, ‘The speed of technological transition where the boundary between memory and system semiconductors collapses for Artificial Intelligence advancement will accelerate,’ and added, ‘Domestic companies must expand their ecosystem beyond memory into the logic sector to preempt the next-generation HBM market.’ The quote underscores industry pressure for memory vendors to broaden capabilities into packaging and logic.

Design choices by major accelerator makers illustrate divergent strategies. AMD’s Instinct MI430X accelerator, built on the next-generation AMD CDNA architecture, supports 432 GB of HBM4 memory and 19.6 TB/s of memory bandwidth. NVIDIA’s ‘Vera Rubin’ Superchip takes a different route: each Rubin GPU integrates two reticle-sized compute chiplets paired with eight HBM4 stacks for around 288 GB of HBM4, and the full Superchip pairs two such GPUs for roughly 576 GB. Market implications are clear: firms with strong packaging and logic capabilities stand to benefit, while pure memory vendors may need to expand into system-level semiconductor technologies to remain competitive.
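For readers checking the cited figures, the per-stack and per-Superchip capacities follow directly from the reported numbers. A minimal arithmetic sketch, assuming the reported configuration of eight HBM4 stacks per GPU and two GPUs per Superchip:

```python
# Sanity check of the reported Vera Rubin HBM4 capacity figures.
# Assumptions (from the reporting above): 288 GB of HBM4 per GPU,
# eight stacks per GPU, two GPUs per Superchip.

GB_PER_GPU = 288
STACKS_PER_GPU = 8
GPUS_PER_SUPERCHIP = 2

gb_per_stack = GB_PER_GPU / STACKS_PER_GPU          # capacity of one HBM4 stack
gb_per_superchip = GB_PER_GPU * GPUS_PER_SUPERCHIP  # total across the Superchip

print(f"{gb_per_stack:.0f} GB per stack, {gb_per_superchip} GB per Superchip")
```

This works out to 36 GB per HBM4 stack, consistent with the roughly 576 GB quoted for the full Superchip.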

