NVIDIA Groq 3 LPX targets low-latency Artificial Intelligence inference

NVIDIA positions Groq 3 LPX as an inference accelerator for Vera Rubin, built to handle low-latency, large-context workloads for agentic systems. The platform combines Rubin GPUs and LPUs in a co-designed architecture aimed at boosting throughput, token generation, and efficiency at rack scale.

NVIDIA Groq 3 LPX is positioned as the inference accelerator for NVIDIA Vera Rubin, aimed at the low-latency and large-context requirements of agentic systems. The design pairs NVIDIA Rubin GPUs with LPUs in a co-designed architecture intended to combine interactivity, intelligence, and throughput in a single inference platform. NVIDIA says the combination extends the Artificial Intelligence factory with deterministic, low-latency token generation for real-time inference workloads.

By pairing Rubin GPUs, built around high-bandwidth memory (HBM), with LPUs, built around on-chip static random-access memory (SRAM), NVIDIA Vera Rubin with LPX targets a new class of inference performance for trillion-parameter models and million-token contexts. Deployed with Vera Rubin NVL72, the Rubin GPUs and LPUs accelerate decode by jointly computing every layer of the Artificial Intelligence model for every output token. NVIDIA notes that agentic systems consume up to 15x more tokens than traditional Artificial Intelligence applications; paired with LPX, Vera Rubin delivers up to 35x higher throughput per megawatt for trillion-parameter models. NVIDIA says the combination lets Artificial Intelligence factories produce premium tokens at scale, unlocking up to 10x more revenue per watt.

NVIDIA says each LPX rack features 256 interconnected LPU accelerators that work with the Vera Rubin platform to accelerate inference. Each LPU accelerator delivers 500 megabytes (MB) of SRAM, 150 terabytes per second (TB/s) of SRAM bandwidth, and 2.5 TB/s scale-up bandwidth. At the rack level, LPX delivers 128 GB of SRAM for low-latency processing and 12 TB of DDR5 memory for large models and workloads. NVIDIA also highlights 40 petabytes per second (PB/s) of SRAM bandwidth per rack and 640 TB/s of scale-up bandwidth across the LPX rack for low-latency chip communication.
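The rack-level figures above follow from scaling the per-LPU specs by the 256 accelerators per rack. A minimal sketch of that arithmetic, using only the numbers NVIDIA quotes (vendor-stated specs, not measurements):

```python
# Sanity-check the quoted LPX rack totals by scaling the per-LPU specs.
# All input values are NVIDIA's stated figures from the announcement.

LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500          # 500 MB of SRAM per LPU
SRAM_BW_PER_LPU_TBS = 150      # 150 TB/s SRAM bandwidth per LPU
SCALEUP_BW_PER_LPU_TBS = 2.5   # 2.5 TB/s scale-up bandwidth per LPU

rack_sram_gb = LPUS_PER_RACK * SRAM_PER_LPU_MB / 1000          # MB -> GB
rack_sram_bw_pbs = LPUS_PER_RACK * SRAM_BW_PER_LPU_TBS / 1000  # TB/s -> PB/s
rack_scaleup_bw_tbs = LPUS_PER_RACK * SCALEUP_BW_PER_LPU_TBS

print(rack_sram_gb)         # 128.0 GB, matching the quoted rack total
print(rack_sram_bw_pbs)     # 38.4 PB/s, i.e. the quoted "40 PB/s" is rounded
print(rack_scaleup_bw_tbs)  # 640.0 TB/s, matching the quoted figure
```

Note that 256 x 150 TB/s works out to 38.4 PB/s, so the "40 PB/s" rack figure appears to be rounded up; the SRAM capacity and scale-up bandwidth totals match exactly.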

The broader system is presented as part of the NVIDIA Vera Rubin NVL72 platform, which unifies seven purpose-built chips into a single Artificial Intelligence supercomputer. NVIDIA says LPX connects to NVL72 with high-speed links designed to reduce latency to near zero. The platform also uses the NVIDIA MGX ETL rack, so deployments can plan around a single universal rack within Vera Rubin installations. Overall, the message centers on scaling long-context inference and enabling token factories to support more demanding agentic workloads with higher performance and efficiency.

BitUnlocker bypasses TPM-only Windows 11 BitLocker

Intrinsec disclosed BitUnlocker, a downgrade attack that can bypass TPM-only Windows 11 BitLocker protections with physical access to a machine. The technique abuses a flaw in Windows recovery and deployment components and relies on older trusted boot code.

Micron samples 256 GB DDR5 9200 MT/s RDIMM server modules

Micron has begun sampling 256 GB DDR5 RDIMM server modules built on its 1-gamma technology to key ecosystem partners. The company positions the new modules as a higher-speed, more power-efficient option for scaling next-generation Artificial Intelligence and HPC infrastructure.

Microsoft emails show early doubts about OpenAI

Court emails show Microsoft executives were unconvinced by OpenAI’s early Artificial Intelligence progress in 2018 while also worrying that rejecting the lab could push it toward Amazon. The messages reveal internal tension between skepticism over technical claims and concern about competitive and public relations fallout.
