NVIDIA Groq 3 LPX targets low-latency Artificial Intelligence inference

NVIDIA positions Groq 3 LPX as an inference accelerator for Vera Rubin, built to handle low-latency, large-context workloads in agentic systems. The platform pairs Rubin GPUs with LPUs in a co-designed architecture aimed at boosting throughput, token generation, and efficiency at rack scale.

NVIDIA Groq 3 LPX is positioned as the inference accelerator for NVIDIA Vera Rubin, aimed at the low-latency and large-context requirements of agentic systems. The design pairs NVIDIA Rubin GPUs with LPUs in a co-designed architecture intended to combine interactivity, intelligence, and throughput in a single inference platform. NVIDIA says the combination extends the Artificial Intelligence factory with deterministic, low-latency token generation for real-time inference workloads.

By combining Rubin GPUs with high-bandwidth memory (HBM) and LPUs with on-chip static random-access memory (SRAM), NVIDIA Vera Rubin with LPX targets a new class of inference performance for trillion-parameter models and million-token contexts. Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs accelerate decode by jointly computing every layer of the model for every output token. NVIDIA says agentic systems consume up to 15x more tokens than traditional Artificial Intelligence applications, and that Vera Rubin paired with LPX delivers up to 35x higher throughput per megawatt for trillion-parameter models, letting Artificial Intelligence factories produce premium tokens at scale and unlock up to 10x more revenue per watt.

NVIDIA says each LPX rack features 256 interconnected LPU accelerators that work with the Vera Rubin platform to accelerate inference. Each LPU accelerator delivers 500 megabytes (MB) of SRAM, 150 terabytes per second (TB/s) of SRAM bandwidth, and 2.5 TB/s scale-up bandwidth. At the rack level, LPX delivers 128 GB of SRAM for low-latency processing and 12 TB of DDR5 memory for large models and workloads. NVIDIA also highlights 40 petabytes per second (PB/s) of SRAM bandwidth per rack and 640 TB/s of scale-up bandwidth across the LPX rack for low-latency chip communication.
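As a back-of-the-envelope sanity check, the rack-level figures follow from the per-accelerator specs multiplied across the 256 LPUs per rack (the quoted 40 PB/s appears to be 38.4 PB/s rounded up; variable names below are illustrative, not NVIDIA's):

```python
# Scale the stated per-LPU specs up to a 256-LPU LPX rack.
lpus_per_rack = 256
sram_per_lpu_gb = 0.5          # 500 MB of SRAM per LPU
sram_bw_per_lpu_tbs = 150.0    # 150 TB/s SRAM bandwidth per LPU
scaleup_bw_per_lpu_tbs = 2.5   # 2.5 TB/s scale-up bandwidth per LPU

rack_sram_gb = lpus_per_rack * sram_per_lpu_gb                   # 128 GB
rack_sram_bw_pbs = lpus_per_rack * sram_bw_per_lpu_tbs / 1000.0  # 38.4 PB/s (~40)
rack_scaleup_bw_tbs = lpus_per_rack * scaleup_bw_per_lpu_tbs     # 640 TB/s

print(rack_sram_gb, rack_sram_bw_pbs, rack_scaleup_bw_tbs)
```

The 128 GB of rack SRAM and 640 TB/s of scale-up bandwidth match the per-LPU numbers exactly; only the SRAM-bandwidth total is rounded in the announcement.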

The broader system is presented as part of the NVIDIA Vera Rubin NVL72 platform, which unifies seven purpose-built chips into a single Artificial Intelligence supercomputer. NVIDIA says LPX connects to NVL72 with high-speed links designed to reduce latency to near zero. The platform also uses the NVIDIA MGX ETL rack so deployments can plan around a single universal rack within Vera Rubin installations. The overall message centers on scaling long-context inference and enabling token factories to support more demanding agentic workloads with higher performance and efficiency.

Impact Score: 68

Nvidia sets the stage for GTC 2026 keynote

Nvidia is preparing to outline its next wave of computing, networking, and rendering plans at GTC 2026, with Jensen Huang leading the keynote. The event is expected to focus on next-generation platforms, broader Artificial Intelligence infrastructure, and the company’s expanding partnership with Intel.

Nvidia chief projects chip sales growth

Nvidia’s chief executive is tied to a projection of massive future Artificial Intelligence chip revenue, but the available source material provides no reported details beyond the headline and a brief author description.

Can world models unlock general-purpose robotics?

World models aim to help robots learn physics from large-scale video instead of relying mainly on hand-built simulators and scarce robot-specific data. Early results are promising, but major questions remain around consistency, tactile sensing, speed, and economics.

HHS weighs clinical Artificial Intelligence adoption around trust and burden

HHS is using public feedback to shape how Artificial Intelligence should be adopted in clinical care, with a focus on provider burden, patient trust, interoperability, and responsible use. The department is signaling that future changes in regulation, reimbursement, and research will reflect the themes that emerge.
