Nvidia Blackwell Ultra GB300 NVL72 targets massive gains in agentic artificial intelligence inference

Nvidia’s GB300 NVL72 system, built on the Blackwell Ultra GPU and a codesigned software stack, sharply improves performance and lowers cost for agentic artificial intelligence workloads, from low-latency assistants to long-context coding tools. New data highlights throughput-per-megawatt and token-cost advantages over both Hopper and prior Blackwell platforms.

The Nvidia Blackwell platform is seeing broad adoption among inference providers such as Baseten, DeepInfra, Fireworks AI and Together AI, with deployments already reducing cost per token by up to 10x compared with earlier generations. Agentic artificial intelligence use cases and coding assistants are driving rapid growth in software-programming-related queries, whose share rose from 11% to about 50% last year according to OpenRouter’s State of Inference report. These workloads demand both low latency across multistep workflows and long context to reason over entire codebases. New SemiAnalysis InferenceX performance data indicates that Nvidia’s software optimizations, combined with the next-generation Blackwell Ultra platform, let Nvidia GB300 NVL72 systems deliver up to 50x higher throughput per megawatt than the Nvidia Hopper platform, resulting in up to 35x lower cost per token.

Earlier analysis from Signal65 found that Nvidia GB200 NVL72, with tightly codesigned hardware and software, delivers more than 10x more tokens per watt than the Nvidia Hopper platform, which results in one-tenth the cost per token. These gains have been expanding as the stack improves. Continuous optimizations from the teams behind Nvidia TensorRT-LLM, Nvidia Dynamo, Mooncake and SGLang are significantly boosting Blackwell NVL72 throughput for mixture-of-experts inference at all latency targets. Nvidia TensorRT-LLM library changes alone have delivered up to 5x better performance on GB200 for low-latency workloads compared with just four months ago. Building on these advances, GB300 NVL72 with the Blackwell Ultra GPU raises throughput per megawatt to as much as 50x that of Hopper. This translates into up to 35x lower cost per million tokens at the low latencies where agentic applications operate, enabling real-time interactive assistants to scale to many more users.
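The relationship between the two headline figures can be sketched with simple arithmetic. The Python below is an illustration, not the method used by SemiAnalysis or Nvidia: it assumes cost per token is the hourly cost of running one megawatt of capacity divided by the tokens that capacity produces, and that the 50x and 35x figures describe the same operating point. The baseline throughput and hourly cost are hypothetical placeholders. Under those assumptions, a 50x throughput gain paired with a 35x cost reduction implies that a megawatt of the newer system costs roughly 1.4x as much per hour to run.

```python
def cost_per_million_tokens(tokens_per_second_per_mw: float,
                            cost_per_mw_hour: float) -> float:
    """Dollars per one million tokens produced by 1 MW of capacity."""
    tokens_per_hour = tokens_per_second_per_mw * 3600
    return cost_per_mw_hour / tokens_per_hour * 1_000_000


def implied_hourly_cost_ratio(throughput_gain: float,
                              cost_reduction: float) -> float:
    """Hourly cost ratio (new / old) implied by the two reported gains."""
    return throughput_gain / cost_reduction


# Hypothetical baseline values, chosen only to make the arithmetic concrete.
HOPPER_TOKENS_PER_S_PER_MW = 10_000.0
HOPPER_COST_PER_MW_HOUR = 1_000.0

# Figures reported in the article (both are "up to" values).
THROUGHPUT_GAIN = 50.0
COST_REDUCTION = 35.0

cost_ratio = implied_hourly_cost_ratio(THROUGHPUT_GAIN, COST_REDUCTION)

hopper_cost = cost_per_million_tokens(
    HOPPER_TOKENS_PER_S_PER_MW, HOPPER_COST_PER_MW_HOUR)
gb300_cost = cost_per_million_tokens(
    HOPPER_TOKENS_PER_S_PER_MW * THROUGHPUT_GAIN,
    HOPPER_COST_PER_MW_HOUR * cost_ratio)

print(f"Implied hourly cost ratio: {cost_ratio:.2f}x")
print(f"Cost reduction per million tokens: {hopper_cost / gb300_cost:.1f}x")
```

The absolute dollar values carry no meaning here; only the ratios do, and they hold for any baseline.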

The benefits of GB300 NVL72 are particularly pronounced in long-context scenarios, such as artificial intelligence coding assistants that must reason across entire repositories. For workloads with 128,000-token inputs and 8,000-token outputs, GB300 NVL72 delivers up to 1.5x lower cost per token compared with GB200 NVL72, helped by Blackwell Ultra’s 1.5x higher NVFP4 compute performance and 2x faster attention processing, which allow efficient understanding of entire codebases. Major cloud providers including Microsoft, CoreWeave and Oracle Cloud Infrastructure are deploying GB300 NVL72 for low-latency and long-context use cases, with CoreWeave emphasizing that Grace Blackwell NVL72 improves token economics and makes large-scale inference more usable for customers. Looking ahead, the Nvidia Rubin platform, which combines six new chips into a single artificial intelligence supercomputer, is positioned to deliver further improvements. These include up to 10x higher throughput per megawatt for mixture-of-experts inference compared with Blackwell, which translates into one-tenth the cost per million tokens, and the ability to train large mixture-of-experts models using just one-fourth the number of GPUs compared with Blackwell.
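To make the long-context figure concrete, the sketch below prices a single request with the 128,000-token input and 8,000-token output cited above. It rests on assumptions that are not in the source: the per-million-token price for GB200 NVL72 is a hypothetical placeholder, input and output tokens are billed at the same flat rate, and the "up to 1.5x" reduction is applied in full as a best case.

```python
INPUT_TOKENS = 128_000   # long-context input length cited in the article
OUTPUT_TOKENS = 8_000    # output length cited in the article
BEST_CASE_REDUCTION = 1.5  # "up to 1.5x lower cost per token"


def request_cost(price_per_million_tokens: float,
                 input_tokens: int,
                 output_tokens: int) -> float:
    """Cost of one request, assuming one flat rate for all tokens."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical price, for illustration only.
GB200_PRICE_PER_MILLION = 1.50
GB300_PRICE_PER_MILLION = GB200_PRICE_PER_MILLION / BEST_CASE_REDUCTION

gb200_request = request_cost(GB200_PRICE_PER_MILLION, INPUT_TOKENS, OUTPUT_TOKENS)
gb300_request = request_cost(GB300_PRICE_PER_MILLION, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Tokens per request: {INPUT_TOKENS + OUTPUT_TOKENS:,}")
print(f"GB200 NVL72 (hypothetical): ${gb200_request:.3f} per request")
print(f"GB300 NVL72 (best case):    ${gb300_request:.3f} per request")
```

Because each request processes 136,000 tokens, small per-token differences compound quickly for assistants that issue many such requests per session.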
