The Nvidia Blackwell platform has seen wide adoption among inference providers such as Baseten, DeepInfra, Fireworks AI and Together AI, where it is used to reduce cost per token by up to 10x. Building on that deployment, the Nvidia Blackwell Ultra platform is aimed at accelerating agentic AI, particularly coding assistants and autonomous agents that manage complex, multistep tasks. These workloads span entire codebases and demand both very low latency and long-context handling to keep interactions responsive and coherent.
According to OpenRouter’s State of Inference report, AI agents and coding assistants are driving rapid growth in programming-related AI queries, whose share grew from 11% to about 50% over the past year. This underscores how much inference demand is moving toward interactive development tools and automated software agents, applications that press infrastructure to deliver real-time responsiveness while scaling to large numbers of concurrent requests and extended conversations.
New SemiAnalysis InferenceX performance data shows that the combination of Nvidia’s software optimizations and the next-generation Nvidia Blackwell Ultra platform delivers advances in both performance and efficiency. Nvidia GB300 NVL72 systems now achieve up to 50x higher throughput per megawatt than the Nvidia Hopper platform, translating into 35x lower cost per token. By coordinating innovation across chips, system architecture and software, Nvidia is using an extreme co-design approach to accelerate AI workloads ranging from agentic coding tools to interactive coding assistants, while continuing to drive down inference costs at scale.
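To see how throughput per megawatt connects to cost per token, consider a minimal back-of-the-envelope sketch. The function and every input figure below are hypothetical placeholders, not values from the report; the point is only that cost per token is facility cost divided by token output, so a 50x throughput-per-megawatt gain can yield a smaller (here, 35x) cost reduction when the faster system carries a higher cost per megawatt-hour.

```python
# Illustrative sketch only: how throughput per megawatt relates to cost
# per token. All numbers are hypothetical, chosen to reproduce the
# 50x-throughput / 35x-cost relationship cited in the article.

def cost_per_million_tokens(tokens_per_sec_per_mw: float,
                            power_mw: float,
                            facility_cost_per_hour: float) -> float:
    """All-in USD cost to generate one million tokens.

    facility_cost_per_hour bundles power, capital amortization and
    operations for the given power footprint.
    """
    tokens_per_hour = tokens_per_sec_per_mw * power_mw * 3600
    return facility_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical baseline system in a 1 MW footprint.
baseline = cost_per_million_tokens(tokens_per_sec_per_mw=20_000,
                                   power_mw=1.0,
                                   facility_cost_per_hour=400.0)

# Same footprint with 50x throughput per megawatt, but a higher hourly
# facility cost (hence 35x, not 50x, lower cost per token).
upgraded = cost_per_million_tokens(tokens_per_sec_per_mw=20_000 * 50,
                                   power_mw=1.0,
                                   facility_cost_per_hour=400.0 * 50 / 35)

print(f"baseline: ${baseline:.2f}/M tokens")   # ~$5.56/M tokens
print(f"upgraded: ${upgraded:.2f}/M tokens")   # ~$0.16/M tokens
print(f"cost reduction: {baseline / upgraded:.0f}x")  # 35x
```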
