NVIDIA Blackwell leads agentic Artificial Intelligence benchmark

Artificial Analysis introduced AgentPerf to compare infrastructure for agentic Artificial Intelligence workloads. NVIDIA Blackwell Ultra NVL72 led the first published results across the tested workloads.

Artificial Analysis has introduced AgentPerf as an agentic Artificial Intelligence benchmark for developers, enterprises and infrastructure providers comparing systems for agentic workloads. In the first round of published results, the NVIDIA Blackwell Ultra NVL72 platform delivered leading performance across the agentic Artificial Intelligence workloads tested, running up to 20x more agents per megawatt than NVIDIA Hopper.

Agentic Artificial Intelligence differs from conversational Artificial Intelligence because an agent breaks a goal into many steps and continues until the task is complete. That produces dozens to hundreds of LLM calls chained together, with growing context passed between calls and tool calls such as code compilation and execution, database search and web browsing at each handoff. Existing inference benchmarks focus on a single LLM call and were not designed to capture the delays, chained calls and context growth that stress accelerated computing systems in agentic workloads.
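
The effect of context growth across a chain of calls can be illustrated with simple arithmetic. The sketch below assumes each call receives the full accumulated context and that every step adds a fixed number of tokens. The function name and all of the numbers are hypothetical and are not taken from AgentPerf.

```python
def chained_context_tokens(num_calls, initial_context, tokens_added_per_step):
    """Return the context size (in tokens) seen by each LLM call in a chain.

    Assumption: the context grows linearly, by a fixed amount per step.
    """
    return [initial_context + i * tokens_added_per_step for i in range(num_calls)]


# Hypothetical agent run: 100 chained calls, starting from a 2,000-token
# prompt, with 1,500 tokens of model output and tool results added per step.
sizes = chained_context_tokens(num_calls=100, initial_context=2_000,
                               tokens_added_per_step=1_500)

# Total input tokens the system must handle across the whole chain.
total_input_tokens = sum(sizes)

print(f"first call: {sizes[0]} tokens, last call: {sizes[-1]} tokens")
print(f"total input tokens across the chain: {total_input_tokens}")
```

Under these assumed numbers the final call carries far more context than the first, and the total input processed grows roughly with the square of the chain length, which is the kind of load a single-call benchmark does not exercise.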

In this first round, AgentPerf measures agentic performance with DeepSeek V4 Pro, a large mixture-of-experts (MoE) model that represents the class of frontier models powering today’s most capable agents. On this workload, NVIDIA GB300 NVL72, the Blackwell Ultra rack-scale system, delivers the highest performance in the benchmark, running up to 20x more agents per megawatt than the Hopper-based NVIDIA HGX H200 system. GB300 NVL72 supports far more concurrent agents per megawatt than H200 at both service-level objectives, 20 and 60 tokens per second per agent.
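
One way to read the agents-per-megawatt figure is as the number of concurrent agents a system sustains while every agent still meets the tokens-per-second objective, divided by the power the system draws. The sketch below is a minimal illustration under that assumption. It is not AgentPerf's published methodology, the helper names are hypothetical, and the sample values are placeholders rather than benchmark results.

```python
def meets_slo(per_agent_tokens_per_second, slo_tokens_per_second):
    """True if every agent's generation rate is at or above the objective."""
    return all(rate >= slo_tokens_per_second for rate in per_agent_tokens_per_second)


def agents_per_megawatt(concurrent_agents, system_power_kw):
    """Normalize a concurrent-agent count by system power, in megawatts."""
    if system_power_kw <= 0:
        raise ValueError("system_power_kw must be positive")
    return concurrent_agents / (system_power_kw / 1000.0)


# Placeholder measurements for one hypothetical system (not real results).
observed_rates = [22.5, 21.0, 20.0, 24.3]   # tokens/s for each running agent
slo = 20.0                                   # tokens/s per agent
power_kw = 125.0                             # assumed system power draw

if meets_slo(observed_rates, slo):
    density = agents_per_megawatt(len(observed_rates), power_kw)
    print(f"{density:.0f} agents per megawatt at {slo:.0f} tokens/s per agent")
```

Comparing two systems at the same objective then reduces to the ratio of their two density values.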

The performance advantage is attributed to full-stack codesign. GB300 NVL72 connects 72 GPUs into a single rack-scale system, enabling large MoE models like DeepSeek V4 Pro to distribute model execution efficiently at scale. CUDA kernels overlap communication and compute, while NVIDIA TensorRT LLM maintains efficiency as concurrent agent sessions scale by separating input processing from output generation so each can be optimized independently.

AgentPerf is built on real coding agent trajectories: an agent receives a task, reads files, writes and edits code, executes commands and iterates based on the results, all drawn from real public code repositories across 12+ programming languages. Tool calls are simulated with representative CPU processing time rather than executed, so results reflect accelerated computing performance. Baseten, DeepInfra and Together AI are already serving agentic workloads on frontier models such as DeepSeek V4 Pro on NVIDIA Blackwell, with deployments including Cursor and Pam.ai.
