Nvidia and CoreWeave set Graph500 record with H100 GPU cluster

Nvidia and CoreWeave achieved a record 410 trillion traversed edges per second (TEPS) on the 31st Graph500 breadth-first search benchmark, using a commercially available cluster of 8,192 H100 GPUs hosted in Dallas. The result showcases a GPU-only, full-stack approach to large-scale graph processing that more than doubles the performance of comparable systems while using far fewer nodes at lower cost.

Nvidia has claimed the top spot on the 31st Graph500 breadth-first search list with a benchmark result of 410 trillion traversed edges per second (TEPS), delivered on a commercially available cluster hosted by cloud provider CoreWeave. The record-setting run took place in a CoreWeave data center in Dallas and used 8,192 Nvidia H100 GPUs to process a graph containing 2.2 trillion vertices and 35 trillion edges. According to Nvidia, this performance is more than double that of comparable Graph500 entries, including systems operated by national laboratories, highlighting the potential of its accelerated computing stack for large-scale graph workloads.

The company emphasizes that efficiency is as important as raw speed. While a comparable top-10 Graph500 system used about 9,000 nodes, the Nvidia and CoreWeave configuration reached its result with just over 1,000 nodes, which the company says delivers 3x better performance per dollar. Nvidia illustrates the scale by noting that if every person on Earth had 150 friends, the resulting social graph would contain about 1.2 trillion edges, and the demonstrated system could search all of those relationships in about three milliseconds. The achievement relies on Nvidia’s integrated platform, spanning Nvidia CUDA software, Spectrum-X networking, H100 GPUs and a new active messaging library designed to minimize hardware footprint while maximizing throughput.
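The back-of-the-envelope arithmetic behind that illustration is easy to verify. The sketch below uses the figures from the article plus one assumption of ours: a world population rounded to 8 billion.

```python
# Scale check for Nvidia's social-graph illustration.
# Figures: 150 friends per person and 410 trillion TEPS come from the
# article; the 8 billion population figure is our assumption.
population = 8.0e9      # ~8 billion people (assumption)
friends = 150           # friends per person, per the article
teps = 410e12           # 410 trillion traversed edges per second

edges = population * friends   # ~1.2 trillion edges, matching the article
seconds = edges / teps         # time to traverse every relationship

print(f"{edges:.2e} edges searched in {seconds * 1e3:.1f} ms")
```

At the reported rate, the 1.2 trillion edges are covered in roughly 2.9 milliseconds, consistent with the "about three milliseconds" claim.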

Graph500 breadth-first search is a long-standing industry benchmark for navigating sparse, irregular graphs, such as those representing social networks, banking relationships or cybersecurity data. Traditional approaches to very large graph processing have relied on CPU-based systems, where moving graph data between nodes creates communication bottlenecks at trillion-edge scales. To overcome this, developers have used active messages that process data in place, but those techniques were originally designed for CPUs and are constrained by CPU throughput.

Nvidia reengineered this model around GPUs with a custom framework built on InfiniBand GPUDirect Async (IBGDA) and the NVSHMEM parallel programming interface, enabling GPU-to-GPU active messages and allowing hundreds of thousands of GPU threads to send messages concurrently. By running active messaging entirely on GPUs and exploiting the parallelism and memory bandwidth of H100 devices on CoreWeave’s infrastructure, the system doubled the performance of similar runs while using a fraction of the hardware and cost. Nvidia argues that this approach opens a new path for high-performance computing fields that rely on sparse data structures, such as fluid dynamics and weather forecasting, letting developers scale their largest applications on commercially available infrastructure using technologies like NVSHMEM and IBGDA.
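For readers unfamiliar with the metric itself, the benchmark's kernel and its TEPS score are simple to illustrate. Below is a minimal, single-machine Python sketch of level-order breadth-first search that reports traversed edges per second. This is a conceptual illustration of what Graph500 measures, not Nvidia's GPU implementation; the graph, function name and timing approach are ours.

```python
from collections import deque
import time

def bfs_teps(adj, source):
    """Breadth-first search returning (parent map, TEPS).

    TEPS = traversed edges / elapsed seconds, the Graph500 metric.
    `adj` maps each vertex to a list of its neighbours.
    """
    start = time.perf_counter()
    parent = {source: source}       # each visited vertex records its BFS parent
    frontier = deque([source])
    edges_traversed = 0
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            edges_traversed += 1    # every examined edge counts toward TEPS
            if w not in parent:
                parent[w] = v
                frontier.append(w)
    elapsed = time.perf_counter() - start
    return parent, edges_traversed / elapsed

# Tiny 4-vertex cycle as a toy input.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parents, teps = bfs_teps(graph, source=0)
```

The record run performs the same traversal, but distributed across 8,192 GPUs with edge data exchanged via GPU-to-GPU active messages rather than a single in-memory queue.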


Indiana launches Artificial Intelligence business portal

Indiana is rolling out IN AI, a statewide portal meant to help employers adopt Artificial Intelligence with practical guidance, workshops and peer support. State leaders and business groups are positioning the effort as a way to raise productivity, wages and job growth while keeping workers at the center.

Goodfire launches model debugging tool for large language models

Goodfire has introduced Silico, a mechanistic interpretability platform designed to let developers inspect and adjust model behavior during development. The company is positioning it as a way to give smaller teams deeper control over open-source models and more trustworthy outputs.

Nvidia launches Nemotron 3 Nano Omni for enterprise agents

Nvidia has introduced Nemotron 3 Nano Omni, a multimodal open model designed to support enterprise agents that reason across vision, speech and language. The launch extends Nvidia’s push beyond hardware into models and services while targeting more efficient agentic workflows.

Intel 18A-P node improves performance and efficiency

Intel plans to present new results for its 18A-P process at the VLSI 2026 Symposium, highlighting gains in performance, power efficiency, and manufacturing predictability. The updated node is positioned as a stronger option for customers seeking 18A density with better operating characteristics.

EA CEO defends broader Artificial Intelligence use in game development

EA CEO Andrew Wilson defended the company’s internal use of Artificial Intelligence after employee claims that the tools were slowing work rather than helping. He framed the technology as an aid for repetitive quality assurance tasks, even as concerns persist over its broader impact on development.
