Nvidia has achieved the world’s top-performing result for large-scale graph processing using a commercially available cluster hosted on the CoreWeave cloud platform. The company reported a benchmark result of 410 trillion traversed edges per second (TEPS), which ranked No. 1 on the 31st Graph500 breadth-first search (BFS) list. The record-setting run was executed on an accelerated computing cluster located in a CoreWeave data center in Dallas.
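For readers unfamiliar with the metric, the following is a minimal sketch of how a traversed-edges-per-second figure can be derived from a breadth-first search. It is a toy, single-machine illustration and not the Graph500 reference code: the function name `bfs_teps`, the toy graph, and the choice to count every scanned adjacency entry are assumptions made here, and the official benchmark defines its own counting and timing rules, which this sketch does not reproduce.

```python
import time
from collections import deque


def bfs_teps(adjacency, source):
    """Run BFS from `source`; return (vertices reached, edges scanned, TEPS).

    Simplifying assumption: every adjacency entry inspected counts as one
    traversed edge, and only the search loop itself is timed.
    """
    visited = {source}
    queue = deque([source])
    edges_scanned = 0
    start = time.perf_counter()
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            edges_scanned += 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    teps = edges_scanned / elapsed if elapsed > 0 else float("inf")
    return len(visited), edges_scanned, teps


# Hypothetical toy graph: a 4-cycle plus one isolated vertex.
# Each undirected edge is stored in both directions.
toy_graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
    4: [],  # not reachable from vertex 0
}

reached, scanned, teps = bfs_teps(toy_graph, 0)
print(f"reached {reached} vertices, scanned {scanned} adjacency entries")
print(f"approximate rate: {teps:,.0f} edges per second")
```

The reported rate on a toy graph is dominated by timer resolution and interpreter overhead, so only the structure of the calculation (edges divided by search time) carries over to the benchmark figure.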
The winning configuration used 8,192 Nvidia H100 GPUs to process a graph containing 2.2 trillion vertices and 35 trillion edges. The result is described as more than double the performance of comparable solutions on the Graph500 list, including entries operated by national laboratories. It demonstrates that a commercially available cloud cluster can outperform highly specialized installations in large-scale graph analytics workloads.
To illustrate the scale, the article notes that if every person on Earth had 150 friends, the resulting social graph would contain 1.2 trillion edges. According to Nvidia and CoreWeave, their system could search through every such friend relationship on Earth in about three milliseconds. The article emphasizes that speed at this scale is only part of the achievement and also highlights efficiency: a comparable top-10 Graph500 entry used about 9,000 nodes, while the Nvidia run used just over 1,000 nodes and delivered 3x better performance per dollar.
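The friend-graph figures can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming a world population of roughly 8 billion (a value chosen here because it reproduces the 1.2 trillion edges quoted in the article) and assuming the full 410 trillion TEPS rate applies to the traversal; the variable names are illustrative.

```python
# Assumption: about 8 billion people, which matches the article's edge count.
WORLD_POPULATION = 8e9
FRIENDS_PER_PERSON = 150      # from the article
REPORTED_TEPS = 410e12        # 410 trillion traversed edges per second

# One edge per (person, friend) pair, as in the article's figure.
edges = WORLD_POPULATION * FRIENDS_PER_PERSON

# Time to traverse every edge once at the reported rate.
seconds = edges / REPORTED_TEPS
milliseconds = seconds * 1e3

print(f"{edges:.2e} edges, {milliseconds:.2f} ms")
# prints "1.20e+12 edges, 2.93 ms"
```

The result of just under 3 ms is consistent with the "about three milliseconds" quoted above.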
