How NVIDIA H100 GPUs on CoreWeave’s artificial intelligence cloud platform delivered a record-breaking Graph500 run

NVIDIA used 8,192 H100 GPUs on a CoreWeave cluster to set a new Graph500 record of 410 trillion traversed edges per second (TEPS), processing a graph with 2.2 trillion vertices and 35 trillion edges.

NVIDIA announced a record-breaking benchmark result of 410 trillion traversed edges per second (TEPS), ranking No. 1 on the 31st Graph500 breadth-first search (BFS) list. The winning run was performed on an accelerated computing cluster hosted in a CoreWeave data center in Dallas and used 8,192 NVIDIA H100 GPUs to process a graph with 2.2 trillion vertices and 35 trillion edges. According to NVIDIA, the result is more than double the performance of comparable solutions on the list, including those hosted in national labs.
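
As a rough editorial sanity check (not a figure from NVIDIA): Graph500 defines TEPS as the number of edges in the input graph divided by the time of one breadth-first search, and the reported graph size is consistent with a scale-41 Kronecker graph at the benchmark’s default edge factor of 16:

```latex
\[
V = 2^{41} \approx 2.2 \times 10^{12}, \qquad
E \approx 16 \cdot 2^{41} \approx 3.5 \times 10^{13}
\]
\[
t_{\text{BFS}} \approx \frac{E}{\text{TEPS}}
  = \frac{3.5 \times 10^{13}}{4.1 \times 10^{14}} \approx 0.085\ \text{s}
\]
```

In other words, a single full traversal of the 35-trillion-edge graph completes in roughly 85 milliseconds.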

The company framed the performance with a real-world analogy: if every person on Earth had 150 friends, the resulting social graph would contain 1.2 trillion edges, and the system could search every friend relationship in about three milliseconds. Beyond raw speed, the run emphasized efficiency: a comparable top 10 entry used about 9,000 nodes, while NVIDIA’s submission used just over 1,000 nodes and delivered 3x better performance per dollar. NVIDIA credits the outcome to a full-stack approach combining CUDA, Spectrum-X networking, H100 GPUs and a new active messaging library.
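
The arithmetic behind the analogy works out, assuming a world population of about 8 billion:

```latex
\[
8 \times 10^{9}\ \text{people} \times 150\ \text{friendships each}
  = 1.2 \times 10^{12}\ \text{edges}
\]
\[
t \approx \frac{1.2 \times 10^{12}\ \text{edges}}{4.1 \times 10^{14}\ \text{TEPS}}
  \approx 2.9 \times 10^{-3}\ \text{s} \approx 3\ \text{ms}
\]
```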

The technical advance centers on reengineering graph processing for GPUs. While GPUs have long accelerated dense workloads such as artificial intelligence training, large-scale sparse and irregular graph workloads have traditionally run on CPUs, which must shuttle graph data between nodes and hit communication bottlenecks once graphs reach trillions of edges. NVIDIA implemented a GPU-only solution that uses InfiniBand GPUDirect Async (IBGDA) and the NVSHMEM parallel programming interface to enable GPU-to-GPU active messages. With IBGDA, the GPU communicates directly with the InfiniBand network interface card, and a message aggregation layer was built so that hundreds of thousands of GPU threads can send active messages simultaneously, versus the hundreds of threads typical of CPU implementations.
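
NVIDIA’s code is not published in the article, but the pattern it describes (each GPU thread issuing one-sided active messages into a remote GPU’s memory through NVSHMEM, with IBGDA letting the GPU drive the network card directly) can be sketched as follows. The kernel, the remote_queue and queue_tail buffers, and the modulo vertex partitioning are illustrative assumptions, not NVIDIA’s actual library; in NVSHMEM, IBGDA is typically enabled through an environment variable (NVSHMEM_IB_ENABLE_IBGDA=1) rather than in code.

```cuda
// Illustrative sketch only: pushing BFS frontier vertices to the GPUs that
// own them with one-sided NVSHMEM device-side calls. remote_queue and
// queue_tail are hypothetical symmetric buffers (allocated on the host with
// nvshmem_malloc); the modulo vertex partition is an assumption.
#include <nvshmem.h>
#include <nvshmemx.h>

__global__ void push_frontier(const long long *frontier, int n,
                              long long *remote_queue,          // symmetric
                              unsigned long long *queue_tail) { // symmetric
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    long long v = frontier[i];
    int owner = (int)(v % nvshmem_n_pes());  // assumed 1D vertex partition

    // Reserve a slot in the owner's queue with a remote atomic, then write
    // the vertex into that slot. Every GPU thread can do this concurrently;
    // with IBGDA the GPU itself issues the network operations, so no CPU
    // proxy sits on the critical path.
    unsigned long long slot =
        nvshmem_ulonglong_atomic_fetch_add(queue_tail, 1ULL, owner);
    nvshmem_longlong_p(&remote_queue[slot], v, owner);
}
```

In a production system, messages from threads bound for the same destination would be aggregated before touching the network; that aggregation layer is the piece NVIDIA says it built to let hundreds of thousands of threads send at once.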

Running on CoreWeave infrastructure, the GPU-native active messaging approach bypasses the CPU, leverages H100 parallelism and memory bandwidth, and reduces hardware footprint and cost. NVIDIA says the result validates a path for bringing supercomputing performance to commercially available infrastructure and suggests that other high-performance computing fields with sparse communication patterns, such as fluid dynamics and weather forecasting, can benefit from NVSHMEM and IBGDA to scale their largest applications.

Impact Score: 68

How global R&D spending growth has shifted since 2000

Global research and development spending has nearly tripled since 2000, with China and a group of emerging economies driving the fastest growth. Slower but still substantial expansion in mature economies highlights a world that is becoming more research intensive overall.

Artificial intelligence compliance in European financial services

The article explains how financial firms can use artificial intelligence tools while meeting European Union, United Kingdom, Irish and United States regulatory expectations, focusing on risk, transparency and governance. It details the European Union Artificial Intelligence Act, the role of cybersecurity, and the standards and practices that support compliant deployment across the financial sector.

Artificial intelligence becomes a lever for transformation in Africa

African researchers and institutions are positioning artificial intelligence as a tool to tackle structural challenges in health, education, agriculture and governance, while pushing for data sovereignty and local language inclusion. The continent faces hurdles around skills, infrastructure and control of data but is exploring frugal technological models tailored to its realities.

Microsoft unveils Maia 200 artificial intelligence inference accelerator

Microsoft has introduced Maia 200, a custom artificial intelligence inference accelerator built on a 3 nm process and designed to improve the economics of token generation for large models, including GPT-5.2. The chip targets higher performance per dollar for services like Microsoft Foundry and Microsoft 365 Copilot while supporting synthetic data pipelines for next generation models.

Samsung’s 2 nm node progress could revive foundry business and attract Qualcomm

Samsung Foundry’s 2 nm SF2 process is reportedly stabilizing at around 50% yields, positioning the Exynos 2600 as a key proof of concept and potentially helping the chip division return to profit. New demand from Tesla’s artificial intelligence chips and possible deals with Qualcomm and AMD are seen as central to the turnaround.
