Cloud providers adopt NVIDIA Dynamo to boost AI inference performance

Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure are integrating NVIDIA Dynamo to enable multi-node AI inference on Blackwell systems and managed Kubernetes services.

The article highlights NVIDIA Blackwell as delivering industry-leading performance and efficiency, along with the lowest total cost of ownership across tested models, and cites the SemiAnalysis InferenceMAX v1 benchmark in support. A figure also notes Jensen Huang’s claim that Blackwell delivers 10x the performance of NVIDIA Hopper. To realize that performance for complex AI models, the article explains, inference must be distributed across multiple servers so that a deployment can serve millions of concurrent users and reduce response times.

The article presents the NVIDIA Dynamo software platform as the production-ready solution that unlocks multi-node, or disaggregated, inference across existing cloud environments. Disaggregated serving separates the two phases of model serving, prefill (processing the input prompt) and decode (generating output tokens), onto independently optimized GPUs to avoid bottlenecks and improve efficiency. For large reasoning and mixture-of-experts models such as DeepSeek-R1, the article calls disaggregated serving essential. As a real-world example, Baseten used NVIDIA Dynamo to achieve 2x faster inference for long-context code generation and a 1.6x increase in throughput without adding hardware.
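As a rough illustration of the prefill/decode split described above, the following Python sketch models the two phases as separate worker pools. It is a toy model and not NVIDIA Dynamo's API: the class names (`PrefillWorker`, `DecodeWorker`), the `KVCache` stand-in and the round-robin assignment are all assumptions made for this example, and no real model or GPU is involved.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Request:
    request_id: str
    prompt_tokens: list[str]
    max_new_tokens: int


@dataclass
class KVCache:
    """Hypothetical stand-in for the state that prefill hands to decode."""
    request_id: str
    entries: list[str]


class PrefillWorker:
    """Processes the whole prompt once and builds the cache."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, request: Request) -> KVCache:
        entries = [f"kv({token})" for token in request.prompt_tokens]
        return KVCache(request.request_id, entries)


class DecodeWorker:
    """Generates output tokens one at a time from a transferred cache."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, request: Request, cache: KVCache) -> list[str]:
        output = []
        for step in range(request.max_new_tokens):
            token = f"tok{step}"  # placeholder for a sampled token
            cache.entries.append(f"kv({token})")  # cache grows per token
            output.append(token)
        return output


def serve(requests, prefill_pool, decode_pool) -> dict[str, list[str]]:
    """Route each request through one prefill worker, then one decode worker.

    Round-robin assignment is an assumption of this sketch; a real router
    would weigh load, cache locality and similar signals.
    """
    prefill_workers = cycle(prefill_pool)
    decode_workers = cycle(decode_pool)
    results = {}
    for request in requests:
        cache = next(prefill_workers).run(request)  # phase 1: prefill
        results[request.request_id] = next(decode_workers).run(request, cache)  # phase 2: decode
    return results
```

Calling `serve([Request("r1", ["def", "add"], 3)], [PrefillWorker("p0")], [DecodeWorker("d0"), DecodeWorker("d1")])` returns three placeholder tokens for `r1`. Because the two pools are separate lists, each can be sized on its own, which is the property the article attributes to disaggregated serving.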

The article identifies Kubernetes as the scaling mechanism for disaggregated serving and lists the major cloud integrations that make Dynamo available in managed Kubernetes offerings. Amazon Web Services integrates Dynamo with Amazon EKS. Google Cloud provides a Dynamo recipe for LLM inference on its AI Hypercomputer. Microsoft Azure enables multi-node LLM inference with Dynamo and ND GB200-v6 GPUs on Azure Kubernetes Service. Oracle Cloud Infrastructure supports multi-node LLM inference with OCI Superclusters and Dynamo. The piece also mentions Nebius as a partner building cloud infrastructure to serve inference workloads with NVIDIA accelerated computing and Dynamo.

To simplify orchestration on Kubernetes, NVIDIA Grove is introduced as an application programming interface within Dynamo that lets users declare high-level specifications for complex inference systems. Grove is described as automating coordination, scaling, placement and startup order for components such as prefill, decode and routing, reducing operational complexity as inference becomes increasingly distributed. The article directs readers to technical deep dives and simulation tools for further exploration of hardware and deployment trade-offs.
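To make the idea of a declarative specification with an enforced startup order concrete, here is a minimal Python sketch. It does not use Grove or reproduce its schema: the `SPEC` layout, the `starts_after` field and the helper functions are hypothetical names invented for this example. The startup order is derived with a topological sort from the standard library's `graphlib` module (Python 3.9 or later).

```python
from copy import deepcopy
from graphlib import TopologicalSorter

# Hypothetical declarative spec: what should run, how many replicas,
# and which components must be up first. Not Grove's actual schema.
SPEC = {
    "prefill": {"replicas": 2, "starts_after": []},
    "decode": {"replicas": 4, "starts_after": []},
    "router": {"replicas": 1, "starts_after": ["prefill", "decode"]},
}


def startup_order(spec):
    """Return component names so that dependencies always come first."""
    for name, cfg in spec.items():
        unknown = set(cfg["starts_after"]) - set(spec)
        if unknown:
            raise ValueError(f"{name} depends on undefined components: {sorted(unknown)}")
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(cfg["starts_after"]) for name, cfg in spec.items()}
    return list(TopologicalSorter(graph).static_order())


def launch_plan(spec):
    """Pair each component with its replica count, in startup order."""
    return [(name, spec[name]["replicas"]) for name in startup_order(spec)]


def scale(spec, component, replicas):
    """Return a copy of the spec with one component resized."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    updated = deepcopy(spec)
    updated[component]["replicas"] = replicas
    return updated
```

With this spec, `launch_plan(SPEC)` always places `router` last, since it waits for both `prefill` and `decode`, and `scale(SPEC, "decode", 8)` resizes the decode pool without touching the others. The user states the desired end state and an orchestrator works out the steps, which is the division of labor the article describes.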


