IBM, Red Hat, and Google donate llm-d to CNCF

IBM Research, Red Hat, and Google Cloud have donated llm-d, an open-source Kubernetes framework for large language model inference, to the CNCF as a sandbox project. The move aims to create a vendor-neutral blueprint for deploying scalable inference across models, accelerators, and clouds.

IBM Research, Red Hat, and Google Cloud announced at KubeCon Europe 2026 in Amsterdam that they are donating llm-d to the Cloud Native Computing Foundation as a sandbox project. The open-source framework is designed to make large language model inference a cloud-native, production-grade workload on Kubernetes. Backing from Nvidia, CoreWeave, AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI positions the project as a community-governed effort around vendor-neutral inference infrastructure.

Launched in 2025, llm-d was built to make serving foundation models at scale predictable, portable, and cloud-native. It turns inference from a model-by-model deployment problem into a reusable Kubernetes-based system. The framework splits inference into prefill and decode phases and runs them on different pods, allowing each phase to scale independently. It also adds routing and scheduling based on KV-cache state, pod load, and hardware characteristics, layering a modular stack on Kubernetes with vLLM as the inference engine.
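The KV-cache-aware routing described above can be sketched as a scoring function over candidate pods. This is a minimal illustration, not llm-d's actual scheduler: the pod fields, the score formula, and the weight are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Pod:
    """Hypothetical snapshot of one serving pod (not llm-d's real API)."""
    name: str
    cached_prefix_tokens: int   # tokens of this request's prefix already in KV cache
    queue_depth: int            # requests currently waiting on this pod

def score(pod: Pod, prompt_tokens: int, load_weight: float = 0.5) -> float:
    # Reward KV-cache reuse (cached prefix tokens skip prefill recomputation),
    # penalize pods that are already loaded.
    cache_hit = pod.cached_prefix_tokens / max(prompt_tokens, 1)
    return cache_hit - load_weight * pod.queue_depth

def pick_pod(pods: list[Pod], prompt_tokens: int) -> Pod:
    # Route the request to the highest-scoring pod.
    return max(pods, key=lambda p: score(p, prompt_tokens))

pods = [
    Pod("decode-0", cached_prefix_tokens=900, queue_depth=4),
    Pod("decode-1", cached_prefix_tokens=0, queue_depth=1),
]
best = pick_pod(pods, prompt_tokens=1000)
# decode-0 has a strong cache hit but a deep queue, so decode-1 wins here.
```

In a real deployment this trade-off between cache locality and load balance would be tuned per workload; the point is only that routing consults cache state, not just utilization.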

The design is intended to improve both performance and cost efficiency for stateful inference workloads. Early testing by Google Cloud showed “2x improvements in time-to-first-token for use cases like code completion, enabling more responsive applications.” llm-d also supports hierarchical cache offloading across GPU, CPU, and storage tiers, enabling larger context windows without exhausting accelerator memory. Its autoscaling is tuned to workload patterns and hardware rather than generic utilization metrics, and it is designed to work with Kubernetes technologies including the Gateway API Inference Extension and LeaderWorkerSet.
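The hierarchical offloading idea can be modeled as a set of LRU tiers where evicted cache blocks spill downward and reused blocks are promoted back up. This is a toy model under assumed semantics, not llm-d's implementation; the class name, tier names, and eviction policy are invented for illustration.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy hierarchical KV-cache: blocks spill GPU -> CPU -> storage on
    eviction and are promoted back to GPU on reuse. Illustrative only."""

    def __init__(self, gpu_blocks: int, cpu_blocks: int):
        # Each tier is (LRU-ordered store, capacity); None = unbounded.
        self.tiers = {
            "gpu": (OrderedDict(), gpu_blocks),
            "cpu": (OrderedDict(), cpu_blocks),
            "storage": (OrderedDict(), None),
        }

    def put(self, key, block, tier="gpu"):
        store, cap = self.tiers[tier]
        store[key] = block
        store.move_to_end(key)
        if cap is not None and len(store) > cap:
            # Evict the least-recently-used block into the next tier down.
            old_key, old_block = store.popitem(last=False)
            next_tier = {"gpu": "cpu", "cpu": "storage"}[tier]
            self.put(old_key, old_block, next_tier)

    def get(self, key):
        # Search tiers top-down; on a hit, promote the block back to GPU.
        for tier in ("gpu", "cpu", "storage"):
            store, _ = self.tiers[tier]
            if key in store:
                block = store.pop(key)
                self.put(key, block)
                return tier, block
        return None, None

cache = TieredKVCache(gpu_blocks=2, cpu_blocks=2)
for i in range(4):
    cache.put(f"k{i}", f"block{i}")  # k0 and k1 spill to the CPU tier
tier, block = cache.get("k0")        # found in "cpu", promoted to GPU
```

The spilling keeps long-context KV state addressable without holding it all in accelerator memory, at the cost of slower retrieval from lower tiers.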

Supporters describe llm-d as a validated path from experimentation to production, with reproducible benchmarks, tested deployment patterns, and compatibility across Nvidia GPUs, Google TPUs, and AMD and Intel hardware. IBM executives framed the donation as part of a broader push to make distributed inference a standard part of the cloud-native stack, comparable in importance to established CNCF projects. The next development cycle will focus on multi-modal workloads, Hugging Face multi-LoRA optimization, and deeper integration with vLLM, with Mistral AI already contributing code for disaggregated serving.

Impact Score: 66

AMD plans specialized EPYC CPUs for AI, HPC, and cloud

AMD is preparing a broader EPYC strategy with task-specific server CPUs aimed at agentic AI, HPC, training and inference, and cloud deployments. The shift starts with the Zen 6 generation and adds Verano as an AI-focused variant within the same EPYC family.

Nvidia expands Spectrum-X Ethernet with open MRC protocol

Nvidia is positioning Spectrum-X Ethernet as a foundation for large-scale AI training, with Multipath Reliable Connection (MRC) adding open, multi-path RDMA transport for higher resilience and throughput. OpenAI, Microsoft, and Oracle are among the organizations using the technology in large AI environments.

Anthropic explores Fractile chips to diversify supply

Anthropic is reportedly in early talks with London-based Fractile to secure high-performance AI chips for inference workloads. The move would reduce reliance on Nvidia and broaden the company’s hardware supply chain.

OpenAI curbs odd creature references in chatbot responses

OpenAI has adjusted its models after users complained about overly familiar responses and strange references to goblins, gremlins, pigeons, and raccoons. The company traced the behavior to a retired “nerdy” personality whose habits spread into broader model training.
