Installation for vLLM

August 29, 2025

vLLM supports multiple hardware platforms, including GPUs, CPUs, Google TPU and AWS Neuron. Additional accelerators are supported through hardware plugins maintained outside the main vLLM repository.

vLLM supports a broad set of hardware platforms for inference and serving. The documentation lists GPU support with specific backends for NVIDIA CUDA, AMD ROCm and Intel XPU. CPU targets include Intel/AMD x86, ARM AArch64, Apple silicon and IBM Z (S390X). In addition to GPU and CPU support, the project notes compatibility with Google TPU and AWS Neuron.

The documentation also describes a hardware plugin model. It states that these plugin backends live outside the main vLLM repository and follow the Hardware-Pluggable RFC. A table on the installation page lists the available accelerator plugins and their packaging or install status. Ascend NPU is published as the vllm-ascend package with a linked GitHub repository. Intel Gaudi (HPU) and MetaX MACA GPU are marked as not available on PyPI and must be installed from source; repositories are provided for both. Rebellions ATOM / REBEL NPU is listed with a vllm-rbln package and a repository link. IBM Spyre AIU appears in the table with a vllm-spyre package and a repository link.
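To make the out-of-tree plugin model more concrete, the following minimal Python sketch lists whichever hardware plugins are registered in the current environment. It assumes that plugins advertise themselves through standard Python package entry points under the group name `vllm.platform_plugins`, which is how vLLM's plugin-system documentation describes platform plugin registration at the time of writing; the group name should be checked against the current docs, and the helper `list_platform_plugins` is hypothetical, not part of vLLM.

```python
from importlib.metadata import entry_points
from typing import List, Tuple

# Assumed entry-point group that vLLM scans for hardware (platform) plugins.
PLATFORM_PLUGIN_GROUP = "vllm.platform_plugins"


def list_platform_plugins(group: str = PLATFORM_PLUGIN_GROUP) -> List[Tuple[str, str]]:
    """Return sorted (name, target) pairs for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter entry points by group name.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    # The target is usually a "module:attribute" string naming the plugin's hook.
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    plugins = list_platform_plugins()
    if not plugins:
        print("No vLLM platform plugins are installed in this environment.")
    for name, target in plugins:
        print(f"{name} -> {target}")
```

Because discovery relies on installed package metadata rather than code inside the vLLM repository, a backend such as vllm-ascend can ship and version independently, which is the point of keeping plugins out of tree.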

The page groups hardware information into GPU and CPU sections, with a dedicated subpage for each platform, plus separate pages for Google TPU and AWS Neuron. Where plugins are not published on PyPI, the documentation directs users to install them from source. Overall, the installation page serves as a central list of supported accelerators and points users to external GitHub repositories for third-party backends.
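The packaged-versus-source distinction from the plugin table can be expressed as a small lookup. This is a hypothetical sketch: the `AcceleratorPlugin` type, the `install_hint` helper and the assumption that a published package installs with a plain `pip install` are illustrative only. Real plugins may pin specific vLLM versions, so each plugin repository's own instructions take precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceleratorPlugin:
    accelerator: str
    package: Optional[str]  # None when the table lists no PyPI package


# Entries mirror the plugin table summarized above.
PLUGINS = [
    AcceleratorPlugin("Ascend NPU", "vllm-ascend"),
    AcceleratorPlugin("Intel Gaudi (HPU)", None),
    AcceleratorPlugin("MetaX MACA GPU", None),
    AcceleratorPlugin("Rebellions ATOM / REBEL NPU", "vllm-rbln"),
    AcceleratorPlugin("IBM Spyre AIU", "vllm-spyre"),
]


def install_hint(accelerator: str) -> str:
    """Return a rough install hint for an accelerator (assumes plain pip installs)."""
    for plugin in PLUGINS:
        if plugin.accelerator == accelerator:
            if plugin.package is None:
                return "not on PyPI: install from source (see the plugin's GitHub repository)"
            return f"pip install {plugin.package}"
    raise KeyError(f"unknown accelerator: {accelerator!r}")


if __name__ == "__main__":
    for plugin in PLUGINS:
        print(f"{plugin.accelerator}: {install_hint(plugin.accelerator)}")
```

Keeping the package name optional, rather than using a separate boolean flag, means "no PyPI package" and "install from source" cannot drift out of sync.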
