vLLM server brings OpenAI-compatible APIs to local and cloud models
vLLM exposes an OpenAI-compatible HTTP server for text, chat, embeddings, audio, and multimodal workloads, while adding its own extensions for pooling, scoring, and re-ranking. It is designed to let existing OpenAI clients talk to local or self-hosted models with minimal code changes.
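As a minimal sketch of what "OpenAI-compatible" means in practice, the snippet below builds a standard `/chat/completions` request against a locally running vLLM server using only the Python standard library. The base URL assumes vLLM's default port 8000, and the model name is just an illustrative placeholder; any OpenAI SDK pointed at the same `base_url` would send an equivalent request.

```python
import json
from urllib.request import Request

# Assumed local endpoint; the vLLM server listens on port 8000 by default.
BASE_URL = "http://localhost:8000/v1"

def chat_request(model: str, user_msg: str) -> Request:
    """Build an OpenAI-style /chat/completions request for a vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # vLLM accepts any bearer token unless the server was started
            # with an API key configured.
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )

# Example model name only; substitute whatever model the server is serving.
req = chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(req.full_url)
```

Because the request shape matches the OpenAI REST API, switching an existing client from the hosted API to vLLM is typically just a change of `base_url` (and, if needed, the API key).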