OVHcloud AI Endpoints is a serverless inference platform that exposes generative Artificial Intelligence models via APIs, designed to be powerful, secure, and straightforward to integrate into applications. The service provides access to a catalogue of more than 40 models, including Llama, Qwen, and DeepSeek, and is built with data privacy as a central principle. The platform is positioned to help organisations enhance their products with conversational agents, voice capabilities, document analysis, and image processing while maintaining strict control over data handling and infrastructure choices.
Data security and confidentiality are presented as core guarantees of OVHcloud AI Endpoints. The platform states that customer data is neither retained nor reused, and will never be used to train or improve its Artificial Intelligence models; it operates on an infrastructure backed by ISO 27000, SOC, and healthcare data certifications, and is designed to support data sovereignty requirements. Reversibility is also highlighted: models can be deployed on the user’s own infrastructure or integrated with other cloud services, avoiding vendor lock-in and preserving operational control.
The offer is tailored to developers, providing comprehensive documentation, code examples, and standard APIs compatible with popular formats such as the OpenAI API, along with token-based authentication for simple access and revocation management. AI Endpoints incorporates lifecycle management for transparent model versioning, a sandbox environment for interactive testing, and performance benefits derived from OVHcloud GPU infrastructure. Typical usage scenarios include conversational Artificial Intelligence for customer engagement, Artificial Intelligence-powered voice transcription and text-to-speech for accessibility and meetings, and private code assistants integrated into IDEs that suggest code, detect errors, and automate tasks while keeping source code confidential.
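Because the APIs follow the OpenAI format with token-based authentication, any HTTP client can call a model endpoint. The sketch below, using only the Python standard library, shows the general shape of such a call; the endpoint URL, model name, and token are placeholders, not real values, so consult the AI Endpoints catalogue and documentation for the actual ones.

```python
import json
import urllib.request

# Placeholder values: substitute a real endpoint URL and model name from the
# AI Endpoints catalogue, and a token issued from the OVHcloud console.
ENDPOINT = "https://example-endpoint.ovh.net/v1/chat/completions"
TOKEN = "YOUR_AI_ENDPOINTS_TOKEN"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (without sending) an OpenAI-format chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",  # token can be revoked at any time
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama-3-8b-instruct", "Summarise this meeting transcript.")
# urllib.request.urlopen(req) would send the request; omitted here since it
# requires a live endpoint and a valid token.
```

The same request shape works with any OpenAI-compatible client library by pointing its base URL at the AI Endpoints address, which is what makes migration between providers straightforward.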
OVHcloud is building a broad ecosystem of integrations around AI Endpoints, connecting with tools and frameworks widely used in the developer community. Native integrations include Hugging Face for the machine learning community, LiteLLM for a unified interface to call over 100 LLMs via a standardised format, and Pydantic AI plus Pydantic AI Gateway for typed, structured agentic workflows and model access with budget control. Additional integrations span coding and agent frameworks such as Continue, Kilo Code and Kilo Code CLI, Shell AI by OVHcloud, OpenCode, Mastra, and LlamaIndex, as well as Apache Airflow for workflow orchestration. These integrations are intended to let developers work from their preferred environments while benefiting from OVHcloud’s infrastructure performance and security posture.
The service is framed as suitable for businesses of all sizes, from startups to large enterprises, by combining simplicity of integration with enterprise-grade security and scalability. OVHcloud states that AI Endpoints can scale from hundreds to millions of requests without compromising performance or privacy. The platform is part of a wider Artificial Intelligence and data portfolio that includes AI Deploy for model and application deployment, Cloud GPU for accelerated computing instances described as up to 1,000 times faster than a CPU for parallel processing, and Data Platform for centralised analytics projects. New users are encouraged to explore AI Endpoints via tutorials, documentation, and a Public Cloud free trial that offers US$200 of free credit to launch a first Public Cloud project.