The article catalogs Microsoft Foundry Models sold directly by Azure, combining all Azure OpenAI models with selected offerings from external providers, and explains how availability differs depending on whether a project uses a Foundry resource or a hub-based project with managed compute or serverless APIs. It introduces Foundry Models as deployable to standard Foundry resources and notes that model options can overlap across deployment types. The document is explicitly focused on currently supported, non-deprecated models and points readers to separate collections for other model groups.
The Azure OpenAI section highlights the newest model series, including GPT-5.2 and GPT-5.1, with reasoning, chat completions, Responses API support, structured outputs, text and image processing, and tools and functions. The article states that gpt-5.2 (2025-12-11) has a 400,000-token context window (272,000 input, 128,000 output) and gpt-5.2-chat (2025-12-11) a 128,000-token window (111,616 input, 16,384 output), and notes that registration is required to access gpt-5.2. It similarly describes gpt-5.1 (2025-11-13) and related codex variants with 400,000-token context windows (272,000 input, 128,000 max output) and training data up to September 30, 2024, while warning that gpt-5.1's reasoning_effort defaults to none and that gpt-5.1-chat does not support parameters such as temperature. Earlier GPT-5 series models, GPT-4.1, GPT-4o, GPT-4, GPT-3.5, embeddings, image, video, and audio models are described with their token limits, training cutoffs, and constraints, such as a known GPT-4.1 issue where tool definitions that exceed 300,000 tokens can trigger context_length_exceeded errors even within the 1,047,576-token limit.
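The reasoning_effort behavior above can be sketched as a request against the Responses API. This is a minimal sketch, not the article's own example: the helper name, the deployment name "gpt-5.1", and the decision to raise the effort from its "none" default are all assumptions for illustration.

```python
# Hypothetical helper: build a Responses API request body for a gpt-5.1
# deployment. gpt-5.1 defaults reasoning_effort to "none" (per the article),
# so callers who want visible reasoning must opt in explicitly.

def build_responses_request(prompt: str, effort: str = "none") -> dict:
    """Return a request body for POST {endpoint}/openai/v1/responses."""
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"reasoning effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5.1",               # deployment name (assumed)
        "input": prompt,
        "reasoning": {"effort": effort},  # defaults to "none" for gpt-5.1
        "max_output_tokens": 1024,        # illustrative cap, well under 128,000
    }

body = build_responses_request("Summarize the Foundry model catalog.", effort="medium")
```

Building the body separately from sending it keeps the opt-in explicit; gpt-5.1-chat would omit unsupported parameters such as temperature entirely rather than setting them to defaults.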
Newer capabilities span multimodal and specialized use cases, including GPT-4o and GPT-4 Turbo for text and vision, gpt-image-1 and DALL-E 3 for image generation with 4,000-character prompt limits, Sora and sora-2 video models with a 4,000-character maximum request, and GPT-4o audio variants such as gpt-4o-mini-audio-preview (2024-12-17) with 128,000 input and 16,384 output tokens. The audio API section lists speech-to-text models such as whisper and gpt-4o-transcribe with a 25 MB maximum audio size, and text-to-speech models tts, tts-hd, and gpt-4o-mini-tts. Fine-tuning is available for models such as gpt-4o-mini (2024-07-18) and gpt-4.1 (2025-04-14) with 128,000 input tokens, 16,384 output tokens, and a training example context length of 65,536, across a mix of standard and global training options; the article notes that global training is more affordable but lacks data residency guarantees.
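The prompt and upload limits quoted above lend themselves to cheap client-side checks before a request is ever sent. The following sketch assumes nothing beyond the two limits stated in the article; the function names are hypothetical.

```python
# Client-side validation of the limits the article quotes:
# - 4,000-character prompts for gpt-image-1 / DALL-E 3
# - 25 MB uploads for whisper / gpt-4o-transcribe

MAX_IMAGE_PROMPT_CHARS = 4_000
MAX_AUDIO_BYTES = 25 * 1024 * 1024

def validate_image_prompt(prompt: str) -> str:
    """Reject image-generation prompts over the 4,000-character limit."""
    if len(prompt) > MAX_IMAGE_PROMPT_CHARS:
        raise ValueError("image prompts are limited to 4,000 characters")
    return prompt

def validate_audio_size(num_bytes: int) -> int:
    """Reject speech-to-text uploads over the 25 MB limit."""
    if num_bytes > MAX_AUDIO_BYTES:
        raise ValueError("audio uploads are limited to 25 MB")
    return num_bytes
```

Failing fast on the client avoids a round trip that the service would reject anyway.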
The document devotes extensive tables to deployment types and regional coverage across Global Standard, Global Provisioned managed, Global Batch, Data Zone Standard, Data Zone Provisioned managed, and Data Zone Batch. It lists, for example, that Global Standard in eastus2 supports gpt-5.2 (2025-12-11), gpt-5.2-chat (2025-12-11), gpt-5.1 (2025-11-13), multiple o-series reasoning models such as o3 and o4-mini, image and audio models such as gpt-image-1 (2025-04-15) and gpt-4o-realtime-preview (2025-06-03), and the video models sora (2025-05-02) and sora-2 (2025-10-06). Global Provisioned managed similarly lists where gpt-5.1 (2025-11-13), gpt-5 (2025-08-07), and reasoning models such as o3 (2025-04-16) can be provisioned, while Global Batch shows which regions can run long-running jobs for gpt-5 (2025-08-07) and gpt-4.1 (2025-04-14). Standard regional deployments summarize per-endpoint availability for chat completions, embeddings, image, video, audio, and legacy completions, including where GPT-4 Turbo with Vision (turbo-2024-04-09) and gpt-35-turbo (1106 and 0125) can be used.
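A Global Batch job takes its work as a JSONL file, one request per line. The sketch below assembles such lines for a gpt-4.1 deployment; the custom_id values, deployment name, and prompts are assumptions, and the per-line shape (custom_id / method / url / body) follows the OpenAI-style batch format that Azure OpenAI batch jobs accept.

```python
import json

# Hypothetical helper: one JSONL line per batch request. The "model" field
# names the Azure deployment (assumed here to be "gpt-4.1"), not a raw
# model identifier.

def batch_line(custom_id: str, prompt: str, deployment: str = "gpt-4.1") -> str:
    return json.dumps({
        "custom_id": custom_id,   # caller-chosen id to match results back up
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": deployment,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Two illustrative requests joined into a JSONL payload.
jsonl = "\n".join(
    batch_line(f"task-{i}", p)
    for i, p in enumerate(["First prompt", "Second prompt"])
)
```

The custom_id is what lets you correlate each long-running job's output line with the request that produced it.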
Beyond Azure OpenAI, the article details Foundry Models from top providers that Azure sells directly. Black Forest Labs' FLUX.1-Kontext-pro and FLUX-1.1-pro handle text and image prompts of up to 5,000 tokens and one image, return a single image in PNG or JPG, and run as Global standard in all regions via both the Image API and a provider-specific API, with extra parameters such as seed, aspect ratio, and safety_tolerance. Cohere models include Cohere-command-a with 131,072 input tokens and 8,182 output tokens, as well as classification and embedding models such as Cohere-rerank-v4.0-pro and embed-v-4-0 that support multi-language text and image inputs on Global standard. DeepSeek's chat-completion models such as DeepSeek-R1-0528 and DeepSeek-V3-0324 offer reasoning content with 163,840 or 131,072 tokens for both input and output, respectively, and are available both as Global standard and Global provisioned in all listed regions.
Meta’s Llama-4-Maverick-17B-128E-Instruct-FP8 and Llama-3.3-70B-Instruct provide chat completion with up to 1M tokens of text-and-image input and 1M tokens of text output, or 128,000 input tokens and 8,192 output tokens, respectively, and support a wide range of languages via Global standard. Microsoft’s own models include model-router, which has a 200,000-token context window and routes requests across underlying models such as GPT-4.1 and o4-mini, and MAI-DS-R1, a reasoning chat model with 163,840 tokens for both input and output, offered in all regions as Global standard. Mistral models such as Mistral-Large-3 and mistral-document-ai-2505 support text, image, and document understanding, with mistral-document-ai-2505 handling up to 30 pages and 30 MB PDF files and being available as Global standard in all regions and Data zone standard in the US and EU.
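Because model-router chooses an underlying model per request, callers often want to know which model actually answered. A common pattern, sketched here under the assumption that the routed model is reported in the response's model field (as chat-completion responses normally do), is to read that field rather than the router's own deployment name:

```python
# Hypothetical helper: inspect a chat-completions response from a
# model-router deployment to see which underlying model (e.g. gpt-4.1,
# o4-mini) served the request.

def routed_model(response: dict) -> str:
    """Return the model name reported in the response, or 'unknown'."""
    return response.get("model", "unknown")

# Illustrative response shape -- not captured from a live service.
sample = {
    "model": "o4-mini",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
}
```

Logging this per request makes it possible to audit routing decisions and attribute cost to the models the router actually selected.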
Moonshot AI’s Kimi-K2-Thinking is described as a reasoning-focused chat-completion model able to maintain stable tool use across 200-300 sequential calls, with 262,144 tokens for both input and output in English and Chinese and a Global standard footprint. xAI’s Grok family includes grok-4 with 256,000 input tokens and 8,192 output tokens, grok-4-fast-reasoning and grok-4-fast-non-reasoning, which accept text and image input and produce text output at 2,000,000 tokens each, and grok-3 and grok-3-mini, which support 131,072 tokens for both input and output; these are available as Global standard in all regions and, for some variants, as Data zone standard in the US. The article clarifies that both grok-code-fast-1 and grok-4 require registration, while emphasizing that Grok 4 Fast can bypass reasoning for ultra-fast applications.
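The asymmetric limits across Grok variants (e.g. grok-4's 256,000-in / 8,192-out split versus grok-3's symmetric 131,072) are easy to get wrong at request time. A small lookup, mirroring only the figures quoted above, can guard both sides; the helper itself is hypothetical.

```python
# Per-model limits as quoted in the article (tokens).
GROK_LIMITS = {
    "grok-4": {"input": 256_000, "output": 8_192},
    "grok-4-fast-reasoning": {"input": 2_000_000, "output": 2_000_000},
    "grok-3": {"input": 131_072, "output": 131_072},
}

def max_completion_tokens(model: str, prompt_tokens: int) -> int:
    """Validate the prompt against the model's input limit and return the
    largest max_tokens value the model's output limit allows."""
    limits = GROK_LIMITS[model]
    if prompt_tokens > limits["input"]:
        raise ValueError(f"prompt exceeds {model}'s input limit")
    return limits["output"]
```

The same check explains why grok-4 suits long-context analysis with short answers, while the symmetric grok-3 limits fit long-form generation.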
The closing sections unify model region availability by deployment type for all partner collections, showing that all DeepSeek, Grok, FLUX, Llama, MAI-DS-R1, and mistral-document-ai-2505 models are broadly available via Global Standard in every listed Azure region, with additional Global Provisioned and Data Zone Standard coverage for certain DeepSeek and Grok variants. It notes that some models, like o3-deep-research, are only accessible through Foundry Agent Service rather than direct deployments, and that open and custom models in the wider catalog must be deployed via managed compute or user-provided infrastructure instead of standard Foundry resources. The article concludes with links to related guidance on deployment types, managed compute, model retirement, and working with Azure OpenAI models.
