NVIDIA’s developer catalog brings together community and open models accelerated for its AI inference platform and infrastructure. The lineup spans DeepSeek, Google DeepMind’s Gemma, OpenAI gpt-oss, Moonshot AI’s Kimi, Meta’s Llama, NVIDIA Nemotron, Microsoft Phi, and Alibaba’s Tongyi Qwen3. Deployment paths run through NVIDIA NIM, TensorRT-LLM, NeMo, Ollama, Hugging Face, SGLang, and vLLM, on hardware ranging from Jetson and Windows RTX to data center GPUs.
Several model families are positioned for production and customization. DeepSeek and Kimi use mixture-of-experts architectures, Gemma 3n adds multilingual and multimodal support, Llama 4 is described as multimodal, Nemotron targets reasoning and agentic tasks, Phi focuses on small language models for single-GPU and edge environments, and Qwen3 offers hybrid reasoning across dense and MoE variants.
NVIDIA also highlights performance gains tied to Blackwell systems and optimized inference software. It describes OpenAI’s gpt-oss models as optimized for 10x inference performance on the NVIDIA Blackwell architecture, delivering up to 1.5 million tokens per second (TPS) on an NVIDIA GB200 NVL72 system. It also reports a 10x performance leap for Kimi K2 Thinking on GB200 NVL72 compared with NVIDIA HGX H200.
For next-generation agentic AI, NVIDIA says Blackwell Ultra delivers up to 50x better performance and 35x lower cost, supported by co-design across Blackwell, NVLink, NVLink Switch, NVFP4, Dynamo, TensorRT-LLM, and community frameworks including SGLang and vLLM.
