NVIDIA's artificial intelligence platform offers developers access to a curated selection of community-built, open models—each optimized to leverage NVIDIA GPUs and advanced inference technologies. The collection showcases high-performing large language models (LLMs) and small language models (SLMs) from major contributors such as Meta (Llama), DeepSeek, Google DeepMind (Gemma), and Microsoft (Phi), with deployment pathways across data centers, edge, and consumer devices.
The Llama suite, developed by Meta, features scalable open models, with the 2025 release of Llama 4 adding multimodal capabilities. NVIDIA accelerates Llama inference with TensorRT-LLM, maximizing throughput on Blackwell and Hopper architecture GPUs. Optimized Llama models are available as NVIDIA NIM microservices for seamless deployment, as sketched below. Developers can further tailor Llama using the NeMo framework or deploy variants via Ollama and Hugging Face, with quantized, production-ready versions available for a range of hardware.
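For illustration, here is a minimal sketch of querying a Llama NIM microservice through its OpenAI-compatible API. The base URL reflects NVIDIA's hosted API catalog, while the model identifier and API key are assumptions; a self-hosted NIM container exposes the same interface on a local port.

```python
# A minimal sketch of calling a Llama NIM endpoint through its
# OpenAI-compatible API. The base URL reflects NVIDIA's hosted API catalog;
# the model name and API key below are placeholders/assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed hosted endpoint
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder credential
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```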
DeepSeek delivers powerful open-source models built on a mixture-of-experts architecture that excel at advanced reasoning. The models are optimized for data-center performance with TensorRT-LLM and offered through NVIDIA NIM for rapid prototyping. Fine-tuning and quantization via NeMo or the TensorRT Model Optimizer enable efficient deployment on both consumer and enterprise-grade devices. Ollama provides streamlined local deployment of DeepSeek models (see the sketch below), while Hugging Face hosts quantized variants ready for integration into custom workflows.
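The following sketch shows local DeepSeek inference with the ollama Python package, assuming the Ollama server is running and the model has already been pulled (e.g. with `ollama pull deepseek-r1`). The model tag is an assumption.

```python
# A minimal sketch of local DeepSeek inference with the ollama Python
# package; assumes the Ollama server is running and the model was pulled
# beforehand. The model tag is an assumption.
import ollama

reply = ollama.chat(
    model="deepseek-r1",  # assumed Ollama model tag
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
)
print(reply["message"]["content"])
```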
Gemma, from Google DeepMind, is a family of lightweight open models covering text, image, video, and audio tasks. NVIDIA collaborates with Google DeepMind to ensure optimal execution on NVIDIA hardware, from data center GPUs to RTX and Jetson devices. The Gemma 3n release brings native multilingual and multimodal support, with enterprise-ready containers through NIM and customization via NeMo. Integration with Ollama and Hugging Face simplifies experimentation and deployment, and TensorRT-LLM delivers high-performance inference on supported platforms.
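As a concrete example of the Hugging Face pathway, this sketch loads a Gemma checkpoint with 4-bit quantization so it fits in consumer RTX memory. The repo id is an assumption, and gated Gemma weights require accepting the license and authenticating with a Hugging Face token first.

```python
# A minimal sketch of loading a Gemma checkpoint from Hugging Face with
# 4-bit quantization for a single consumer GPU. The repo id is an
# assumption; gated weights require prior license acceptance and login.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-2b-it"  # assumed checkpoint name
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

inputs = tokenizer("What workloads suit Jetson devices?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```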
Microsoft's Phi models are SLMs tuned for resource efficiency while delivering strong results in reasoning, code generation, summarization, and more. Phi runs on single-GPU systems for Windows and Jetson, with the latest Phi-4 series adding multimodal abilities. The NVIDIA ecosystem supports deployment, optimization, and customization, including containers from the Jetson AI Lab, the NeMo framework, and quantization resources on NGC and Hugging Face.
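Because Phi models are small, single-GPU inference is straightforward with the transformers pipeline API, as in the sketch below. The checkpoint name is an assumption; other Phi variants follow the same pattern.

```python
# A minimal sketch of single-GPU Phi inference with the transformers
# pipeline API. The checkpoint name is an assumption.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # assumed checkpoint name
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Explain why small language models suit edge devices:"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```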
NVIDIA emphasizes ethical artificial intelligence development, providing guidelines and developer resources to encourage the responsible use and refinement of community models. Developers can access comprehensive documentation, tutorials, and workflow examples to expedite adoption and integration for a broad range of commercial and research use cases.