NVIDIA Nemotron 3 Super targets agentic Artificial Intelligence at scale

NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters, engineered to power large-scale agentic Artificial Intelligence systems with high throughput and accuracy. A hybrid mixture-of-experts architecture, a 1‑million‑token context window, and open weights position it for use across enterprise, research, and autonomous-agent workflows.

NVIDIA has launched Nemotron 3 Super, a 120‑billion‑parameter open model with 12 billion active parameters, designed to run complex agentic Artificial Intelligence systems at scale. The model pairs advanced reasoning with efficient, high-accuracy task completion for autonomous agents and is already being integrated by Artificial Intelligence-native companies such as Perplexity, which uses it for search and as one of 20 orchestrated models. Software-development agent providers including CodeRabbit, Factory, and Greptile are adopting the model alongside proprietary systems to raise accuracy while lowering cost, and life-sciences groups such as Edison Scientific and Lila Sciences plan to use it for deep literature search, data science, and molecular understanding.

Enterprise software vendors including Amdocs, Palantir, Cadence, Dassault Systèmes, and Siemens are deploying and customizing Nemotron 3 Super to automate workflows in telecom, cybersecurity, semiconductor design, and manufacturing. Multi-agent applications face context explosion, where workflows can generate up to 15x more tokens than standard chat because each interaction requires resending full histories, and they also pay a "thinking tax" when large models are used at every step. Nemotron 3 Super addresses these issues with a 1‑million‑token context window that lets agents retain full workflow state in memory and avoid goal drift. The model powers the NVIDIA AI-Q research agent, which has reached the No. 1 position on the DeepResearch Bench and DeepResearch Bench II leaderboards for multistep research over large document sets.
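The context-explosion point above can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only (the turn counts and per-turn token sizes are assumptions, not figures from the announcement): when every agent turn resends the full conversation history, the total tokens the model ingests grow quadratically with the number of turns, so a long multi-agent workflow processes far more tokens than a short chat exchange.

```python
# Illustrative "context explosion" arithmetic: each turn resends the
# entire prior history, so cumulative tokens grow quadratically in turns.

def tokens_processed(turns: int, tokens_per_turn: int) -> int:
    """Total tokens ingested when every turn resends all prior turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # new message appended to the history
        total += history             # full history is resent this turn
    return total

chat = tokens_processed(turns=2, tokens_per_turn=500)      # short chat
agentic = tokens_processed(turns=30, tokens_per_turn=500)  # agent workflow
print(chat, agentic, agentic / chat)  # the workflow ingests ~155x more
```

A 1‑million‑token context window does not remove this growth, but it lets the full workflow state fit in a single context rather than being truncated or summarized away mid-run.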

Nemotron 3 Super uses a hybrid mixture-of-experts architecture that combines Mamba and Transformer layers to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. The Mamba layers are described as delivering 4x higher memory and compute efficiency; only 12 billion of the 120 billion parameters are active at inference; a latent mixture-of-experts technique activates four expert specialists for the cost of one; and multi-token prediction provides 3x faster inference. On the NVIDIA Blackwell platform, the model runs in NVFP4 precision, which cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper with no loss in accuracy. NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license and is publishing a training methodology that includes over 10 trillion tokens of pre- and post-training datasets and 15 reinforcement-learning training environments, while packaging the model as an NVIDIA NIM microservice deployable from on-premises systems to major clouds and partner platforms such as Perplexity, OpenRouter, Hugging Face, hyperscale cloud providers, specialized Artificial Intelligence infrastructure clouds, and data-platform vendors.
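The parameter and precision figures above imply some simple memory arithmetic. The sketch below is a rough estimate under stated assumptions (8 bits per parameter for FP8, 4 bits for NVFP4, and no accounting for activation memory, KV cache, or NVFP4 block-scale overhead), not a published NVIDIA specification:

```python
# Back-of-envelope weight-memory estimate (illustrative assumptions only:
# ignores activations, KV cache, and quantization scale-factor overhead).

TOTAL_PARAMS = 120e9   # total parameters in Nemotron 3 Super
ACTIVE_PARAMS = 12e9   # parameters active per token (mixture of experts)

def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Weight storage in gigabytes at the given precision."""
    return params * bits_per_param / 8 / 1e9

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)    # FP8: 8 bits per weight
nvfp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)  # NVFP4: 4 bits per weight
print(fp8_gb, nvfp4_gb)                # NVFP4 roughly halves weight memory
print(ACTIVE_PARAMS / TOTAL_PARAMS)    # only ~10% of parameters are active
```

This halving of weight memory, combined with the sparse activation of the mixture-of-experts design, is what lets a 120‑billion‑parameter model serve inference at the cost profile of a much smaller dense model.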

Intel repurposes scrap dies to expand CPU supply

Intel is repurposing wafer-edge and lower-yield silicon that would normally be discarded into sellable CPUs as industry demand outpaces supply. The strategy reflects a market where customers are willing to buy lower-tier parts to secure any available capacity.

The missing step between Artificial Intelligence hype and profit

Artificial Intelligence companies have built powerful systems and promised sweeping change, but the path from technical progress to real business value remains unclear. Conflicting studies, weak workplace performance, and poor transparency are leaving a critical gap between hype and evidence.

Samsung workers leaked secrets into ChatGPT

Samsung employees reportedly exposed confidential company information while using ChatGPT for coding help and meeting note generation. The incidents highlight the risk of feeding sensitive data into public Artificial Intelligence tools that retain user inputs.

DeepSeek launches new flagship Artificial Intelligence models

DeepSeek has introduced preview versions of its V4 Flash and V4 Pro models, positioning them as its most powerful open-source Artificial Intelligence platform yet. The release renews competition with OpenAI, Anthropic, and major Chinese rivals while drawing fresh attention to the startup’s technical ambitions and regulatory scrutiny.
