NVIDIA Nemotron 3 Super targets agentic AI at scale

NVIDIA Nemotron 3 Super is a 120-billion-parameter open model with 12 billion active parameters, engineered to power large-scale agentic AI systems with high throughput and accuracy. A hybrid mixture-of-experts architecture, a 1-million-token context window and open weights position it for use across enterprise, research and autonomous-agent workflows.

NVIDIA has launched Nemotron 3 Super, a 120-billion-parameter open model with 12 billion active parameters designed to run complex agentic AI systems at scale. The model pairs advanced reasoning with the efficiency and accuracy autonomous agents need to complete tasks, and is already being integrated by AI-native companies such as Perplexity, which uses it for search and as one of 20 orchestrated models. Software-development agent providers including CodeRabbit, Factory and Greptile are adopting the model alongside proprietary systems to raise accuracy while lowering cost, and life-sciences groups such as Edison Scientific and Lila Sciences plan to use it for deep literature search, data science and molecular understanding.

Enterprise software vendors including Amdocs, Palantir, Cadence, Dassault Systèmes and Siemens are deploying and customizing Nemotron 3 Super to automate workflows in telecom, cybersecurity, semiconductor design and manufacturing. Multi-agent applications face context explosion: because each interaction resends the full history, workflows can generate up to 15x more tokens than standard chat. They also pay a "thinking tax" when a large model is invoked at every step. Nemotron 3 Super addresses both issues with a 1-million-token context window that lets agents retain full workflow state in memory and avoid goal drift. The model powers the NVIDIA AI-Q research agent, which has reached the No. 1 position on the DeepResearch Bench and DeepResearch Bench II leaderboards for multistep research over large document sets.
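The context-explosion arithmetic above can be sketched in a few lines. The token counts and step counts below are hypothetical round numbers chosen for illustration, not figures from NVIDIA; the point is that when every step of a workflow resends the full shared history as its prompt, cumulative token usage grows quadratically with the number of steps rather than linearly.

```python
# Illustrative sketch of "context explosion" in multi-agent workflows.
# Assumption: each step appends a fixed number of tokens to a shared
# history and must resend that entire history as its prompt.

def cumulative_prompt_tokens(steps: int, tokens_per_step: int) -> int:
    """Total prompt tokens sent when every step resends the full history."""
    total = 0
    history = 0
    for _ in range(steps):
        history += tokens_per_step  # each step grows the shared history
        total += history            # and resends all of it as its prompt
    return total

chat = cumulative_prompt_tokens(steps=1, tokens_per_step=2000)    # single-turn chat
agent = cumulative_prompt_tokens(steps=30, tokens_per_step=2000)  # 30-step workflow
print(agent / (30 * chat))  # → 15.5, tokens per step vs. plain chat
```

With these (hypothetical) numbers, a 30-step workflow consumes roughly 15x more tokens per step than single-turn chat, which is the regime a 1-million-token context window is meant to absorb.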

Nemotron 3 Super uses a hybrid mixture-of-experts architecture that combines Mamba and Transformer layers to deliver up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model. The Mamba layers are described as delivering 4x higher memory and compute efficiency; only 12 billion of the 120 billion parameters are active at inference; a latent mixture-of-experts technique activates four expert specialists for the cost of one; and multi-token prediction provides 3x faster inference. On the NVIDIA Blackwell platform, the model runs in NVFP4 precision, which cuts memory requirements and pushes inference up to 4x faster than FP8 on NVIDIA Hopper with no loss in accuracy.

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license and is publishing its training methodology, including more than 10 trillion tokens of pre- and post-training datasets and 15 training environments for reinforcement learning. The model is packaged as an NVIDIA NIM microservice deployable from on-premises systems to major clouds and partner platforms such as Perplexity, OpenRouter, Hugging Face, hyperscale cloud providers, specialized AI infrastructure clouds and data-platform vendors.
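The parameter and precision figures above imply some simple back-of-envelope arithmetic. The sketch below uses the standard bit widths for the named formats (FP8 at 8 bits per weight, NVFP4 at 4 bits) and ignores per-block scale-factor overhead, so the results are approximations rather than measured footprints.

```python
# Back-of-envelope memory and sparsity arithmetic for the figures above.
# Parameter counts come from the article; byte-per-weight mapping assumes
# FP8 = 8 bits and NVFP4 = 4 bits, ignoring scale-factor overhead.

TOTAL_PARAMS = 120e9   # 120-billion-parameter model
ACTIVE_PARAMS = 12e9   # 12 billion parameters active per token

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes at a given precision."""
    return params * bits_per_param / 8 / 1e9

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)    # ~120 GB of weights in FP8
nvfp4_gb = weight_memory_gb(TOTAL_PARAMS, 4)  # ~60 GB in NVFP4: half the memory
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1: 10% of weights per token
print(fp8_gb, nvfp4_gb, active_fraction)  # → 120.0 60.0 0.1
```

Halving the bits per weight halves the weight memory, and activating only a tenth of the parameters per token is what lets a 120B-parameter model serve at roughly the cost of a 12B dense one.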
