NVIDIA is positioning RTX PCs and DGX Spark as systems for running personal agents locally, with an emphasis on private, always-on assistants that can access personal files, apps, and workflows without relying on cloud inference. At GTC, the company highlighted a set of updates spanning open models, an open source stack for OpenClaw, and simplified fine-tuning tools designed to make local agent deployment more practical for developers and enthusiasts.
The new model lineup centers on local inference across NVIDIA hardware. DGX Spark is described as especially suited to larger agentic workloads, as its 128GB of unified memory supports models with more than 120 billion parameters. Nemotron 3 Super is a 120-billion-parameter open model with 12 billion active parameters, designed to run complex agentic AI systems. On PinchBench, Nemotron 3 Super scored 85.6%, making it the top open model in its class. Mistral Small 4, a 119-billion-parameter open model with 6 billion active parameters (8 billion counting all layers), is positioned for general chat, coding and agentic tasks. For smaller systems, Nemotron 3 Nano 4B targets GeForce RTX users building local assistants and conversational personas on constrained hardware. NVIDIA also announced optimizations for Alibaba's Qwen 3.5 models, including the 27B, 9B and 4B variants. These models natively support vision, multi-token prediction and a 262,000-token context window; the dense 27-billion-parameter model performs best when paired with an RTX 5090 GPU.
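To see why 128GB of unified memory is the relevant threshold for 120B-parameter models, a back-of-envelope calculation helps. The sketch below is a rough estimate of weight memory alone at common precisions (FP16, FP8, and a 4-bit format such as NVFP4); it deliberately ignores KV cache, activations, and runtime overhead, and the byte-per-parameter figures are standard assumptions, not NVIDIA-published numbers.

```python
def model_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for a model
    of the given size at the given precision. Weights only: KV cache,
    activations, and runtime overhead are not included."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# A 120B-parameter model at common precisions, checked against
# DGX Spark's 128GB of unified memory.
for fmt, bpp in [("FP16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    gb = model_memory_gb(120, bpp)
    verdict = "fits" if gb <= 128 else "exceeds"
    print(f"{fmt}: ~{gb:.0f} GB of weights, {verdict} 128 GB")
```

At FP16 the weights alone (~240 GB) exceed the budget, while FP8 (~120 GB) and 4-bit (~60 GB) leave room, which is consistent with the emphasis on FP8 and NVFP4 variants for local deployment.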
NVIDIA also introduced NemoClaw, an open source stack for OpenClaw that brings NVIDIA-specific optimizations to local agents. The initial release includes NVIDIA Nemotron open models and the NVIDIA OpenShell runtime. NVIDIA says local inference through Nemotron models improves privacy and eliminates token costs, while OpenShell is designed to execute claws more safely. Alongside the release, the company promoted a hands-on build-a-claw event running through March 19, from 8 a.m. to 5 p.m., where attendees can configure and deploy a proactive assistant on their device of choice.
Beyond agents, NVIDIA highlighted RTX-optimized creative and model-tuning tools. LTX 2.3 now supports NVFP4 and FP8 distilled models, accelerating generation by 2.1x. FLUX.2 Klein 9B received an update that speeds up image editing by up to 2x, alongside a new FP8 version optimized for RTX GPUs. Unsloth Studio launched as a web-based interface for fine-tuning, with support for more than 500 AI models. Built on the Unsloth library, it delivers up to 2x faster training with up to 70% VRAM savings, giving RTX GPU and DGX Spark users a simpler path to customizing local models for agentic workflows.
