March 2026 brings a wave of Artificial Intelligence model launches

A concentrated burst of model releases in March 2026 highlighted rapid gains in language, video, coding, and on-device systems. Open-weight and open-source entrants narrowed the gap with proprietary platforms across several important benchmarks.

The first weeks of March 2026 brought a dense cluster of major Artificial Intelligence releases across the US, China, and Europe, spanning language models, video generation, spatial reasoning, GPU tooling, and diffusion acceleration. The lineup included GPT-5.4, LTX 2.3, FireRed Edit 1.1, Kiwi Edit, HY WU, Qwen 3.5 Small Series, CUDA Agent, CubeComposer, Helios, Spatial T2I, Spectrum, Utonia, and more, with NVIDIA adding Nemotron 3 Super on March 11. The central shift was not just volume. Open-weight and open-source systems moved much closer to proprietary frontier models on several headline benchmarks, while also improving cost efficiency and deployability.

OpenAI introduced GPT-5.4 on March 5, 2026 in Standard, Thinking, and Pro variants, with context windows up to 1.05 million tokens. On factual accuracy, GPT-5.4 reduces individual claim errors by 33% and full-response errors by 18% compared to GPT-5.2. It scored 83% on OpenAI’s GDPval benchmark for knowledge work. For coding specifically, it hits 57.7% on SWE-Bench Pro, just above GPT-5.3-Codex’s 56.8%, with lower latency. Pricing is $2.50 per 1M input tokens and $15.00 per 1M output tokens for standard context, with a 2x surcharge beyond 272K tokens. A notable addition is Tool Search, which dynamically retrieves tool definitions instead of loading them all into the prompt, reducing cost and latency for complex agentic systems.
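As a rough illustration of those rates, the sketch below estimates a single request's cost. The source does not say whether the 2x surcharge applies to the whole request or only to tokens past the 272K threshold, so the all-or-nothing multiplier here is an assumption.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate=2.50, output_rate=15.00,
                  long_context_threshold=272_000, surcharge=2.0):
    """Estimate a GPT-5.4 request cost in USD from the published rates.

    Rates are per 1M tokens. Assumption (not confirmed by the article):
    the 2x surcharge doubles the whole request's cost once the prompt
    exceeds the 272K-token threshold.
    """
    multiplier = surcharge if input_tokens > long_context_threshold else 1.0
    cost = (input_tokens / 1e6 * input_rate +
            output_tokens / 1e6 * output_rate) * multiplier
    return round(cost, 4)

# A 100K-token prompt with a 10K-token response: $0.25 + $0.15 = $0.40.
# A 500K-token prompt trips the surcharge: ($1.25 + $0.15) * 2 = $2.80.
```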

Alibaba’s Qwen 3.5 Small family arrived on March 1, 2026 with 0.8B, 2B, 4B, and 9B parameter models, all natively multimodal and licensed under Apache 2.0. The 9B model led the release narrative. On GPQA Diamond, it scores 81.7 versus GPT-OSS-120B’s 71.5. On HMMT Feb 2025, it hits 83.2 versus GPT-OSS-120B’s 76.7. On MMLU-Pro, it reaches 82.5 versus 80.8. On Video-MME with subtitles, the 9B scores 84.5, significantly ahead of Gemini 2.5 Flash-Lite at 74.6. The architecture combines Gated Delta Networks with sparse Mixture-of-Experts, enabling a 262K native context window, extensible to 1M via YaRN. The 2B model runs on an iPhone in airplane mode with 4 GB of RAM. Qwen 3.5 via API costs approximately $0.10 per 1M input tokens, versus Claude Opus 4.6 at roughly 13x that price.
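The claim that the 2B model fits on a phone with 4 GB of RAM is consistent with simple back-of-envelope arithmetic. The sketch below estimates weight memory from parameter count and quantization width; the 4-bit quantization and the 1.2x overhead factor for KV cache and activations are assumptions for illustration, not details from the release.

```python
def model_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough memory footprint for an on-device LLM.

    overhead ~1.2 is an assumed factor covering KV cache and
    activations; real usage varies with context length and runtime.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 2)

# A 2B model at an assumed 4 bits per weight: ~1.2 GB, well under 4 GB.
# The 9B sibling at 4 bits would need roughly 5.4 GB, pushing it
# beyond typical phone RAM.
```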

Open-source video also advanced sharply. Lightricks released LTX 2.3, a 22-billion-parameter Diffusion Transformer that generates synchronized video and audio in a single forward pass, supports up to 4K at 50 FPS, and produces clips up to 20 seconds long. The distilled variant runs in just 8 denoising steps. Helios, developed by Peking University, ByteDance, and Canva, is a 14-billion-parameter autoregressive diffusion model that generates videos up to 1,440 frames (approximately 60 seconds at 24 FPS) at 19.5 FPS on a single NVIDIA H100 GPU under Apache 2.0. NVIDIA’s Nemotron 3 Super added a strong enterprise coding option: a 120-billion-total-parameter model with 12 billion active parameters per forward pass. Nemotron 3 Super scores 60.47% on SWE-Bench Verified, versus GPT-OSS’s 41.90%. On RULER at 1M tokens, it scores 91.75% versus GPT-OSS’s 22.30%. It delivers 2.2x higher throughput than GPT-OSS-120B and 7.5x higher throughput than Qwen3.5-122B. The broader takeaway is that efficient, open, and edge-deployable systems are becoming viable foundations for products in coding, video, and local inference.
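The Helios figures imply near-real-time generation, which the arithmetic below makes explicit: at 19.5 frames generated per second, a full 1,440-frame clip takes about 74 seconds of wall-clock time for 60 seconds of 24 FPS playback. This is a plain restatement of the stated numbers, not additional benchmark data.

```python
def video_stats(frames, playback_fps, gen_fps):
    """Relate a clip's playback duration to its generation time.

    frames: total frames in the clip
    playback_fps: frame rate at which the clip plays back
    gen_fps: frames the model generates per second of wall-clock time
    """
    duration_s = frames / playback_fps          # how long the clip plays
    gen_time_s = round(frames / gen_fps, 1)     # wall-clock time to generate
    realtime_factor = round(gen_fps / playback_fps, 2)  # 1.0 = real time
    return duration_s, gen_time_s, realtime_factor

# Helios's stated numbers: 1,440 frames, 24 FPS playback, 19.5 FPS generation
# -> 60 s of video, ~73.8 s to generate, ~0.81x real time.
```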

Impact Score: 70

NVIDIA renames Maxine to NVIDIA Artificial Intelligence for Media

NVIDIA Maxine has been renamed NVIDIA Artificial Intelligence for Media, a development platform for audio, video, and augmented reality workflows. The platform combines SDKs and cloud-native microservices for real-time media enhancement across local, cloud, and edge deployments.

NVIDIA Groq 3 LPX targets low-latency Artificial Intelligence inference

NVIDIA positions Groq 3 LPX as an inference accelerator for the Vera Rubin platform, built to handle low-latency, large-context workloads for agentic systems. The platform pairs Rubin GPUs with LPUs in a co-designed architecture aimed at boosting throughput, token generation, and efficiency at rack scale.

NVIDIA sets the stage for GTC 2026 keynote

NVIDIA is preparing to outline its next wave of computing, networking, and rendering plans at GTC 2026, with Jensen Huang leading the keynote. The event is expected to focus on next-generation platforms, broader Artificial Intelligence infrastructure, and the company’s expanding partnership with Intel.
