NVIDIA TensorRT dramatically accelerates Stable Diffusion 3.5 on RTX GPUs

NVIDIA's TensorRT brings major performance and memory efficiency gains to Stable Diffusion 3.5, enabling faster image generation with less VRAM on GeForce RTX and RTX PRO GPUs.

The rise of generative artificial intelligence models has transformed digital creativity but also increased hardware demands, especially for VRAM. Stable Diffusion 3.5 Large, a leading image generation model, originally required over 18GB of VRAM, which restricted it to a handful of high-end systems. NVIDIA has addressed this limitation by collaborating with Stability AI to quantize the model to FP8, a change that trims VRAM usage by 40% and lets more systems run the model effectively.

This optimization, combined with NVIDIA TensorRT, delivers significant performance gains. TensorRT, NVIDIA’s AI inference platform, has been reengineered for RTX AI PCs and now offers just-in-time, on-device engine building. Alongside the reduced VRAM use, quantized Stable Diffusion 3.5 models more than double in performance: FP8 TensorRT delivers a 2.3x speedup and 40% lower memory use compared with the baseline PyTorch implementation in BF16. The Medium versions are also optimized, with a 1.7x increase in speed. These advances allow image creation and editing tasks to proceed quickly and in real time, even on RTX GPUs with less VRAM.
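To make the quoted figures concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers stated in the article (18GB baseline, 40% reduction, 2.3x and 1.7x speedups). The 10-second baseline generation time is a hypothetical value chosen for illustration, not a measured result, and the helper function names are likewise illustrative.

```python
def vram_after_reduction(baseline_gb: float, reduction: float) -> float:
    """Memory footprint after cutting `reduction` (a fraction) from the baseline."""
    return baseline_gb * (1.0 - reduction)


def time_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Wall-clock time once generation runs `speedup` times faster."""
    return baseline_seconds / speedup


# Figures quoted in the article.
BASELINE_VRAM_GB = 18.0  # the article says "over 18GB", so this is a lower bound
FP8_REDUCTION = 0.40
LARGE_SPEEDUP = 2.3
MEDIUM_SPEEDUP = 1.7

# Hypothetical baseline latency, used only to make the ratios tangible.
ASSUMED_BASELINE_SECONDS = 10.0

fp8_vram_gb = vram_after_reduction(BASELINE_VRAM_GB, FP8_REDUCTION)
large_seconds = time_after_speedup(ASSUMED_BASELINE_SECONDS, LARGE_SPEEDUP)
medium_seconds = time_after_speedup(ASSUMED_BASELINE_SECONDS, MEDIUM_SPEEDUP)

print(f"FP8 VRAM estimate: {fp8_vram_gb:.1f} GB")
print(f"Large:  {ASSUMED_BASELINE_SECONDS:.1f} s -> {large_seconds:.2f} s")
print(f"Medium: {ASSUMED_BASELINE_SECONDS:.1f} s -> {medium_seconds:.2f} s")
```

Under these assumptions the FP8 model needs roughly 10.8GB or slightly more, and an image that took 10 seconds in the BF16 PyTorch baseline would take a little over 4 seconds with FP8 TensorRT. Actual numbers will vary with GPU, resolution, and step count.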

TensorRT for RTX is now available as a standalone software development kit, letting developers take advantage of rapid JIT engine creation without having to pre-build and package engines for specific devices. The SDK is eight times smaller than before and integrates with Windows ML for streamlined deployment. The optimized Stable Diffusion 3.5 models are live on Hugging Face and will soon be accessible as NVIDIA NIM microservices, further simplifying deployment for developers and creatives. Together, these advances make high-performance image synthesis accessible on a much wider range of systems.


