NVIDIA TensorRT dramatically accelerates Stable Diffusion 3.5 on RTX GPUs

NVIDIA's TensorRT brings major performance and memory efficiency gains to Stable Diffusion 3.5, enabling faster image generation with less VRAM on GeForce RTX and RTX PRO GPUs.

The rise of generative artificial intelligence models has transformed digital creativity but also increased hardware demands, especially for VRAM. Stable Diffusion 3.5 Large, a leading image generation model, originally required over 18GB of VRAM, restricting its usability to only a handful of high-end systems. NVIDIA has addressed this limitation by collaborating with Stability AI to quantize the model to FP8—a process that trims VRAM usage by 40%, making it possible for more systems to run the model effectively.

This optimization, combined with NVIDIA TensorRT, delivers significant performance gains. TensorRT, NVIDIA's AI inference platform, has been reengineered for RTX AI PCs and now offers just-in-time, on-device engine building. Alongside the reduced VRAM footprint, the quantized Stable Diffusion 3.5 models more than double in throughput: FP8 TensorRT delivers a 2.3x speedup and 40% lower memory use compared to the baseline BF16 PyTorch implementation. The Medium model is also optimized, with a 1.7x speedup. These advances allow image creation and editing tasks to proceed quickly, even on RTX GPUs with less VRAM.
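The VRAM reduction follows directly from per-parameter storage cost: FP8 stores one byte per weight versus two bytes for BF16, while fixed overheads (text encoders, activations, framework buffers) shrink less. A rough back-of-envelope sketch, where the ~8B parameter count for Stable Diffusion 3.5 Large and the fixed overhead figure are illustrative assumptions, not published measurements:

```python
def model_vram_gb(params_billion: float, bytes_per_param: float,
                  overhead_gb: float = 0.0) -> float:
    """Estimate VRAM as weight storage plus a fixed overhead
    (text encoders, activations, framework buffers)."""
    return params_billion * bytes_per_param + overhead_gb

# Parameter count and overhead below are hypothetical placeholders
# chosen to illustrate why 8-bit weights cut total VRAM by roughly 40%.
bf16 = model_vram_gb(8.0, 2.0, overhead_gb=3.0)  # 2 bytes/weight
fp8 = model_vram_gb(8.0, 1.0, overhead_gb=3.0)   # 1 byte/weight

savings = 1 - fp8 / bf16
print(f"BF16: {bf16:.1f} GB, FP8: {fp8:.1f} GB, saved {savings:.0%}")
```

With these assumed numbers the estimate lands near the article's 40% figure; the savings on the weights alone are 50%, diluted by the memory that does not shrink with quantization.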

TensorRT for RTX is now available as a standalone software development kit, letting developers take advantage of rapid JIT engine creation without having to prebuild and ship engines for each specific GPU model. The SDK is eight times smaller than before and integrates with Windows ML for streamlined deployment. The optimized Stable Diffusion 3.5 models are live on Hugging Face and will soon be accessible as NVIDIA NIM microservices, further simplifying deployment for developers and creatives. Together, these advances mark a significant step in democratizing high-performance generative artificial intelligence, making advanced image synthesis accessible on a much wider range of systems.


Introducing Mistral 3: open artificial intelligence models

Mistral 3 is a family of open, multimodal and multilingual artificial intelligence models that includes three Ministral edge models and the sparse Mistral Large 3, a model with 41B active and 675B total parameters, released under the Apache 2.0 license.
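In a sparse mixture-of-experts model like Mistral Large 3, only a subset of the total weights is used for each token, which is what the "41B active of 675B total" figures describe. A quick sketch of the active-parameter fraction implied by those stated counts:

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Fraction of parameters engaged per forward pass in a sparse
    mixture-of-experts model (active experts / all experts' weights)."""
    return active_b / total_b

# Parameter counts as stated in the announcement.
ratio = active_fraction(41, 675)
print(f"{ratio:.1%} of parameters active per token")
```

With roughly 6% of weights active per token, the model keeps per-token compute closer to that of a ~41B dense model while drawing on a 675B-parameter capacity.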

NVIDIA and Mistral AI partner to accelerate new family of open models

NVIDIA and Mistral AI announced a partnership to optimize the Mistral 3 family of open-source multilingual, multimodal models across NVIDIA supercomputing and edge platforms. The collaboration highlights Mistral Large 3, a mixture-of-experts model designed to improve efficiency and accuracy for enterprise artificial intelligence deployments, available starting Tuesday, Dec. 2.
