Nvidia Research has introduced DiffusionRenderer, a neural rendering technique that manipulates scene lighting in videos using Artificial Intelligence. Unlike traditional physically based rendering pipelines, which require 3D geometry data to model light accurately, DiffusionRenderer works directly from 2D video footage. This approach allows creators to seamlessly turn daylight scenes into nighttime, shift sunny afternoons to overcast skies, and soften harsh artificial lighting, all grounded in robust estimates of surface properties such as normals, metallicity, and roughness.
DiffusionRenderer unites the processes of inverse rendering and forward rendering, outperforming previous state-of-the-art methods. This unified engine has wide-reaching implications across multiple sectors. In creative industries such as advertising, filmmaking, and video game development, artists can now add, remove, or adjust lighting within both real-world and Artificial Intelligence-generated videos, facilitating quick ideation and production planning without costly specialized equipment. In the sphere of physical Artificial Intelligence development, including robotics and autonomous vehicles, DiffusionRenderer enables the creation and augmentation of synthetic datasets. This aids in training models to cope with varied and challenging lighting conditions by generating a diverse array of relit video clips from limited source material.
Since its initial presentation, Nvidia's team has enhanced DiffusionRenderer by integrating it with Cosmos Predict-1, a suite of world foundation models focused on physics-aware video generation. This integration harnessed larger diffusion models, which improved the fidelity, sharpness, and temporal consistency of both de-lighting and relighting results, demonstrating notable scaling effects. Cosmos Predict is part of the larger Nvidia Cosmos platform, offering tools for accelerated synthetic data curation and generation, specifically to advance physical Artificial Intelligence applications.
DiffusionRenderer is among more than sixty Nvidia papers showcased at this year's Computer Vision and Pattern Recognition (CVPR) conference, which also highlights research spanning automotive, healthcare, robotics, and foundational Artificial Intelligence. Notably, Nvidia clinched the Autonomous Grand Challenge award for the third year, reinforcing its leadership in end-to-end autonomous technologies. Additional highlighted CVPR papers include FoundationStereo, offering 3D reconstruction from stereo images, Zero-Shot Monocular Scene Flow Estimation for robust 3D motion prediction, and Difix3D+, improving artifact removal in 3D scene reconstructions. These contributions reflect Nvidia's continued push at the intersection of Artificial Intelligence research, computer vision, and real-world applications.