Adobe Research Empowers Video World Models with State-Space Memory

Adobe Research, in collaboration with Stanford and Princeton, introduces a state-space approach to long-term memory in video world models, improving Artificial Intelligence scene reasoning and planning.

Researchers from Adobe, Stanford, and Princeton have introduced a novel approach to overcoming the long-term memory bottleneck in video world models, a core challenge hindering Artificial Intelligence agents' ability to reason and plan in dynamic environments. While previous video diffusion models achieved high-quality frame prediction, their sequence memory was limited by computationally expensive attention mechanisms, severely restricting their practical application in complex, real-world tasks.

The proposed solution, detailed in their paper "Long-Context State-Space Video World Models," centers on incorporating State-Space Models (SSMs) in a block-wise fashion. By breaking video sequences into manageable blocks and maintaining a compressed state across them, the Long-Context State-Space Video World Model (LSSVWM) significantly extends the model's temporal memory without the quadratic scaling that plagues attention-based architectures. To retain spatial consistency within and across blocks, the architecture pairs this state-space backbone with dense local attention, ensuring that local fidelity and scene coherence are preserved throughout extended generations.
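To make the block-wise pattern concrete, here is a minimal PyTorch sketch, not the authors' implementation: a compressed state is carried across fixed-size blocks of tokens by a simplified diagonal recurrence standing in for the SSM scan, while dense attention runs inside each block. All module names, dimensions, and block sizes are illustrative assumptions.

```python
# Illustrative sketch of block-wise SSM + dense local attention.
# Not the paper's code; dimensions and the diagonal recurrence are assumed.
import torch
import torch.nn as nn

class BlockSSMLayer(nn.Module):
    def __init__(self, dim, state_dim=64):
        super().__init__()
        # Dense attention applied only within a block.
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # A, B, C follow standard SSM naming: state decay, input map, output map.
        self.A = nn.Parameter(torch.rand(state_dim))
        self.B = nn.Linear(dim, state_dim)
        self.C = nn.Linear(state_dim, dim)

    def forward(self, block, state):
        # block: (batch, tokens_in_block, dim); state: (batch, state_dim)
        local, _ = self.attn(block, block, block)  # local fidelity within the block
        decay = torch.sigmoid(self.A)              # keep the recurrence stable in (0, 1)
        outs = []
        for t in range(local.shape[1]):
            # Fold each token into the compressed state sequentially.
            state = decay * state + self.B(local[:, t])
            outs.append(self.C(state))
        return local + torch.stack(outs, dim=1), state

layer = BlockSSMLayer(dim=256)
state = torch.zeros(2, 64)                     # compressed memory, fixed size
video_tokens = torch.randn(2, 1024, 256)       # a long token sequence
for block in video_tokens.split(128, dim=1):   # process in fixed-size blocks
    out, state = layer(block, state)           # only the state crosses block boundaries
```

The property the sketch illustrates is that memory cost stays constant as the sequence grows: only the fixed-size state is carried across blocks, while the quadratic attention cost is confined to each block.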

To further enhance performance, the research introduces two training strategies: diffusion forcing and frame local attention. Diffusion forcing encourages the model to preserve sequence consistency even from sparse initial contexts, while frame local attention leverages the FlexAttention technique for efficient chunked frame processing and faster training. These innovations were rigorously evaluated on demanding datasets such as Memory Maze and Minecraft, environments specifically designed to challenge long-term recall and reasoning capabilities. Experimental results demonstrate that LSSVWM substantially outperforms existing baselines, enabling coherent, accurate prediction over long horizons without sacrificing inference speed. These breakthroughs position the architecture as a promising foundation for interactive Artificial Intelligence video planning systems and dynamic scene understanding.
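As a rough illustration of the frame local attention strategy, PyTorch's FlexAttention API (torch >= 2.5) can express a causal mask restricted to a window of recent frames; the tokens-per-frame count, window size, and tensor shapes below are assumed values, not figures from the paper.

```python
# Sketch of a frame-local attention mask with FlexAttention.
# TOKENS_PER_FRAME and WINDOW_FRAMES are illustrative assumptions.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

TOKENS_PER_FRAME = 256   # assumed spatial tokens per video frame
WINDOW_FRAMES = 4        # each frame attends to itself and 3 previous frames

def frame_local_mask(b, h, q_idx, kv_idx):
    # Map token indices to frame indices, then allow attention only
    # within a causal window of WINDOW_FRAMES frames.
    q_frame = q_idx // TOKENS_PER_FRAME
    kv_frame = kv_idx // TOKENS_PER_FRAME
    return (kv_frame <= q_frame) & (q_frame - kv_frame < WINDOW_FRAMES)

B, H, SEQ, D = 1, 8, 16 * TOKENS_PER_FRAME, 64  # 16 frames of tokens
q = torch.randn(B, H, SEQ, D, device="cuda")
k = torch.randn(B, H, SEQ, D, device="cuda")
v = torch.randn(B, H, SEQ, D, device="cuda")

# B=None, H=None broadcasts the mask across batch and heads.
block_mask = create_block_mask(frame_local_mask, B=None, H=None,
                               Q_LEN=SEQ, KV_LEN=SEQ, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)  # wrap in torch.compile for speed
```

Because the mask depends only on token indices, create_block_mask can skip whole blocks that fall outside the window, which is what makes chunked frame processing efficient in practice.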

IBM and AMD partner on quantum-centric supercomputing

IBM and AMD announced plans to develop quantum-centric supercomputing architectures that combine quantum computers with high-performance computing to create scalable, open-source platforms. The collaboration leverages IBM's work on quantum computers and software and AMD's expertise in high-performance computing and Artificial Intelligence accelerators.

Qualcomm launches Dragonwing Q-6690 with integrated RFID and Artificial Intelligence

Qualcomm announced the Dragonwing Q-6690, billed as the world’s first enterprise mobile processor with fully integrated UHF RFID and built-in 5G, Wi-Fi 7, Bluetooth 6.0, ultra-wideband and Artificial Intelligence capabilities. The platform is aimed at rugged handhelds, point-of-sale systems and smart kiosks and offers software-configurable feature packs that can be upgraded over the air.

Recent books from the MIT community

A roundup of new titles from the MIT community, including Empire of Artificial Intelligence, a critical look at Sam Altman’s OpenAI, and Data, Systems, and Society, a textbook on harnessing Artificial Intelligence for societal good.
