Adobe Research Empowers Video World Models with State-Space Memory

Adobe Research, in collaboration with Stanford and Princeton, pioneers long-term memory solutions for video world models, improving Artificial Intelligence agents' scene reasoning and planning.

Researchers from Adobe, Stanford, and Princeton have introduced a novel approach to overcoming the bottleneck of long-term memory in video world models, a core challenge hindering Artificial Intelligence agents' ability to reason and plan in dynamic environments. While previous video diffusion models achieved high-quality frame prediction, their limited sequence memory, a consequence of computationally expensive attention mechanisms, severely restricted their practical application in complex, real-world tasks.

The proposed solution, detailed in their paper "Long-Context State-Space Video World Models," centers on incorporating State-Space Models (SSMs) in a block-wise fashion. By breaking video sequences into manageable blocks and maintaining a compressed state that is carried across block boundaries, the Long-Context State-Space Video World Model (LSSVWM) significantly extends the model's temporal memory without suffering from the quadratic scaling that plagues attention-based architectures. To retain spatial consistency within and across blocks, the architecture pairs this state-space memory with dense local attention, ensuring that local fidelity and scene coherence are preserved throughout extended generations (see the sketch below).
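The paper does not publish its implementation here; the following is a minimal PyTorch sketch of the block-wise idea under stated assumptions. The class name BlockwiseSSMMemory, the state size, the block length, and the diagonal linear recurrence are all illustrative simplifications standing in for the paper's SSM layers, not the authors' code.

import torch
import torch.nn as nn


class BlockwiseSSMMemory(nn.Module):
    """Hypothetical sketch: dense attention inside each block, a compressed
    recurrent state carried across blocks for long-range memory."""

    def __init__(self, dim: int, state_dim: int = 64, block_len: int = 8, heads: int = 4):
        super().__init__()
        self.block_len = block_len
        # Diagonal SSM parameters (simplified stand-in for the paper's SSM).
        self.log_decay = nn.Parameter(torch.zeros(state_dim))  # per-channel retention
        self.in_proj = nn.Linear(dim, state_dim)                # frame -> state update
        self.out_proj = nn.Linear(state_dim, dim)               # state -> frame features
        # Dense attention restricted to frames within a single block.
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim); time divisible by block_len for simplicity.
        b, t, _ = frames.shape
        state = frames.new_zeros(b, self.log_decay.numel())  # compressed memory
        decay = torch.sigmoid(self.log_decay)
        outputs = []
        for start in range(0, t, self.block_len):
            block = frames[:, start:start + self.block_len]
            # 1) Dense local attention preserves fidelity within the block.
            attended, _ = self.local_attn(block, block, block)
            # 2) Inject the memory accumulated from all earlier blocks.
            attended = attended + self.out_proj(state).unsqueeze(1)
            outputs.append(attended)
            # 3) Fold this block into the compressed state, frame by frame.
            for i in range(block.shape[1]):
                state = decay * state + self.in_proj(block[:, i])
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    model = BlockwiseSSMMemory(dim=32)
    video = torch.randn(2, 64, 32)  # 64 frames, far beyond one block
    print(model(video).shape)       # torch.Size([2, 64, 32])

Because attention is confined to fixed-size blocks while the recurrent state is a fixed-size vector, compute grows linearly with frame count rather than quadratically, which is the scaling advantage the paper claims for SSM-based memory.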

To further enhance performance, the research introduces two training strategies: diffusion forcing and frame-local attention. Diffusion forcing encourages the model to maintain sequence consistency even when generating from sparse initial context, while frame-local attention leverages the FlexAttention technique for efficient chunked frame processing and faster training (a sketch follows below). These innovations were rigorously evaluated on demanding datasets such as Memory Maze and Minecraft, environments specifically designed to stress long-term recall and reasoning. Experimental results demonstrate that LSSVWM substantially outperforms existing baselines, enabling coherent, accurate prediction over long horizons without sacrificing inference speed. These results position the architecture as a promising foundation for interactive Artificial Intelligence video planning systems and dynamic scene understanding.
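As a rough illustration of the frame-local attention idea, here is a hedged sketch using PyTorch's FlexAttention API (torch 2.5+ on GPU; CPU eager support arrived in later releases). The tokens-per-frame count and the window size are illustrative assumptions, not the paper's configuration.

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

TOKENS_PER_FRAME = 16  # assumed spatial tokens per video frame
WINDOW_FRAMES = 4      # each query attends only to frames this close in time


def frame_local_mask(b, h, q_idx, kv_idx):
    # Map token indices to frame indices, then allow attention only between
    # frames at most WINDOW_FRAMES apart, causally ordered in time.
    q_frame = q_idx // TOKENS_PER_FRAME
    kv_frame = kv_idx // TOKENS_PER_FRAME
    return (kv_frame <= q_frame) & (q_frame - kv_frame < WINDOW_FRAMES)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    B, H, FRAMES, D = 1, 4, 32, 64
    S = FRAMES * TOKENS_PER_FRAME
    q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))
    # The block mask lets FlexAttention skip fully-masked chunks entirely,
    # which is what makes chunked frame processing efficient.
    mask = create_block_mask(frame_local_mask, B=None, H=None,
                             Q_LEN=S, KV_LEN=S, device=device)
    out = flex_attention(q, k, v, block_mask=mask)
    print(out.shape)  # torch.Size([1, 4, 512, 64])

Diffusion forcing is not shown here; in that scheme each frame would additionally receive an independently sampled noise level during training, which is what lets the model cope with sparse or partially denoised context at inference time.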

Impact Score: 74

Intel unveils massive artificial intelligence processor test vehicle showcasing advanced packaging

Intel Foundry has revealed an experimental artificial intelligence chip test vehicle that uses a package roughly eight reticles in size, carrying multiple logic and memory tiles, to demonstrate its latest manufacturing and packaging capabilities. The design highlights how Intel intends to build next-generation multi-chiplet artificial intelligence and high-performance computing processors with advanced interconnects and power delivery.

Reward models inherit value biases from large language model foundations

New research shows that reward models used to align large language models inherit systematic value biases from their pre-trained foundations, with Llama and Gemma models diverging along agency and communion dimensions. The work raises fresh safety questions about treating base model choice as a purely technical performance decision in Artificial Intelligence alignment pipelines.
