Dr. Fei-Fei Li has published a new essay arguing that the next major advance in AI will come from spatial intelligence: systems that can understand, reason about, and generate 3D, physics-consistent worlds. Li contends that large language models have mastered abstract knowledge but still cannot perceive and act in space, struggling with tasks like estimating distance and motion. She frames spatial understanding as the cognitive core of human intelligence and as a necessary step in taking AI from language to real-world perception and action.
At the center of Li’s vision are world models: systems that can generate realistic 3D environments, interpret inputs such as images and actions, and predict how those environments evolve over time. She argues these capabilities will be essential for robotics and for applications across science, healthcare, and design. World models that understand object interactions and physics could one day help predict molecular reactions, model climate systems, or test materials. Li acknowledges the technical challenge of teaching models real-world physics, but points to momentum from her startup World Labs and from companies including Google and Tencent that are building spatially intelligent systems.
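To make that loop concrete, here is a minimal sketch in Python of the cycle the essay describes: a model holds an explicit 3D state, and a predict step rolls it forward in time. Everything here is a hypothetical illustration; the names (WorldState, ToyWorldModel) are not from World Labs or the essay, and a constant-gravity rule stands in for learned, physics-consistent dynamics.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WorldState:
    """Hypothetical explicit 3D state: one row per tracked object."""
    positions: np.ndarray   # shape (N, 3), meters
    velocities: np.ndarray  # shape (N, 3), meters/second


class ToyWorldModel:
    """Toy stand-in for a learned world model: predicts the next state.

    A real system would infer state from images and condition on agent
    actions; here a fixed ballistic rule plays the role of learned physics.
    """

    GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2, world z-axis points up

    def predict(self, state: WorldState, dt: float) -> WorldState:
        # Semi-implicit Euler step: update velocity, then position.
        vel = state.velocities + self.GRAVITY * dt
        pos = state.positions + vel * dt
        return WorldState(positions=pos, velocities=vel)


if __name__ == "__main__":
    # One object 10 m up, drifting forward at 1 m/s.
    state = WorldState(
        positions=np.array([[0.0, 0.0, 10.0]]),
        velocities=np.array([[1.0, 0.0, 0.0]]),
    )
    model = ToyWorldModel()
    for _ in range(5):                 # roll the world forward 0.5 s
        state = model.predict(state, dt=0.1)
    print(state.positions)             # the object has fallen and moved forward
```

The point of the sketch is the shape of the contract rather than the physics: perception fills in a spatial state and prediction advances it, which is exactly what tasks like distance and motion estimation require and what text-only models lack.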
The newsletter places Li’s essay alongside other developments in AI. Anthropic projects a major cost advantage over OpenAI by relying on a mix of chips from Amazon, Nvidia, and Google, and expects to be cash flow positive by 2027. Microsoft Copilot Desktop’s Voice and Vision features can scan Google Sheets or Excel files, let users ask analysis questions by voice, and generate reports that highlight cells and explain calculations. Separately, GPT-5 became the first model to solve a full 9×9 Sudoku puzzle on Sakana AI’s Sudoku-Bench and posted a 33 percent solve rate across the benchmark’s puzzles, underlining progress in structured reasoning even as most puzzles remain unsolved.
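As background on the Sudoku result, the solved condition such a benchmark ultimately scores is easy to state: every row, every column, and every 3×3 box of the 9×9 grid must contain the digits 1 through 9 exactly once. The checker below is a generic illustration of that rule, not Sakana AI’s evaluation code.

```python
def is_solved(grid: list[list[int]]) -> bool:
    """Return True if grid is a valid, fully filled 9x9 Sudoku solution."""
    digits = set(range(1, 10))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in range(0, 9, 3) for bc in range(0, 9, 3)
    ]
    # Every unit (row, column, box) must be exactly the digits 1-9.
    return all(set(unit) == digits for unit in rows + cols + boxes)


# A known valid solution passes the check.
solution = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 1, 7, 9],
]
assert is_solved(solution)
```

A check like this is all-or-nothing by construction: a grid with even one duplicated digit fails, which is part of what makes a per-puzzle solve rate a demanding measure of structured reasoning.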
