Google DeepMind has built SIMA 2, a new version of its Scalable Instructable Multiworld Agent, by integrating Gemini, the firm’s flagship large language model. The team says Gemini gives the agent better instruction following, the ability to ask questions and give progress updates, and stronger self-improvement through trial and error. SIMA 2 was trained on footage of humans playing eight commercial games, including No Man’s Sky and Goat Simulator 3, plus three virtual worlds the company built itself. It processes a game’s pixels frame by frame and maps them to keyboard and mouse actions.
Humans can interact with SIMA 2 via text chat, voice, or drawing on the screen, and the agent can chat back. Researchers connected SIMA 2 to Genie 3, the latest version of the company’s world model, to generate novel environments and tasks. In experiments where Genie 3 produced environments from scratch, SIMA 2 was able to navigate and carry out instructions in settings it had never seen. When SIMA 2 failed a task, Gemini generated hints, and by retrying with those tips the agent often improved over repeated attempts. The team hopes to scale this form of trial-and-error learning into an endless virtual training dojo.
SIMA 2 remains an experiment with clear limits. Researchers trimmed its long-term memory to make the agent more responsive, so it currently remembers only recent interactions. It also struggles with complex, multi-step tasks and lacks humans’ precision with keyboard and mouse control. Commentators noted both promise and caveats. Julian Togelius highlighted how hard real-time visual control across many games is and pointed to transfer problems in earlier systems such as DeepMind’s Gato. Matthew Guzdial warned that because games share similar control inputs and use visuals designed by game developers, the results may overstate how well these skills would transfer to real-world robots. Despite these limits, Google DeepMind said the navigation, tool use, and human collaboration skills SIMA 2 exhibits are foundational for future robot companions, and the team plans to keep working with Genie 3 and Gemini to expand its training scenarios.
