Meta and Ohio State unveil Early Experience for training language agents

Meta and Ohio State University introduce Early Experience, a self-directed training approach that lets language agents learn from their own interactions. In tests across eight environments, the method outperformed imitation learning and strengthened downstream reinforcement learning.

Researchers at Meta and Ohio State University have introduced Early Experience, a training approach in which language agents learn from their own actions rather than from external reward signals. Traditional systems often depend on human demonstrations that cover limited scenarios and struggle to generalize. Early Experience positions itself between imitation learning and reinforcement learning by turning the agent’s exploratory behavior into useful supervision without explicit rewards.

The work centers on two techniques. Implicit world modeling teaches an agent to predict what will happen after taking different actions, using those predictions as training targets. For example, when an agent clicks a website link, it learns to anticipate the resulting page. The second technique, self-reflection, has the agent compare its own actions with expert moves and generate natural language explanations for why the expert’s choice is superior, such as noting when an online shopping decision exceeds a budget. Both methods turn the agent’s own interactions and outcomes into learning signals, removing the need for outside evaluations.
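The two techniques can be pictured as recipes for turning raw rollouts into supervised training examples. The sketch below is illustrative only: all function and field names are assumptions, not the paper's actual API, and the `explain` step stands in for a model-generated reflection.

```python
# Hedged sketch: building supervised examples from an agent's own rollouts.
# Field names ("state", "action", "next_state", "expert_action") are
# illustrative assumptions, not the paper's data format.

def world_modeling_examples(rollouts):
    """Implicit world modeling: the agent learns to predict the observation
    that followed each (state, action) pair it actually tried."""
    examples = []
    for step in rollouts:
        prompt = (f"State: {step['state']}\n"
                  f"Action: {step['action']}\n"
                  f"Predict the next state:")
        # The observed outcome itself becomes the training target.
        examples.append({"input": prompt, "target": step["next_state"]})
    return examples

def self_reflection_examples(rollouts, explain):
    """Self-reflection: where an expert action is known and differs from the
    agent's, generate a natural-language explanation of why the expert's
    choice is better and train on that explanation."""
    examples = []
    for step in rollouts:
        expert = step.get("expert_action")
        if expert and expert != step["action"]:
            reflection = explain(step["state"], step["action"], expert)
            prompt = (f"State: {step['state']}\n"
                      f"Why is '{expert}' better than '{step['action']}'?")
            examples.append({"input": prompt, "target": reflection})
    return examples

# Toy rollout from a shopping environment, echoing the budget example above.
rollout = [
    {"state": "cart total $95, budget $100",
     "action": "add $20 headphones",
     "next_state": "cart total $115, over budget warning",
     "expert_action": "add $4 cable"},
]

wm = world_modeling_examples(rollout)
sr = self_reflection_examples(
    rollout,
    # Stand-in for a model-generated reflection.
    explain=lambda s, a, e: f"'{e}' keeps the cart within budget; '{a}' exceeds it.",
)
print(wm[0]["target"])
print(sr[0]["target"])
```

In both cases the supervision comes entirely from the agent's own interactions (plus, for self-reflection, a known expert move), which is what lets the method sidestep explicit reward signals.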

The team evaluated Early Experience across eight environments, spanning website navigation, simulated household chores, scientific experiments, multi-step tool use, and complex planning tasks like travel arrangements. With relatively small language models, including Llama-3.1-8B, Llama-3.2-3B, and Qwen2.5-7B, both Early Experience methods consistently outperformed standard imitation learning. On average, success rates increased by 9.6 percentage points, with generalization to new scenarios improving by 9.4 percentage points. Gains were largest on harder problems: self-reflection improved travel planning by up to 15 percentage points, while implicit world modeling lifted online shopping by as much as 18.4 percentage points.

The researchers also tested whether Early Experience improves subsequent reinforcement learning. Models first trained with Early Experience and then run through the same reinforcement learning process outperformed those that started from other methods, sometimes widening the performance gap as training progressed. The results suggest that Early Experience is effective on its own and strengthens later reinforcement learning, offering a practical bridge between current training strategies and more reward-driven systems.

Early Experience scaled to larger models up to 70 billion parameters, and the improvements held even when using resource-efficient LoRA updates. It also showed strong data efficiency: in some tests, using just one eighth of the original expert demonstrations was enough to beat standard training with the full dataset. Together, these findings indicate that learning from early, self-generated interactions can build more capable and adaptable language agents while reducing reliance on extensive expert data and explicit reward signals.


Microsoft launches Copilot Health in the US

Microsoft has introduced Copilot Health as a protected space inside Copilot that combines medical records, wearable data and lab results into personalised health insights. The service is launching first for adults in the US with strong privacy controls and a limited initial rollout.

Tesla plans terafab for Artificial Intelligence chips

Tesla is moving toward a large-scale chip manufacturing project to support its autonomous driving roadmap. Elon Musk said the terafab effort for Artificial Intelligence chips will launch in seven days and may involve Intel, TSMC and Samsung.

Timeline traces evolution, civilisation and planetary stewardship

A sweeping chronology links cosmology, evolution, human history and modern environmental risk in a single long view of the human condition. The sequence culminates in contemporary debates over climate change, biodiversity loss and artificial intelligence governance.

Wolters Kluwer report tracks Artificial Intelligence shift in legal work

Wolters Kluwer’s 2026 Future Ready Lawyer findings show Artificial Intelligence has become a foundational tool across law firms and corporate legal departments. The survey points to measurable time savings, revenue growth, and rising pressure to strengthen training, ethics, and security.

Anthropic March 2026 release roundup

Anthropic rolled out a broad set of March 2026 updates across Claude Code, the Claude Developer Platform, Claude apps, and enterprise partnerships. Changes focused on larger context windows, workflow improvements, reliability fixes, visual output features, and new partner enablement programs.
