Meta and Ohio State unveil Early Experience for training language agents

Meta and Ohio State University introduce Early Experience, a self-directed training approach that lets language agents learn from their own interactions. In tests across eight environments, the method outperformed imitation learning and strengthened downstream reinforcement learning.

Researchers at Meta and Ohio State University have introduced Early Experience, a training approach in which language agents learn from their own actions rather than relying on external reward signals. Traditional systems often depend on human demonstrations that cover only a limited range of scenarios and struggle to generalize. Early Experience positions itself between imitation learning and reinforcement learning by turning the agent’s exploratory behavior into useful supervision without explicit rewards.

The work centers on two techniques. Implicit world modeling teaches an agent to predict what will happen after taking different actions, using those predictions as training targets. For example, when an agent clicks a website link, it learns to anticipate the resulting page. The second technique, self-reflection, has the agent compare its own actions with expert moves and generate natural language explanations for why the expert’s choice is superior, such as noting when an online shopping decision exceeds a budget. Both methods turn the agent’s own interactions and outcomes into learning signals, removing the need for outside evaluations.
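
As a rough illustration, the sketch below shows how these two signals could be turned into supervised training examples. The `env`, `policy`, and `llm` objects and their methods (`simulate`, `propose_actions`, `generate`) are hypothetical placeholders standing in for an environment, the agent's current policy, and a language model; they are not the paper's actual interfaces.

```python
# Hedged sketch of constructing Early Experience training data.
# All objects and method names here are illustrative placeholders.

def implicit_world_modeling_examples(env, policy, expert_states, n_alt=3):
    """At each expert state, try alternative actions and use the resulting
    observation as a prediction target (implicit world modeling)."""
    examples = []
    for state in expert_states:
        for action in policy.propose_actions(state, k=n_alt):
            next_obs = env.simulate(state, action)  # roll the environment forward
            examples.append({
                "prompt": f"State:\n{state}\nAction:\n{action}\nPredict the next observation:",
                "target": str(next_obs),
            })
    return examples


def self_reflection_examples(llm, policy, expert_trajectory):
    """At each expert step, compare the agent's own action with the expert's
    and have the model explain why the expert action is preferable (self-reflection)."""
    examples = []
    for state, expert_action in expert_trajectory:
        own_action = policy.propose_actions(state, k=1)[0]
        rationale = llm.generate(
            f"State:\n{state}\nMy action: {own_action}\n"
            f"Expert action: {expert_action}\n"
            "Explain why the expert action is the better choice:"
        )
        examples.append({
            "prompt": f"State:\n{state}\nThink step by step, then act:",
            "target": f"{rationale}\nAction: {expert_action}",
        })
    return examples
```

In both cases the supervision comes from the agent's own rollouts and the expert trajectory already on hand, so no reward function or external judge is required.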

The team evaluated Early Experience across eight environments, spanning website navigation, simulated household chores, scientific experiments, multi-step tool use, and complex planning tasks like travel arrangements. Using relatively small language models including Llama-3.1-8B, Llama-3.2-3B, and Qwen2.5-7B, both Early Experience methods consistently outperformed standard imitation learning. On average, success rates increased by 9.6 percentage points, and generalization to new scenarios improved by 9.4 percentage points. Gains were largest on harder problems: self-reflection improved travel planning by up to 15 percentage points, while implicit world modeling lifted online shopping success by as much as 18.4 percentage points.

The researchers also tested whether Early Experience improves subsequent reinforcement learning. Models first trained with Early Experience and then put through an identical reinforcement learning stage outperformed models that entered that stage from other training methods, and the gap sometimes widened as training progressed. The results suggest that Early Experience is effective on its own and strengthens later reinforcement learning, offering a practical bridge between current training strategies and more reward-driven systems.

Early Experience scaled to larger models up to 70 billion parameters, and the improvements held even when using resource-efficient LoRA updates. It also showed strong data efficiency: in some tests, using just one eighth of the original expert demonstrations was enough to beat standard training with the full dataset. Together, these findings indicate that learning from early, self-generated interactions can build more capable and adaptable language agents while reducing reliance on extensive expert data and explicit reward signals.
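
For readers unfamiliar with LoRA, the snippet below is a minimal sketch of this kind of parameter-efficient fine-tuning setup using the Hugging Face peft library; the model name, rank, and target modules are illustrative choices, not settings reported by the researchers.

```python
# Minimal LoRA fine-tuning setup (illustrative values, not the paper's configuration).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension (illustrative)
    lora_alpha=32,                         # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```

Because only the low-rank adapter weights are updated, this kind of setup keeps memory and compute costs far below full fine-tuning, which is what makes the reported robustness under LoRA practically relevant.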


