Meta and Ohio State unveil Early Experience for training language agents

Meta and Ohio State University introduce Early Experience, a self-directed training approach that lets language agents learn from their own interactions. In tests across eight environments, the method outperformed imitation learning and strengthened downstream reinforcement learning.

Researchers at Meta and Ohio State University have introduced Early Experience, a training approach in which language agents learn from their own actions rather than relying on external reward signals. Traditional systems often depend on human demonstrations that cover only a limited range of scenarios, so the resulting agents struggle to generalize. Early Experience positions itself between imitation learning and reinforcement learning by turning the agent’s exploratory behavior into useful supervision without explicit rewards.

The work centers on two techniques. The first, implicit world modeling, teaches an agent to predict what will happen after it takes different actions, using the outcomes it actually observes as training targets. For example, when an agent clicks a website link, it learns to anticipate the resulting page. The second technique, self-reflection, has the agent compare its own actions with expert moves and generate natural language explanations for why the expert’s choice is superior, such as noting when an online shopping decision exceeds a budget. Both methods turn the agent’s own interactions and outcomes into learning signals, removing the need for outside evaluations.
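To make the two techniques concrete, here is a minimal Python sketch of how such training examples might be assembled from expert steps and the agent's own alternative actions. It is an illustration under stated assumptions, not the researchers' implementation: the `ExpertStep` record, the prompt wording, the toy shopping environment, and the template-based `toy_explain` (which stands in for an explanation that the language model would generate itself) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, training target)


@dataclass
class ExpertStep:
    """Hypothetical record: one state from an expert demonstration."""
    state: str
    expert_action: str
    candidates: List[str]  # alternative actions proposed by the agent


def world_model_examples(steps: List[ExpertStep],
                         transition: Callable[[str, str], str]) -> List[Example]:
    """Implicit world modeling: the target is the observed next state."""
    examples = []
    for step in steps:
        for action in [step.expert_action] + step.candidates:
            outcome = transition(step.state, action)
            prompt = f"State: {step.state}\nAction: {action}\nPredict the next state."
            examples.append((prompt, outcome))
    return examples


def reflection_examples(steps: List[ExpertStep],
                        transition: Callable[[str, str], str],
                        explain: Callable[[str, str, str, str, str], str]) -> List[Example]:
    """Self-reflection: the target is a rationale followed by the expert action."""
    examples = []
    for step in steps:
        expert_outcome = transition(step.state, step.expert_action)
        for action in step.candidates:
            if action == step.expert_action:
                continue  # nothing to contrast with
            outcome = transition(step.state, action)
            rationale = explain(step.state, action, outcome,
                                step.expert_action, expert_outcome)
            prompt = f"State: {step.state}\nChoose the next action."
            examples.append((prompt, f"{rationale}\nAction: {step.expert_action}"))
    return examples


def toy_transition(state: str, action: str) -> str:
    """Toy shopping environment (assumption): fixed prices, $20 budget."""
    prices = {"buy blue mug": 12, "buy gold mug": 40}
    budget = 20
    price = prices.get(action)
    if price is None:
        return "nothing happens"
    status = "within" if price <= budget else "over"
    return f"cart total ${price}, {status} the ${budget} budget"


def toy_explain(state: str, action: str, outcome: str,
                expert_action: str, expert_outcome: str) -> str:
    """Template stand-in for a model-generated explanation (assumption)."""
    return (f"'{action}' leads to: {outcome}. "
            f"'{expert_action}' leads to: {expert_outcome}, so it is the better choice.")


if __name__ == "__main__":
    steps = [ExpertStep("shopping for a mug with a $20 budget",
                        "buy blue mug", ["buy gold mug"])]
    for prompt, target in world_model_examples(steps, toy_transition):
        print(prompt, "->", target)
    for prompt, target in reflection_examples(steps, toy_transition, toy_explain):
        print(prompt, "->", target)
```

In both functions the supervision comes from what the environment returns after an action, not from a reward score, which is the property the article attributes to Early Experience.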

The team evaluated Early Experience across eight environments, spanning website navigation, simulated household chores, scientific experiments, multi-step tool use, and complex planning tasks like travel arrangements. Using relatively small language models including Llama-3.1-8B, Llama-3.2-3B, and Qwen2.5-7B, both Early Experience methods consistently outperformed standard training. On average, success rates increased by 9.6 percentage points, with generalization to new scenarios improving by 9.4 percentage points. Gains were largest on harder problems: self-reflection improved travel planning by up to 15 percentage points, while implicit world modeling lifted online shopping by as much as 18.4 percentage points.

The researchers also tested whether Early Experience improves subsequent reinforcement learning. Models first trained with Early Experience and then run through the same reinforcement learning process outperformed those that started from other methods, sometimes widening the performance gap as training progressed. The results suggest that Early Experience is effective on its own and strengthens later reinforcement learning, offering a practical bridge between current training strategies and more reward-driven systems.

Early Experience scaled to larger models of up to 70 billion parameters, and the improvements held even when using resource-efficient LoRA updates. It also showed strong data efficiency: in some tests, using just one-eighth of the original expert demonstrations was enough to beat standard training with the full dataset. Together, these findings indicate that learning from early, self-generated interactions can build more capable and adaptable language agents while reducing reliance on extensive expert data and explicit reward signals.
