Meta and Ohio State unveil Early Experience for training language agents

Meta and Ohio State University introduce Early Experience, a self-directed training approach that lets language agents learn from their own interactions. In tests across eight environments, the method outperformed imitation learning and strengthened downstream reinforcement learning.

Researchers at Meta and Ohio State University have introduced Early Experience, a training approach in which language agents learn from their own actions rather than from external reward signals. Traditional systems often depend on human demonstrations that cover limited scenarios and struggle to generalize. Early Experience positions itself between imitation learning and reinforcement learning by turning the agent’s exploratory behavior into useful supervision without explicit rewards.

The work centers on two techniques. Implicit world modeling teaches an agent to predict what will happen after taking different actions, using those predictions as training targets. For example, when an agent clicks a website link, it learns to anticipate the resulting page. The second technique, self-reflection, has the agent compare its own actions with expert moves and generate natural language explanations for why the expert’s choice is superior, such as noting when an online shopping decision exceeds a budget. Both methods turn the agent’s own interactions and outcomes into learning signals, removing the need for outside evaluations.
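The two techniques can be pictured as recipes for turning raw rollouts into supervised training examples. The sketch below is illustrative only: all function and field names are assumptions, not the paper's actual API, and the `explain` step stands in for a model-generated reflection.

```python
# Hedged sketch: building supervised examples from an agent's own rollouts.
# Field names ("state", "action", "next_state", "expert_action") are
# illustrative assumptions, not the paper's data format.

def world_modeling_examples(rollouts):
    """Implicit world modeling: the agent learns to predict the observation
    that followed each (state, action) pair it actually tried."""
    examples = []
    for step in rollouts:
        prompt = (f"State: {step['state']}\n"
                  f"Action: {step['action']}\n"
                  f"Predict the next state:")
        # The observed outcome itself becomes the training target.
        examples.append({"input": prompt, "target": step["next_state"]})
    return examples

def self_reflection_examples(rollouts, explain):
    """Self-reflection: where an expert action is known and differs from the
    agent's, generate a natural-language explanation of why the expert's
    choice is better and train on that explanation."""
    examples = []
    for step in rollouts:
        expert = step.get("expert_action")
        if expert and expert != step["action"]:
            reflection = explain(step["state"], step["action"], expert)
            prompt = (f"State: {step['state']}\n"
                      f"Why is '{expert}' better than '{step['action']}'?")
            examples.append({"input": prompt, "target": reflection})
    return examples

# Toy rollout from a shopping environment, echoing the budget example above.
rollout = [
    {"state": "cart total $95, budget $100",
     "action": "add $20 headphones",
     "next_state": "cart total $115, over budget warning",
     "expert_action": "add $4 cable"},
]

wm = world_modeling_examples(rollout)
sr = self_reflection_examples(
    rollout,
    # Stand-in for a model-generated reflection.
    explain=lambda s, a, e: f"'{e}' keeps the cart within budget; '{a}' exceeds it.",
)
print(wm[0]["target"])
print(sr[0]["target"])
```

In both cases the supervision comes entirely from the agent's own interactions (plus, for self-reflection, a known expert move), which is what lets the method sidestep explicit reward signals.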

The team evaluated Early Experience across eight environments, spanning website navigation, simulated household chores, scientific experiments, multi-step tool use, and complex planning tasks like travel arrangements. With relatively small language models, including Llama-3.1-8B, Llama-3.2-3B, and Qwen2.5-7B, both Early Experience methods consistently outperformed standard imitation learning. On average, success rates increased by 9.6 percentage points, with generalization to new scenarios improving by 9.4 percentage points. Gains were largest on harder problems: self-reflection improved travel planning by up to 15 percentage points, while implicit world modeling lifted online shopping by as much as 18.4 percentage points.

The researchers also tested whether Early Experience improves subsequent reinforcement learning. Models first trained with Early Experience and then run through the same reinforcement learning process outperformed those that started from other methods, sometimes widening the performance gap as training progressed. The results suggest that Early Experience is effective on its own and strengthens later reinforcement learning, offering a practical bridge between current training strategies and more reward-driven systems.

Early Experience scaled to larger models up to 70 billion parameters, and the improvements held even when using resource-efficient LoRA updates. It also showed strong data efficiency: in some tests, using just one eighth of the original expert demonstrations was enough to beat standard training with the full dataset. Together, these findings indicate that learning from early, self-generated interactions can build more capable and adaptable language agents while reducing reliance on extensive expert data and explicit reward signals.


Microsoft launches Copilot Health in the US

Microsoft has introduced Copilot Health as a protected space inside Copilot that combines medical records, wearable data and lab results into personalised health insights. The service is launching first for adults in the US with strong privacy controls and a limited initial rollout.

Tesla plans terafab for Artificial Intelligence chips

Tesla is moving toward a large-scale chip manufacturing project to support its autonomous driving roadmap. Elon Musk said the terafab effort for Artificial Intelligence chips will launch in seven days and may involve Intel, TSMC and Samsung.

Timeline traces evolution, civilisation and planetary stewardship

A sweeping chronology links cosmology, evolution, human history and modern environmental risk in a single long view of the human condition. The sequence culminates in contemporary debates over climate change, biodiversity loss and artificial intelligence governance.

Wolters Kluwer report tracks Artificial Intelligence shift in legal work

Wolters Kluwer’s 2026 Future Ready Lawyer findings show Artificial Intelligence has become a foundational tool across law firms and corporate legal departments. The survey points to measurable time savings, revenue growth, and rising pressure to strengthen training, ethics, and security.

Anthropic March 2026 release roundup

Anthropic rolled out a broad set of March 2026 updates across Claude Code, the Claude Developer Platform, Claude apps, and enterprise partnerships. Changes focused on larger context windows, workflow improvements, reliability fixes, visual output features, and new partner enablement programs.
