MIT researchers present SEAL, advancing self-improving language models

MIT unveils SEAL, a novel framework that lets large language models self-edit and adapt using reinforcement learning—pushing the frontier of self-improving Artificial Intelligence.

MIT researchers have introduced SEAL (Self-Adapting LLMs), a groundbreaking approach that empowers large language models to autonomously update their own parameters. The framework, detailed in the paper "Self-Adapting Language Models", centers on self-generated data: the model creates and applies its own training samples—or self-edits—through carefully designed reinforcement learning loops. By tying rewards to downstream task performance, the model learns which self-edits are most beneficial for continuous improvement.

The SEAL method operates through a nested structure. The outer loop uses reinforcement learning to guide the generation of effective self-edits, while the inner loop updates the model via supervised fine-tuning on those edits. Standard policy optimization methods initially proved unstable, so the researchers ultimately adopted a more robust behavioral cloning strategy (ReST^EM) inspired by work at DeepMind, which filters self-edits by observed performance gains before training on them. While the current design uses a single model both to generate edits and to learn from them, future iterations could separate these roles into distinct "teacher" and "student" models.
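The nested structure can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: the model, the reward function, and all helper names (`generate_self_edits`, `evaluate`, `finetune`, `seal_outer_loop`) are hypothetical stand-ins, and the ReST^EM step is reduced to a simple reward-threshold filter.

```python
# Toy sketch of SEAL's nested loops with ReST^EM-style filtering.
# All names and the "model" dict are illustrative stand-ins, not the
# paper's actual implementation.

def generate_self_edits(model, task, n=4):
    # The model proposes candidate training samples ("self-edits") for a task.
    return [f"edit-{i}:{task}" for i in range(n)]

def evaluate(model, edit, task):
    # Stand-in for downstream-task reward after applying the edit
    # (deterministic pseudo-score in [0, 1) for demonstration).
    return (sum(map(ord, edit + task)) % 100) / 100.0

def finetune(model, edits):
    # Inner loop stand-in: supervised fine-tuning on accepted self-edits.
    model["trained_on"].extend(edits)
    return model

def seal_outer_loop(model, tasks, rounds=2, threshold=0.5):
    """Outer RL loop, ReST^EM style: sample self-edits, keep only those
    whose downstream reward clears a threshold, then fine-tune on the
    survivors (behavioral cloning on filtered samples rather than
    policy-gradient updates)."""
    for _ in range(rounds):
        accepted = []
        for task in tasks:
            for edit in generate_self_edits(model, task):
                if evaluate(model, edit, task) >= threshold:
                    accepted.append(edit)
        model = finetune(model, accepted)
    return model

model = seal_outer_loop({"trained_on": []}, tasks=["qa", "few-shot"])
print(len(model["trained_on"]))
```

The filter-then-clone shape is the key design choice the article describes: instead of backpropagating a noisy reward signal through a policy gradient, only edits that demonstrably helped are kept, and the model is simply fine-tuned on them.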

SEAL was put to the test in domains such as knowledge integration and few-shot learning. Results were notable: in few-shot learning with a Llama-3.2-1B-Instruct model, SEAL improved adaptation success rates dramatically, reaching over 70 percent success compared to more conventional approaches. For knowledge integration, the Qwen2.5-7B model effectively assimilated new facts, outpacing baseline and previous reinforcement learning methods, sometimes exceeding even setups using GPT-4.1-generated data. The researchers highlighted how reinforcement learning not only boosted quantitative outcomes but also enabled the model to generate more nuanced, task-relevant self-edits. Despite the promise, challenges remain—particularly with catastrophic forgetting, computational costs, and context-aware evaluation, all of which the team discusses in their publication.

This work emerges amid a surge of global interest in self-evolving Artificial Intelligence, with parallel projects like Sakana AI's Darwin-Gödel Machine and OpenAI's speculation on recursive self-improvement capturing widespread attention. SEAL stands out as a concrete and experimentally validated step towards autonomous, self-improving language technologies, offering a glimpse at the ongoing transformation of the field.


Report finds California creative job losses are not driven by Artificial Intelligence

New research from Otis College of Art and Design finds California’s recent creative industry job losses stem from cost pressures and structural shifts, not direct worker displacement by generative Artificial Intelligence. The technology is changing workflows and expectations, but it is largely replacing tasks rather than entire jobs.

U.S. senators propose broader chip tool export ban for Chinese firms

A bipartisan proposal in the U.S. Senate would shift semiconductor equipment controls from specific fabs to targeted Chinese companies and their affiliates. The measure is aimed at cutting off access to advanced lithography and other wafer fabrication tools for firms such as Huawei, SMIC, YMTC, CXMT, and Hua Hong.

Trump executive order targets state Artificial Intelligence laws

Executive Order 14365 lays out a federal strategy to discourage, challenge, and potentially preempt state Artificial Intelligence laws viewed as burdensome. Employers are advised to keep complying with current state and local rules while preparing for regulatory uncertainty in 2026.

Who decides how America uses Artificial Intelligence in war

Stanford experts are divided over how the United States should govern Artificial Intelligence in defense, surveillance, and warfare. Their views converge on one point: decisions with such high stakes cannot be left to companies alone.

GPUBreach bypasses IOMMU on GDDR6-based NVIDIA GPUs

Researchers from the University of Toronto describe GPUBreach, a rowhammer attack against GDDR6-based NVIDIA GPUs that can bypass IOMMU protections. The technique enables CPU-side privilege escalation by abusing trusted GPU driver behavior on the host system.
