Enterprises fine-tuning large language models for specialized tasks often face catastrophic forgetting, where new training causes models to lose previously learned abilities, forcing organizations to maintain multiple separate systems. Researchers at MIT's Improbable AI lab and ETH Zurich have developed a method called self-distillation fine-tuning that enables large language models to acquire new skills and proprietary knowledge without sacrificing prior performance. The technique leverages the in-context learning abilities of modern models to approximate on-policy learning without requiring reinforcement learning reward functions, offering a path toward adaptive AI agents that can evolve alongside dynamic business needs.
Self-distillation fine-tuning addresses limitations of both reinforcement learning and supervised fine-tuning. On-policy learning traditionally relies on reinforcement learning with explicit reward functions, which works for domains like math and coding but fails where it is difficult or impossible to define a numerical reward, such as legal writing or meeting summarization, and struggles when the model has no prior knowledge of a topic. Supervised fine-tuning, where a model mimics a fixed dataset of expert demonstrations, is inherently off-policy; it often fails to generalize to out-of-distribution cases and suffers heavily from catastrophic forgetting. Self-distillation fine-tuning instead creates a feedback loop inside a single model by splitting it into a frozen teacher and a trainable student: the teacher receives the query plus an expert demonstration and uses in-context learning to infer the correct answer and reasoning, while the student sees only the query and updates its parameters to match the teacher's output distribution, effectively turning prerecorded demonstrations into an on-policy-style learning signal.
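The core mechanic can be sketched in a few lines of code. The snippet below is a minimal illustration, assuming a generic Hugging Face causal language model; the model name, prompt templates, and the plain forward-KL objective over response tokens are simplifying assumptions for clarity, not the researchers' exact recipe.

```python
# Minimal sketch of the teacher/student split behind self-distillation fine-tuning.
# Assumptions: model choice, prompt formats, and a plain forward-KL loss.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumption: any capable open-weight LM
tok = AutoTokenizer.from_pretrained(model_name)
student = AutoModelForCausalLM.from_pretrained(model_name)         # trainable copy
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()  # frozen copy of the same model
for p in teacher.parameters():
    p.requires_grad_(False)

def distill_step(query: str, demonstration: str, response: str) -> torch.Tensor:
    """Distillation loss for one (query, demonstration, response) triple."""
    # Teacher sees the expert demonstration in context; student sees only the query.
    teacher_prompt = f"Example solution:\n{demonstration}\n\nQuestion: {query}\nAnswer: "
    student_prompt = f"Question: {query}\nAnswer: "

    resp_ids = tok(response, return_tensors="pt", add_special_tokens=False).input_ids
    t_ids = torch.cat([tok(teacher_prompt, return_tensors="pt").input_ids, resp_ids], dim=1)
    s_ids = torch.cat([tok(student_prompt, return_tensors="pt").input_ids, resp_ids], dim=1)
    n = resp_ids.shape[1]  # number of response tokens to distill over

    with torch.no_grad():
        # Teacher's next-token distributions over the response, conditioned on the demonstration.
        t_logits = teacher(t_ids).logits[:, -n - 1:-1, :]
    # Student's distributions over the same response tokens, conditioned on the query alone.
    s_logits = student(s_ids).logits[:, -n - 1:-1, :]

    # Pull the student's token distribution toward the teacher's (forward KL).
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.log_softmax(t_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
```

Because both roles are played by the same base weights, the demonstration never needs to appear in the student's prompt at inference time; its effect is distilled directly into the parameters.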
In experiments using the open-weight Qwen 2.5 model on science question answering, software tool use and medical reasoning, self-distillation fine-tuning learned new tasks more effectively than standard supervised methods. On the Science Q&A benchmark, the self-distillation fine-tuning model achieved 70.2% accuracy, compared with 66.2% for the standard supervised fine-tuning approach. When taught the science task, the supervised fine-tuning model’s performance on general questions collapsed, while the self-distillation fine-tuning model kept its “Previous Tasks” score steady at 64.5%, indicating preservation of earlier knowledge. In a synthetic “2025 Natural Disasters” knowledge-injection test, the self-distillation fine-tuning model, which had internalized reasoning over the new facts, scored 98% on indirect questions that required applying that knowledge. A sequential experiment showed the method could add science, tool use and medical skills in succession without regression, suggesting that a single model could replace “model zoos” of adapters and reduce inference costs.
The approach is available as open-source code and is being integrated with the Hugging Face TRL (Transformer Reinforcement Learning) library, but it carries computational tradeoffs. The self-distillation fine-tuning pipeline is closer to reinforcement learning in that it requires online response generation during training: it is approximately four times slower and uses roughly 2.5 times more compute (FLOPs) than standard fine-tuning, because the model must generate rollout answers to compare against the teacher. The method also depends on sufficiently capable base models; current experiments indicate that models around 4 billion parameters, such as Qwen3-4B, have strong enough in-context learning to act as their own teachers, while earlier 3-billion-parameter models struggled. The researchers expect that improving small models could eventually bring self-distillation fine-tuning to 1-billion-parameter systems, moving the field closer to lifelong learning, where models continually improve from real-world user interactions and where the growing compute spent on inference can be harnessed to update and refine model behavior over time.
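As a rough sketch of where the extra cost comes from, the loop below builds on the distill_step helper, student model and tokenizer from the earlier snippet: each training step first has the student generate its own rollout answer before the frozen teacher scores it. The generation settings and optimizer choice here are assumptions, not details of the released implementation.

```python
# Rough sketch of the on-policy loop: generate a rollout, then distill.
# Reuses `student`, `tok`, and `distill_step` from the previous sketch.
import torch

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)  # assumption: optimizer and learning rate

def train_step(query: str, demonstration: str) -> float:
    # The student first generates its own answer (the rollout). This online
    # generation is the main source of the reported ~4x slowdown and ~2.5x
    # extra FLOPs relative to standard supervised fine-tuning.
    prompt_ids = tok(f"Question: {query}\nAnswer: ", return_tensors="pt").input_ids
    with torch.no_grad():
        rollout_ids = student.generate(prompt_ids, max_new_tokens=256, do_sample=True)
    rollout = tok.decode(rollout_ids[0, prompt_ids.shape[1]:], skip_special_tokens=True)

    # The frozen teacher, conditioned on the expert demonstration, supplies the
    # target distribution over the student's own tokens.
    loss = distill_step(query, demonstration, rollout)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```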
