Researchers from MIT and collaborating institutions have introduced a method to accelerate the training of reasoning large language models by exploiting idle computing time in multi-processor setups. Reasoning models tackle complex tasks by decomposing problems into smaller steps but are expensive to train because reinforcement learning requires generating many candidate answers, a process that often leaves some processors idle while others handle longer responses. By turning this idle time into productive work, the new technique significantly increases efficiency without adding computational overhead.
The approach centers on automatically training a smaller, faster "drafter" model to predict the outputs of the larger reasoning model, which then only needs to verify those predictions. Because the larger model can verify many drafter guesses at once rather than generating each output sequentially, the workload on the main model drops and training accelerates. The system activates the drafter only when some processors are idle, so the method harvests computational resources that would otherwise be wasted while maintaining training accuracy.
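The propose-then-verify idea can be sketched in a few lines. This is a minimal illustration, not the team's implementation: `drafter` and `target` below are hypothetical stand-in functions, and in a real system the verification loop would be a single batched forward pass of the large model.

```python
def speculative_step(ctx, drafter, target, k=4):
    """One speculative round: the drafter proposes k tokens sequentially,
    then the target verifies them, accepting the longest matching prefix
    and substituting its own token at the first mismatch."""
    draft, c = [], list(ctx)
    for _ in range(k):                     # cheap sequential drafting
        t = drafter(c)
        draft.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for t in draft:                        # one parallel pass in practice
        want = target(c)
        if want != t:                      # mismatch: keep target's token, stop
            return ctx + accepted + [want]
        accepted.append(t)
        c.append(t)
    return ctx + accepted                  # every draft token was accepted

# Hypothetical toy models: the target counts upward mod 10; the drafter
# agrees with it except after a 3, where it wrongly guesses 9.
target  = lambda ctx: (ctx[-1] + 1) % 10
drafter = lambda ctx: 9 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([0], drafter, target))  # → [0, 1, 2, 3, 4]
print(speculative_step([4], drafter, target))  # → [4, 5, 6, 7, 8]
```

Each round emits up to k + 1 tokens per pass of the expensive model instead of one, which is where the savings come from: the better the drafter tracks the target, the longer the accepted prefixes.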
Conventional speculative decoding typically relies on a static drafter model, which quickly becomes outdated as reinforcement learning updates the main model thousands of times. To address this, the team developed a flexible system called "Taming the Long Tail," or TLT, which includes an adaptive drafter trainer that continuously updates the smaller model during idle periods and an adaptive rollout engine that chooses the optimal speculative decoding strategy for each new batch of inputs. TLT reuses components from the existing training workflow, keeps the drafter lightweight so it can be updated rapidly, and immediately shifts idle processors into drafter training using current rollout data. Tested across multiple reasoning large language models and real-world datasets, the system accelerated training by between 70 and 210 percent while preserving accuracy, and as an added benefit, the resulting small drafter model can be repurposed for efficient deployment.
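The idle-time accounting behind this can be pictured with a toy simulation. The scheduling logic below is my own illustration of the idea, not TLT's actual engine: workers that finish their rollouts early spend the remaining steps of the batch on drafter-training updates, using the rollouts completed so far as data.

```python
def run_batch(response_lengths):
    """Simulate one RL rollout batch across parallel workers. Each worker
    generates one response; the batch only finishes when the longest
    response does, so short-response workers go idle (the 'long tail').
    Idle worker-steps are redirected into drafter-training updates."""
    horizon = max(response_lengths)        # batch ends with the slowest worker
    finished, drafter_updates = [], 0
    for step in range(horizon):
        for worker, length in enumerate(response_lengths):
            if step < length:              # still generating its response
                if step == length - 1:
                    finished.append(worker)  # rollout complete: usable as data
            elif finished:                 # idle, and fresh rollout data exists:
                drafter_updates += 1       # spend the step updating the drafter
    return drafter_updates, horizon

print(run_batch([2, 3, 8]))  # → (11, 8): 11 idle steps become drafter updates
print(run_batch([5, 5]))     # → (0, 5): a balanced batch leaves no idle time
```

In the real system, those recovered steps are what keep the drafter synchronized with the continually updated main model, which a static drafter cannot do.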
