Adaptive training method boosts reasoning large language model efficiency

Researchers have developed an adaptive training system that puts idle processors to work training a smaller helper model on the fly, doubling the training speed of reasoning large language models without sacrificing accuracy. The method aims to cut costs and energy use for advanced applications such as financial forecasting and power grid risk detection.

Researchers from MIT and collaborating institutions have introduced a method to accelerate the training of reasoning large language models by exploiting idle computing time in multi-processor setups. Reasoning models tackle complex tasks by decomposing problems into smaller steps but are expensive to train because reinforcement learning requires generating many candidate answers, a process that often leaves some processors idle while others handle longer responses. By turning this idle time into productive work, the new technique significantly increases efficiency without adding computational overhead.
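To make the idle-time problem concrete, here is a short Python sketch that computes what fraction of processor time goes unused in one rollout step. It assumes, purely for illustration, that each processor generates one response, every token costs the same, and the step cannot finish until the longest response does. The function name `idle_fraction` and the example lengths are hypothetical, not figures from the study.

```python
from typing import Sequence


def idle_fraction(response_lengths: Sequence[int]) -> float:
    """Fraction of processor time left idle in one rollout step.

    Illustrative assumptions: one response per processor, equal cost per
    token, and every processor waits for the longest response to finish.
    """
    if not response_lengths:
        raise ValueError("need at least one response length")
    longest = max(response_lengths)
    if longest == 0:
        return 0.0
    busy = sum(response_lengths)             # token-steps actually spent generating
    total = longest * len(response_lengths)  # token-steps reserved until the step ends
    return 1.0 - busy / total


# Hypothetical batch: three short responses and one long-tail response.
print(f"{idle_fraction([1000, 1200, 900, 8000]):.1%}")  # 65.3%
```

In this made-up batch, a single long response leaves about two thirds of the processor time unused, which is the kind of slack the new method tries to recover.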

The approach centers on automatically training a smaller, faster drafter model that predicts the outputs of the larger reasoning model, whose role is to verify these predictions. Because the larger model can verify many drafter guesses at once rather than generating each output sequentially, the process reduces the workload on the main model and accelerates training. The system is designed to activate the drafter only when some processors are idle, which allows the method to leverage computational resources that would otherwise be wasted while maintaining training accuracy.
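The draft-then-verify loop described here is the standard greedy form of speculative decoding. The sketch uses two toy next-token functions in place of real models; `speculative_generate`, the toy `target` and `drafter`, and the draft length `k` are illustrative assumptions, not TLT's code. In a real system the large model scores all drafted positions in one batched forward pass; here that check is simulated position by position, and `rounds` counts how many such verification passes were needed.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(target: NextToken, drafter: NextToken,
                         prompt: List[int], max_new: int,
                         k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch.

    Returns the generated tokens and the number of verification rounds.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens one after another.
        draft, ctx = [], list(seq)
        for _ in range(k):
            tok = drafter(ctx)
            draft.append(tok)
            ctx.append(tok)
        # 2. The target checks every drafted position (one batched pass in
        #    practice) and keeps the longest prefix that matches its own choices.
        rounds += 1
        accepted, ctx = [], list(seq)
        for tok in draft:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # target's token replaces the first miss
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))   # all drafts accepted: one bonus token
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + max_new], rounds


def target(ctx: List[int]) -> int:
    """Toy 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def drafter(ctx: List[int]) -> int:
    """Toy 'drafter': agrees with the target except after token 7."""
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_generate(target, drafter, [0], max_new=10, k=4)
print(tokens, rounds)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 3
```

Ten tokens come out after three verification passes instead of ten sequential steps of the large model, and the output is identical to plain greedy decoding with the target; the drafter only affects speed. A drafter that drifts away from the main model gets rejected more often, and the savings shrink.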

Conventional speculative decoding typically relies on a static drafter model, which quickly becomes outdated because reinforcement learning updates the main model thousands of times. To address this, the team developed a flexible system called “Taming the Long Tail,” or TLT. It includes an adaptive drafter trainer that continuously updates the smaller model during idle periods and an adaptive rollout engine that chooses the optimal speculative decoding strategy for each new batch of inputs. TLT reuses components from the existing training workflow, keeps the drafter lightweight so it can be updated rapidly, and immediately shifts idle processors to drafter training using current rollout data. Tested across multiple reasoning large language models and real-world datasets, the system accelerated training by 70 to 210 percent while preserving accuracy. The resulting small drafter can also be repurposed for efficient deployment.
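The article does not spell out how TLT's adaptive rollout engine picks a strategy. As one hedged illustration, this sketch chooses a draft length per batch using a textbook cost model for speculative decoding. It assumes each drafted token is accepted independently with probability `alpha` (which could be estimated from recent acceptance statistics) and that one drafter step costs `c` steps of the large model. The functions `expected_speedup` and `choose_draft_length` are hypothetical names, and the model is a simplification, not the TLT policy.

```python
def expected_speedup(alpha: float, k: int, c: float) -> float:
    """Expected speedup of speculative decoding over plain decoding.

    Simplified cost model (an assumption): each of the k drafted tokens is
    accepted independently with probability alpha, a drafter step costs c
    target steps, and one verification round costs k * c + 1 target steps.
    """
    if alpha >= 1.0:
        tokens_per_round = k + 1.0
    else:
        # Expected accepted drafts plus the one token the target always adds:
        # 1 + alpha + ... + alpha**k.
        tokens_per_round = (1.0 - alpha ** (k + 1)) / (1.0 - alpha)
    cost_per_round = k * c + 1.0
    return tokens_per_round / cost_per_round


def choose_draft_length(alpha: float, c: float, max_k: int = 8) -> int:
    """Pick the draft length with the best expected speedup; 0 disables drafting."""
    return max(range(max_k + 1), key=lambda k: expected_speedup(alpha, k, c))


# Hypothetical batches: a well-matched drafter vs. a stale one.
print(choose_draft_length(alpha=0.8, c=0.05, max_k=12))  # 8
print(choose_draft_length(alpha=0.1, c=0.5))             # 0
```

Under this model, the more closely the drafter tracks the main model, the longer the profitable draft; when acceptance collapses, the best choice is to stop drafting altogether, which is why keeping the drafter up to date matters.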
