Adaptive training method boosts reasoning large language model efficiency

Researchers have developed an adaptive training system that uses idle processors to train a smaller helper model on the fly, doubling the training speed of reasoning large language models without sacrificing accuracy. The method aims to cut costs and energy use for advanced applications such as financial forecasting and power grid risk detection.

Researchers from MIT and collaborating institutions have introduced a method to accelerate the training of reasoning large language models by exploiting idle computing time in multi-processor setups. Reasoning models tackle complex tasks by decomposing problems into smaller steps but are expensive to train because reinforcement learning requires generating many candidate answers, a process that often leaves some processors idle while others handle longer responses. By turning this idle time into productive work, the new technique significantly increases efficiency without adding computational overhead.

The approach centers on automatically training a smaller, faster “drafter” model that predicts the outputs of the larger reasoning model, which then only needs to verify those predictions. Because the larger model can check many drafter guesses in a single pass rather than generating each output token by token, the workload on the main model drops and training accelerates. The system activates the drafter only when some processors are idle, so it harnesses computational resources that would otherwise be wasted while maintaining training accuracy.
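The draft-and-verify idea can be sketched in miniature. This is a toy illustration, not the researchers' implementation: the drafter and target are stand-in functions, and the greedy acceptance rule below is the simplest variant of speculative decoding, which guarantees the accepted tokens match what the large model would have produced on its own.

```python
# Toy sketch of speculative decoding's draft-then-verify loop.
# A cheap "drafter" guesses the next k tokens one at a time; the
# expensive "target" model checks all k positions in one batch and
# keeps the longest agreeing prefix (greedy acceptance rule).

def speculative_step(prefix, drafter, target, k=4):
    """Return the tokens accepted in one speculative step."""
    # 1. Drafter proposes k tokens cheaply, sequentially.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = drafter(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Target verifies all k positions at once: for each draft
    #    position, what token would the target itself have emitted?
    checks = [target(list(prefix) + draft[:i]) for i in range(k)]

    # 3. Accept draft tokens while they agree with the target; at the
    #    first mismatch, keep the target's own token and stop. The
    #    output is therefore identical to the target's greedy output.
    accepted = []
    for guess, truth in zip(draft, checks):
        if guess == truth:
            accepted.append(guess)
        else:
            accepted.append(truth)  # target's correction
            break
    return accepted
```

When the drafter agrees with the target often, each call to the large model yields several tokens instead of one, which is where the speedup comes from.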

Conventional speculative decoding typically relies on a static drafter model, which quickly becomes outdated as reinforcement learning updates the main model thousands of times. To address this, the team developed a flexible system called “Taming the Long Tail,” or TLT, which combines an adaptive drafter trainer that continuously updates the smaller model during idle periods with an adaptive rollout engine that chooses the best speculative decoding strategy for each new batch of inputs. TLT reuses components from the existing training workflow, keeps the drafter lightweight for rapid updates, and immediately shifts idle processors to drafter training on current rollout data. Tested across multiple reasoning large language models and real-world datasets, the system accelerated training by 70 to 210 percent while preserving accuracy. As a side benefit, the resulting small drafter model can also be reused for efficient deployment.
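The “long tail” that TLT exploits is easy to quantify: when rollout workers must all wait for the slowest response to finish, every faster worker accumulates idle time. The sketch below (illustrative only; the `Worker` class and `idle_budget` helper are hypothetical names, not from the paper) computes that per-worker idle budget, which the adaptive drafter trainer would fill with drafter updates.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    finish_time: float  # seconds until this worker's rollouts complete

def idle_budget(workers):
    """Idle seconds per worker while waiting at the batch barrier.

    In synchronous RL rollout generation, every worker waits for the
    slowest one; the gap is wasted time that a TLT-style scheduler
    could instead spend training the drafter on fresh rollout data.
    """
    barrier = max(w.finish_time for w in workers)
    return {i: barrier - w.finish_time for i, w in enumerate(workers)}
```

For example, workers finishing at 3, 5, and 10 seconds leave 7 and 5 seconds of recoverable idle time on the two faster workers, while the slowest contributes none.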


How to run MiniMax M2.5 locally with Unsloth GGUF

MiniMax-M2.5 is a new open large language model optimized for coding, tool use, search, and office tasks, and Unsloth provides quantized GGUF builds and usage recipes for running it locally. The guide focuses on memory requirements, recommended decoding parameters, and deployment via llama.cpp and llama-server with an OpenAI-compatible interface.

Y Combinator backs new wave of computer vision startups in 2026

Y Combinator’s 2026 computer vision cohort spans infrastructure, developer tools, and industry-specific applications from retail security to aquaculture and healthcare. Startups are increasingly pairing computer vision with large vision language models and foundation models to tackle real-time video, automation, and domain-specific analysis.
