Chennai, 26th November 2025. VibeStudio, an agentic coding suite incubated by Immersive Technology and Entrepreneurship Labs (ITEL), announced a targeted compression of the open-source MiniMax M2 model that the company says achieves world-best LLM performance on a single GPU. The work was led by a small Indian team under Arjun Reddy and supported by ITEL chair Prof. Ashok Jhunjhunwala. The project focused on reducing the GPU, memory, and energy costs of deploying large language models for real coding and full-repository reasoning in colleges and enterprises, citing the cost and scale of hardware such as H200 GPUs as a barrier.
VibeStudio describes the new method as THRIFT: Targeted Hierarchical Reduction for Inference and Fine-Tuning. According to the announcement, THRIFT audits the model layer by layer to identify redundant experts, silent activation routes, and dead parameters, then applies calibrated, staged pruning with teacher-guided fine-tuning after each stage. The reported outcome is a 55% reduction in the size of MiniMax M2 while retaining 80% of the original model's reasoning strength and coding precision, and in many cases responding faster. VibeStudio says it has released the pruned M2 as open source on Hugging Face and that the release has crossed 150,000 downloads to date.
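The announcement does not detail THRIFT's internals, but the general shape it describes, staged pruning followed by teacher-guided fine-tuning, is a well-known pattern. The sketch below illustrates that pattern in miniature: per-stage magnitude pruning of a weight list plus a distillation (KL-divergence) loss a student would minimise against the teacher. All names, ratios, and the magnitude-pruning criterion are illustrative assumptions, not VibeStudio's actual implementation.

```python
# Minimal sketch of staged pruning with a teacher-guided (distillation)
# objective. Hypothetical simplification: real pipelines prune experts and
# structured blocks inside a transformer, not a flat weight list.
import math

def prune_stage(weights, keep_ratio):
    """Zero out the smallest-magnitude weights, keeping `keep_ratio` of them."""
    ranked = sorted(abs(w) for w in weights)
    cutoff_idx = int(len(ranked) * (1.0 - keep_ratio))
    threshold = ranked[cutoff_idx] if cutoff_idx < len(ranked) else float("inf")
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits):
    """KL(teacher || student): the loss minimised in teacher-guided fine-tuning."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Staged schedule: prune gently first, then more aggressively; in a real
# pipeline a fine-tuning pass against the teacher follows each stage.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03, 0.6, -0.02]
for keep in (0.75, 0.55):  # hypothetical per-stage keep ratios
    weights = prune_stage(weights, keep)
```

The staging matters: pruning a little and re-aligning with the teacher after each stage tends to preserve accuracy far better than removing the same fraction of parameters in a single cut.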
Beyond the open release, VibeStudio retains two private foundational models for enterprise deployments: an 8B dense model optimised for quantised local use on mainstream hardware and a 32B A3B MoE model built for secure, high-speed, on-premises reasoning. Both remain closed and exclusive to enterprise partners. VibeStudio positions its Agentic IDE and THRIFT-compressed models as a path to powerful, affordable AI-enabled coding tools for a wide range of users, from large companies to first-year engineering students on budget laptops, emphasising engineering efficiency over continually scaling model size.
