ITEL’s VibeStudio reports an AI breakthrough: world-best LLM performance on a single GPU

VibeStudio, incubated by Immersive Technology and Entrepreneurship Labs (ITEL), reports pruning the MiniMax M2 model by 55% using THRIFT, delivering near state-of-the-art LLM reasoning and coding on far smaller hardware. The team says the pruned model is open source on HuggingFace, while the company maintains two private models for enterprise use.

Chennai, 26th November 2025. VibeStudio, an agentic coding suite incubated by Immersive Technology and Entrepreneurship Labs (ITEL), announced a targeted compression of the open-source MiniMax M2 model that the company says achieves world-best LLM performance on a single GPU. The work was led by a small Indian team under Arjun Reddy and supported by ITEL chair Prof. Ashok Jhunjhunwala. The project focused on reducing the GPU, memory, and energy costs of deploying large language models for real coding and full-repo reasoning in colleges and enterprises, citing the cost and scale of hardware such as H200 GPUs as a barrier.

VibeStudio describes the new method as THRIFT: Targeted Hierarchical Reduction for Inference and Fine-Tuning. According to the announcement, THRIFT audits the model layer by layer to identify redundant experts, silent activation routes, and dead parameters, then applies calibrated, staged pruning with teacher-guided fine-tuning after each stage. The reported outcome is a 55% size reduction of MiniMax M2 that retains 80% of the original model’s reasoning strength and coding precision, and in many cases responds faster. VibeStudio says it has released the pruned M2 on HuggingFace as open source and that the release has crossed 150,000 downloads to date.
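The announcement does not publish THRIFT’s internals, but the recipe it describes — audit, prune in stages, recover with fine-tuning after each stage — follows a well-known compression pattern. Below is a minimal pure-Python sketch of staged magnitude pruning; the function names, the magnitude criterion, and the per-stage schedule are illustrative assumptions, not VibeStudio’s actual method.

```python
def prune_stage(weights, fraction):
    """Zero out roughly the smallest-magnitude `fraction` of the
    remaining nonzero weights (simple magnitude criterion)."""
    nonzero = sorted(abs(w) for w in weights if w != 0.0)
    if not nonzero:
        return weights
    k = int(len(nonzero) * fraction)
    # Everything strictly below this threshold gets pruned.
    threshold = nonzero[k] if k < len(nonzero) else float("inf")
    return [0.0 if abs(w) < threshold else w for w in weights]


def staged_prune(weights, target_sparsity=0.55, stages=4, finetune=None):
    """Reach `target_sparsity` gradually over several stages, optionally
    running a recovery step (e.g. teacher-guided fine-tuning) between
    stages, as the announcement describes."""
    # Per-stage fraction so that (1 - f)**stages == 1 - target_sparsity.
    per_stage = 1.0 - (1.0 - target_sparsity) ** (1.0 / stages)
    for _ in range(stages):
        weights = prune_stage(weights, per_stage)
        if finetune is not None:
            weights = finetune(weights)  # recover quality before next cut
    return weights
```

Pruning gradually with recovery between stages typically preserves far more quality than removing 55% of parameters in one shot, which is consistent with the announcement’s emphasis on calibrated, staged reduction.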

Beyond the open release, VibeStudio retains two private foundational models for enterprise deployments: an 8B dense model optimised for quantised local use on mainstream hardware and a 32B A3B MoE model built for secure, high-speed, on-premises reasoning. Those models remain closed and exclusive to enterprise partners. VibeStudio positions its agentic IDE and THRIFT-compressed models as a path to powerful, affordable AI-enabled coding tools for a wide range of users, from large companies to first-year engineering students on budget laptops, emphasising engineering efficiency over continually scaling model size.
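For context on what “quantised local use” of a dense model involves: the standard technique is to store weights as low-precision integers alongside a scale factor, shrinking memory footprint at a small accuracy cost. A minimal sketch of symmetric per-tensor int8 quantisation follows; it is illustrative only, since the announcement does not describe VibeStudio’s quantisation scheme.

```python
def quantize_int8(weights):
    """Map float weights to int8 codes in [-127, 127] plus one
    per-tensor scale factor (symmetric quantisation)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]
```

Each weight is stored in one byte instead of two or four, so an 8B-parameter model drops from roughly 16 GB at fp16 to roughly 8 GB, which is what makes local use on mainstream hardware plausible.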
