LM Studio Boosts Local LLM Performance on NVIDIA GeForce RTX GPUs With CUDA 12.8

LM Studio´s latest update harnesses CUDA 12.8 and NVIDIA GeForce RTX GPUs for faster, more private local deployment of large language models in diverse Artificial Intelligence workflows.

As demand grows for on-device large language model (LLM) applications, LM Studio is helping developers and enthusiasts unlock high-performance inference on their own PCs. By leveraging NVIDIA GeForce RTX GPUs, LM Studio offers the ability to run LLMs entirely offline, giving users improved data privacy, greater control, and enhanced flexibility without depending on cloud infrastructure. The software is built atop the robust llama.cpp runtime, supporting both interactive chat interfaces and OpenAI-compatible APIs for seamless integration with custom tools and workflows.

The latest release, LM Studio 0.3.15, introduces CUDA 12.8 support, significantly accelerating both model loading and response times on RTX GPUs. This update also adds developer-oriented enhancements such as greater control over external tool usage through the ´tool_choice´ parameter and an improved system prompt editor for handling complex prompt structures. These upgrades boost LM Studio´s usability and performance across a wide range of RTX AI PCs, delivering faster interactions and easier integration for various Artificial Intelligence-driven tasks.

LM Studio´s flexibility enables use cases from casual experimentation to production-level deployment. The application integrates with popular desktop tools like Obsidian via plug-ins, facilitating content generation, research summarization, and note querying with locally hosted LLMs. It supports a wide array of open models—including Gemma, Llama 3, Mistral, and Orca—and various quantization formats. The 0.3.15 release brings further optimizations through CUDA graph enablement and flash attention CUDA kernels, boosting throughput by up to 35% and reducing CPU overhead. Compatibility now extends from GeForce RTX 20 Series GPUs to NVIDIA Blackwell-class hardware.

Getting started with LM Studio is straightforward: users can download the application for Windows, macOS, or Linux, select the appropriate CUDA 12 runtime, and use interface controls to maximize GPU utilization and enable flash attention. The platform continues to evolve with active community and NVIDIA-backed development on the llama.cpp backend, promising ongoing improvements for local LLM deployment. This positions LM Studio as a leading, accessible tool for high-performance, privacy-centric Artificial Intelligence on RTX-powered systems.

62

Impact Score

IBM and AMD partner on quantum-centric supercomputing

IBM and AMD announced plans to develop quantum-centric supercomputing architectures that combine quantum computers with high-performance computing to create scalable, open-source platforms. The collaboration leverages IBM´s work on quantum computers and software and AMD´s expertise in high-performance computing and Artificial Intelligence accelerators.

Qualcomm launches Dragonwing Q-6690 with integrated RFID and Artificial Intelligence

Qualcomm announced the Dragonwing Q-6690, billed as the world’s first enterprise mobile processor with fully integrated UHF RFID and built-in 5G, Wi-Fi 7, Bluetooth 6.0, ultra-wideband and Artificial Intelligence capabilities. The platform is aimed at rugged handhelds, point-of-sale systems and smart kiosks and offers software-configurable feature packs that can be upgraded over the air.

Recent books from the MIT community

A roundup of new titles from the MIT community, including Empire of Artificial Intelligence, a critical look at Sam Altman’s OpenAI, and Data, Systems, and Society, a textbook on harnessing Artificial Intelligence for societal good.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.