Fine-tuning vs. In-Context Learning: New Research Reveals Best Practices for LLM Customization

A study by Google DeepMind and Stanford sheds light on how fine-tuning and in-context learning each affect large language model generalization, offering guidance to developers seeking maximum value from enterprise data.

Researchers from Google DeepMind and Stanford University have conducted a comprehensive study comparing two prominent methods for adapting large language models (LLMs) to downstream tasks: fine-tuning and in-context learning. Fine-tuning involves retraining a pre-trained LLM on a specialized data subset to alter its internal parameters, while in-context learning (ICL) uses tailored examples within the prompt to guide the model’s outputs without modifying its underlying structure. To ensure rigorous testing, the researchers used synthetic datasets with complex relationships and replaced familiar terms with nonsense words, ruling out the influence of prior knowledge learned during pre-training.
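The two adaptation styles described above can be sketched on a toy relation task. This is a minimal illustration, not the study's actual pipeline: the nonsense terms and helper names are hypothetical, and simple string formatting stands in for real model calls.

```python
# In-context learning: demonstrations travel inside every prompt, so
# inference pays the cost of the examples on each call to the model.
def build_icl_prompt(demonstrations, query):
    lines = [f"{premise} -> {conclusion}" for premise, conclusion in demonstrations]
    lines.append(f"{query} ->")
    return "\n".join(lines)

# Fine-tuning: the same pairs become training records that update the
# model's weights once; later prompts then contain only the query.
def build_finetune_records(demonstrations):
    return [{"input": premise, "target": conclusion}
            for premise, conclusion in demonstrations]

# Nonsense words mimic the study's setup, which rules out reliance on
# facts memorized during pre-training.
demos = [("femp is the parent of glon", "glon is the child of femp")]
prompt = build_icl_prompt(demos, "zarn is the parent of quex")
records = build_finetune_records(demos)
```

The contrast makes the cost trade-off concrete: the ICL prompt grows with every demonstration added, while the fine-tuning records are consumed once during training.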

The study subjected LLMs to a series of logical and deductive challenges involving tasks such as relationship reversals and syllogisms, using both fine-tuning and ICL strategies. Experimental results revealed that models relying on ICL displayed superior generalization to novel tasks compared to their fine-tuned counterparts. However, ICL is more computationally expensive at inference time, as it requires feeding large context prompts for every model use. In contrast, standard fine-tuning is less flexible on unfamiliar data, but it does not incur repeated inference-time costs.

To bridge the gap between generalization and efficiency, the research team introduced an innovative hybrid approach dubbed 'augmented fine-tuning.' This method enriches the fine-tuning dataset by integrating new, inferred examples generated using the LLM's own ICL capabilities, employing both local strategies (individual fact manipulation) and global strategies (holistic data linkage). When these augmented datasets were used for fine-tuning, the resulting models outperformed those trained with standard fine-tuning or ICL alone. The findings suggest that this hybrid technique delivers both broader generalization and greater cost-effectiveness for enterprise deployment. The researchers caution that augmented fine-tuning introduces its own upfront computational overhead, but recommend its consideration wherever standard fine-tuning falls short. Overall, these insights provide actionable guidance for enterprises seeking to reliably adapt LLMs to domain-specific requirements.
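The augmentation idea can be sketched with simple rules standing in for the LLM's ICL-driven inferences. Everything here is an assumption for illustration: the relation names, the reversal rule, and the composition rule are invented stand-ins, since in the actual method the model itself generates the inferred examples.

```python
# Local strategy: manipulate a single fact in isolation, e.g. state its
# reversal, so the model also trains on the inverted relationship.
def local_augment(fact):
    subject, relation, obj = fact
    inverse = {"parent_of": "child_of", "child_of": "parent_of"}
    return (obj, inverse.get(relation, relation), subject)

# Global strategy: link facts across the whole dataset, e.g. compose two
# parent_of facts into a derived grandparent_of fact (a toy syllogism).
def global_augment(facts):
    derived = []
    for s1, r1, o1 in facts:
        for s2, r2, o2 in facts:
            if r1 == r2 == "parent_of" and o1 == s2:
                derived.append((s1, "grandparent_of", o2))
    return derived

facts = [("femp", "parent_of", "glon"), ("glon", "parent_of", "zarn")]
augmented = facts + [local_augment(f) for f in facts] + global_augment(facts)
```

Fine-tuning on `augmented` rather than `facts` exposes the model to reversals and multi-hop links it would otherwise have to infer at inference time, which is the intuition behind the hybrid approach's stronger generalization.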

Impact Score: 69

IBM and AMD partner on quantum-centric supercomputing

IBM and AMD announced plans to develop quantum-centric supercomputing architectures that combine quantum computers with high-performance computing to create scalable, open-source platforms. The collaboration leverages IBM's work on quantum computers and software and AMD's expertise in high-performance computing and Artificial Intelligence accelerators.

Qualcomm launches Dragonwing Q-6690 with integrated RFID and Artificial Intelligence

Qualcomm announced the Dragonwing Q-6690, billed as the world’s first enterprise mobile processor with fully integrated UHF RFID and built-in 5G, Wi-Fi 7, Bluetooth 6.0, ultra-wideband and Artificial Intelligence capabilities. The platform is aimed at rugged handhelds, point-of-sale systems and smart kiosks and offers software-configurable feature packs that can be upgraded over the air.

Recent books from the MIT community

A roundup of new titles from the MIT community, including Empire of Artificial Intelligence, a critical look at Sam Altman’s OpenAI, and Data, Systems, and Society, a textbook on harnessing Artificial Intelligence for societal good.
