Toolformer lets language models teach themselves to use external tools

Meta researchers introduce Toolformer, a language model training method that enables self-supervised learning of when and how to call external tools via simple APIs, such as calculators and search engines, without sacrificing core language abilities.

Large language models display strong few-shot and instruction-following abilities but still perform poorly on basic operations like arithmetic and factual lookup, where smaller specialized systems excel. Toolformer addresses this gap by enabling a language model to learn how to call external tools through simple APIs, combining general-purpose reasoning with the precision of dedicated components. The approach focuses on making the model decide autonomously which tools to use and how to integrate their outputs while preserving its underlying language modeling capabilities.

Toolformer is trained to determine which APIs to call, when to call them, what arguments to provide, and how to feed the returned results back into subsequent token prediction. Training proceeds in a self-supervised manner with only a handful of demonstrations required for each API, avoiding the need for large bespoke labeled datasets. The model effectively learns a policy over tool usage as part of standard next-token prediction, so tool calls become a natural extension of its text generation process instead of a separate control mechanism.
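To make the idea of tool calls living inside generated text more concrete, here is a minimal sketch of the inference-side step: finding a call in the text, running the tool, and splicing the result back in so it can condition later tokens. It does not reproduce the researchers' implementation. The bracketed `[Tool(args)]` syntax, the `->` result marker, the `TOOLS` registry, and the function names are all assumptions made for illustration.

```python
import ast
import operator
import re
from datetime import date

# Hypothetical inline call syntax assumed for this sketch: [ToolName(arguments)]
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

# Arithmetic operators the toy calculator accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node):
    """Evaluate a small arithmetic expression tree without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    """Toy calculator tool: returns the result rounded to two decimals."""
    tree = ast.parse(expression.strip(), mode="eval")
    return f"{_eval_node(tree.body):.2f}"


def calendar(_: str = "") -> str:
    """Toy calendar tool: ignores its argument and returns today's date."""
    return date.today().isoformat()


# Hypothetical registry mapping tool names to callables.
TOOLS = {"Calculator": calculator, "Calendar": calendar}


def execute_tool_calls(text: str) -> str:
    """Run every recognized inline call and splice its result into the text."""

    def replace(match: re.Match) -> str:
        name, args = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the text untouched
        try:
            result = tool(args)
        except Exception:
            return match.group(0)  # failed call: leave the text untouched
        return f"[{name}({args}) -> {result}]"

    return CALL_PATTERN.sub(replace, text)


if __name__ == "__main__":
    draft = "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed."
    print(execute_tool_calls(draft))
    # Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed.
```

In this sketch a failed or unrecognized call is left in place rather than raising an error, so a bad tool call degrades to plain text instead of interrupting generation. That is a design choice of the example, not something the article describes.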

The researchers incorporate a diverse set of tools into Toolformer, including a calculator, a question-answering system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks and is often competitive with much larger models, while maintaining its core language modeling performance. The work suggests that carefully integrating external tools via self-supervised learning can help language models overcome persistent weaknesses in areas such as computation and factual retrieval, without increasing model size or sacrificing fluency.
