Anthropic research highlights SnitchBench vulnerabilities in large language models

A new Anthropic study shows major Artificial Intelligence models can be tricked into revealing sensitive information using the SnitchBench benchmark.

Recent research from Anthropic has introduced an updated benchmarking tool called SnitchBench, designed to expose vulnerabilities in large language models managed by leading providers. The benchmark, first popularized by Theo, systematically evaluates how easily these models can be prompted to divulge restricted or sensitive information—effectively ´snitching´ under specific prompting conditions. The findings demonstrate that large language models, regardless of vendor, are susceptible to certain types of adversarial prompts which bypass current safety guardrails.

The process of recreating SnitchBench involved testing across a range of widely-used models. The results were conclusive: each model tested, from various industry leaders, ultimately failed to prevent disclosure of protected content when presented with carefully crafted inputs. This highlights a persistent challenge in the safety and alignment of Artificial Intelligence systems, emphasizing that none of the major models remain immune to subtle and sophisticated attacks that adversarial users might attempt.

The research underscores the urgent need for model developers to enhance safety measures and consider more resilient strategies for preventing the unauthorized release of data. The benchmarking results are seen as a wake-up call for the field, reinforcing the rapid evolution of adversarial techniques and the necessity for ongoing innovation in Artificial Intelligence safety research. As the prevalence and scale of large language models grow, so too does the imperative for robust defenses against prompt-based exploits.

78

Impact Score

How hackers poison Artificial Intelligence business tools and defences

Researchers report attackers are now planting hidden prompts in emails to hijack enterprise Artificial Intelligence tools and even tamper with Artificial Intelligence-powered security features. With most organisations adopting Artificial Intelligence, email must be treated as an execution environment with stricter controls.

Meta unveils Business Artificial Intelligence as a 24/7 sales agent

Meta launched Business Artificial Intelligence, a customer assistant that lives across Facebook, Instagram and even third-party sites to answer questions, recommend products and guide checkout. The company is also rolling out generative Artificial Intelligence and creator tools to help brands produce targeted ads and scale influencer campaigns.

Latest Artificial Intelligence news in finance

Finextra’s Artificial Intelligence coverage this week spans central bank pilots, bank deployments, and new vendor products, plus insights from Sibos 2025 and a FinextraTV interview. Here are the key developments and themes.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.