DeepMind Proposes CaMeL Defense Against LLM Prompt Injection

Google DeepMind introduces CaMeL, a security layer that applies traditional software security concepts to large language models, effectively blocking many prompt injection attacks in real-world agent benchmarks.

Google DeepMind researchers have introduced CaMeL, a new defense mechanism designed to protect large language models (LLMs) against prompt injection attacks originating from untrusted inputs. The CaMeL framework acts as a defense layer around LLMs, intercepting and neutralizing potentially malicious queries before they can exploit the model. In benchmark tests using AgentDojo, a security suite for autonomous agents, CaMeL was able to block 67% of prompt injection attacks, demonstrating considerable effectiveness over current solutions.

Prompt injection attacks allow adversaries to manipulate LLMs by crafting context or instructions that cause models to exfiltrate sensitive information or execute unintended actions, such as sending unauthorized emails or leaking private data. Conventional defenses rely on more Artificial Intelligence to monitor or detect malicious prompts, but attackers have repeatedly found ways to circumvent these measures, as seen in successful phishing attacks bypassing even the latest LLM security features.

Distinctively, CaMeL applies established software security principles—such as control flow integrity, access control, and information flow control—to LLM interactions. It uses a custom Python interpreter to track the origin and permissible actions associated with all data and instructions encountered by a privileged LLM, without requiring modification of the LLM itself. By leveraging the Dual LLM pattern, where one LLM handles untrusted inputs in quarantine and another privileged LLM enforces workflows and access rights, CaMeL builds a data flow graph and attaches security metadata to all variables and program data. This metadata defines what actions are authorized, ensuring that output from untrusted sources cannot be misused even if manipulated upstream.

While the approach reduces the need for Artificial Intelligence-driven security layers, making detection less probabilistic and more deterministic, it is not a silver bullet. The researchers point out limitations such as the need for users to define security policies themselves and the risk of user fatigue from manual approval of sensitive tasks. Nevertheless, CaMeL´s results highlight the merit of augmenting LLM security with well-understood software security methodologies, providing a promising avenue for reducing systemic risk in production LLM deployments.

77

Impact Score

French Artificial Intelligence startup Mistral unveils Mistral 3 open-source models

French Artificial Intelligence startup Mistral unveiled Mistral 3, a next-generation family of open-source models that includes small dense models 14B, 8B, and 3B and a larger sparse mixture-of-experts called Mistral Large 3. The company said the release represents its most capable model to date and noted Microsoft backing.

Artificial Intelligence newsroom: Anthropic’s new model redefines coding

Anthropic released Claude Opus 4.5, a new large language model that scored 80% on the SWE verified benchmark and took the no. 1 spot on the ARC AGI test. Enterprise Artificial Intelligence adoption is accelerating, with full implementation up 282%, while the U.S. Genesis Mission opens petabytes of lab data to foundation model teams.

Microsoft warns: Windows 11 agentic features may hallucinate

After installing Windows 11 Build 26220.7262, users will see an optional toggle for Experimental agentic features under Settings > System > Artificial Intelligence Components. Microsoft cautions that as these features roll out, Artificial Intelligence models can still hallucinate and that new security risks tied to autonomous agents are emerging.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.