Google DeepMind researchers have introduced CaMeL, a new defense mechanism designed to protect large language models (LLMs) against prompt injection attacks originating from untrusted inputs. The CaMeL framework acts as a defense layer around LLMs, intercepting and neutralizing potentially malicious queries before they can exploit the model. In benchmark tests using AgentDojo, a security suite for autonomous agents, CaMeL was able to block 67% of prompt injection attacks, demonstrating considerable effectiveness over current solutions.
Prompt injection attacks allow adversaries to manipulate LLMs by crafting context or instructions that cause models to exfiltrate sensitive information or execute unintended actions, such as sending unauthorized emails or leaking private data. Conventional defenses rely on more Artificial Intelligence to monitor or detect malicious prompts, but attackers have repeatedly found ways to circumvent these measures, as seen in successful phishing attacks bypassing even the latest LLM security features.
Distinctively, CaMeL applies established software security principles—such as control flow integrity, access control, and information flow control—to LLM interactions. It uses a custom Python interpreter to track the origin and permissible actions associated with all data and instructions encountered by a privileged LLM, without requiring modification of the LLM itself. By leveraging the Dual LLM pattern, where one LLM handles untrusted inputs in quarantine and another privileged LLM enforces workflows and access rights, CaMeL builds a data flow graph and attaches security metadata to all variables and program data. This metadata defines what actions are authorized, ensuring that output from untrusted sources cannot be misused even if manipulated upstream.
While the approach reduces the need for Artificial Intelligence-driven security layers, making detection less probabilistic and more deterministic, it is not a silver bullet. The researchers point out limitations such as the need for users to define security policies themselves and the risk of user fatigue from manual approval of sensitive tasks. Nevertheless, CaMeL´s results highlight the merit of augmenting LLM security with well-understood software security methodologies, providing a promising avenue for reducing systemic risk in production LLM deployments.