Researchers from Stanford, MIT CSAIL, Carnegie Mellon, ITU Copenhagen, NVIDIA and Elloe Artificial Intelligence Labs examined 847 autonomous agent deployments drawn from healthcare, finance, customer service and code-generation. The study found that 91% were vulnerable to subtle but dangerous tool-chaining attacks, where seemingly innocuous calls can combine to cause serious problems that reasoning models miss.
The same study found that 89.4% of agents showed drift relative to their goals after about 30 steps in their process, and 94% of agents with some form of memory-augmentation were vulnerable to poisoning attacks. The paper also indicated that agents are in many ways much more vulnerable than pure stateless large language models, based on a taxonomy developed by the researchers.
The findings reinforce similar concerns documented in February by a team of AWS and Berkeley researchers, who reported related vulnerabilities in autonomous agents. Owen Sakawa, identified as the newer paper’s first author, said the OpenClaw / Moltbook incident was the first real-world empirical validation of the agentic threat model at scale, with 770,000 live agents simultaneously compromised via a single database exploit, each with privileged access to their owner’s machine, email, and files. The incident was presented as evidence that these risks are no longer hypothetical.
