Artificial intelligence agents that can act as personal assistants are rapidly moving from concept to reality, but their growing power brings significant security risks. OpenClaw, a tool created by independent software engineer Peter Steinberger, lets users strap existing large language models into a kind of “mecha suit,” granting them persistent memory and the ability to run ongoing tasks via messaging apps like WhatsApp. Unlike more constrained offerings from major artificial intelligence companies, OpenClaw agents are designed to run 24-7 and can manage inboxes, plan trips, and even write code or spin up new applications. To do so, they often require deep access to users’ emails, credit card information, and local files, which has alarmed security experts and even prompted a public warning from the Chinese government about its vulnerabilities.
Some of the most immediate dangers involve basic operational mistakes and traditional hacking. One user’s Google Antigravity coding agent reportedly wiped his entire hard drive, illustrating how easily automated tools can cause catastrophic damage when given broad permissions. Security researchers have also demonstrated multiple ways attackers could compromise OpenClaw instances using conventional techniques to exfiltrate sensitive data or run malicious code. Users can partially mitigate these problems by isolating agents on separate machines or in the cloud, and many weaknesses could be addressed with established security practices. However, specialists are especially concerned about prompt injection, an insidious form of large language model hijacking in which malicious text or images embedded in websites or emails are misinterpreted as instructions, allowing attackers to redirect an artificial intelligence assistant that holds private user information.
Prompt injection has not yet led to publicly known disasters, but the proliferation of OpenClaw agents potentially makes this attack vector more enticing to cybercriminals. The core issue is that large language models do not inherently distinguish user instructions from data such as web pages or emails, treating everything as text and making them easy to trick. Researchers are pursuing three broad defense strategies, each with trade-offs. One is to train models during post-training to ignore known forms of prompt injection, though pushing too hard risks rejecting legitimate requests and cannot fully overcome the models’ inherent randomness. Another approach uses detector models to screen inputs for injected prompts before they reach the main assistant, but recent studies show that even the best detectors miss entire categories of attacks. A third strategy focuses on output policies that constrain what agents are allowed to do, such as limiting email recipients, which can block harmful actions but also sharply reduces utility and is difficult to define precisely.
The broader artificial intelligence ecosystem is still wrestling with when these agents will be secure enough for mainstream deployment. Dawn Song of UC Berkeley, whose startup Virtue AI develops an agent security platform, believes safe deployment is already possible with the right safeguards, while Duke University’s Neil Gong argues that the field is not there yet. Even if full protection against prompt injection remains elusive, partial mitigations can meaningfully reduce risk, and Steinberger has recently brought a security specialist onto the OpenClaw project to strengthen its defenses. Many enthusiasts, such as OpenClaw maintainer George Pickett, continue to embrace the tool while taking basic precautions like running it in the cloud and locking down access, although some admit they have not implemented specific protections against prompt injection and are effectively betting they will not be the first to be hacked.
