OpenAI warns prompt injection is a lasting threat for Artificial Intelligence browser agents

OpenAI has rolled out new security measures for its ChatGPT Atlas browser agent while warning that prompt injection on the open web is a long-term, unsolved risk that users and developers must manage, not eliminate. The company is pairing adversarial training with a broader defense stack and practical guidelines for safer use.

OpenAI is tightening security around its ChatGPT Atlas browser agent while publicly stating that prompt injection is a structural problem that the Artificial Intelligence industry will be managing for years. Prompt injection is described as malicious instructions hidden inside content an agent reads, such as emails, documents, or web pages, with the goal of steering its actions off-task. The risk is heightened for browser agents because they can perform actions like sending emails, moving money, and editing files, which turns untrusted text into a real attack surface rather than a nuisance.

To counter this, OpenAI says it has built a large language model based automated attacker that is trained end-to-end with reinforcement learning to discover viable prompt injection attacks in realistic, multi-step scenarios. A key part of this approach is simulation, where the attacker proposes an injection, runs a counterfactual rollout, and then examines the victim agent’s reasoning and action trace to refine its strategy, with OpenAI arguing that this internal access gives it an advantage over outside attackers. The company frames its security work on Atlas as a rapid response loop, where each newly discovered class of successful attacks is used to quickly harden the system through adversarial training and system-level changes, including a new adversarially trained browser agent checkpoint already rolled out to users.

OpenAI illustrates the impact of the update with an example in which an attack was seeded via email, causing the agent to encounter hidden instructions and act incorrectly, whereas after the update the agent mode detected and flagged the prompt injection attempt. Alongside model and system defenses, OpenAI emphasizes that users can reduce risk by starting in logged-out mode, limiting sign in to only the specific sites needed, carefully reading confirmation prompts before sending messages or completing purchases, and using explicit, well-scoped prompts instead of open-ended instructions. The company argues that saying prompt injection is unlikely to be fully solved is a security mindset rather than a surrender, and that the practical goal is to make attacks harder, more expensive, and easier to detect, nudging product teams toward tighter permissions, stronger confirmations, better monitoring, and faster patch cycles so that browser based Artificial Intelligence agents like Atlas can be trusted with more tasks over time.

58

Impact Score

Researchers model early human pregnancy using organoids and embryos

Scientists have merged human embryos and blastoid models with uterine organoids on microfluidic chips to closely mimic the first moments of pregnancy in the lab, opening a new window into implantation and in vitro fertilization failure. The work could inform future diagnostics, drug screening, and long-term questions about gestation outside the body.

Artificial Intelligence PC arms race reshapes the NPU market

Qualcomm, AMD, Intel, and a looming NVIDIA entry are turning the Artificial Intelligence PC into the new standard, as neural processing units redefine performance, power efficiency, and local computing. The competition is fragmenting the old Wintel order and accelerating a shift toward on-device generative Artificial Intelligence.

Debating a post-GeForce future for Nvidia and PC gaming

Hacker News commenters argue over whether Nvidia could realistically exit consumer graphics in favor of Artificial Intelligence hardware, and what that would mean for PC gaming, hardware prices, and industry competition.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.