Health chatbots spread faster than independent testing

Microsoft, Amazon, OpenAI, and Anthropic are expanding consumer health chatbots as demand rises. Researchers say the tools may help fill care gaps, but independent evaluation still lags behind public rollout.

Microsoft has launched Copilot Health, Amazon has widened access to Health AI, OpenAI released ChatGPT Health in January, and Anthropic’s Claude can access user health records with permission. Consumer health chatbots have become a clear industry trend, driven by rapid advances in generative artificial intelligence and by strong public demand for easier access to health advice. Researchers and clinicians broadly agree that such tools could be useful in a strained health-care system, particularly for people who face barriers to getting timely medical guidance.

Companies argue that newer models are better at answering health questions safely and helpfully. Microsoft says it receives 50 million health questions each day, and health is the most popular discussion topic on the Copilot mobile app. Supporters of these systems see a potential role in triage, where a chatbot might help users decide whether they need urgent care or whether a condition can be managed at home. That could reduce pressure on clinics and emergency rooms while helping some patients seek care sooner.

Researchers remain concerned that public deployment is moving ahead without enough independent scrutiny. A recent Mount Sinai study found that ChatGPT Health sometimes recommends too much care for mild conditions and fails to identify emergencies, raising questions about safety and external oversight. Although products such as ChatGPT Health, Copilot Health, and Amazon’s Health AI include warnings that they are not intended for diagnosis or treatment, experts say many users are likely to rely on them for exactly those purposes. That makes failures in triage, diagnosis, or treatment advice especially consequential.

OpenAI has introduced HealthBench to measure performance in realistic health-related conversations, and the company reported strong results for GPT-5, which powers both ChatGPT Health and Copilot Health. Even so, experts say benchmarks based on model-generated scenarios do not fully capture what happens when real people describe symptoms imperfectly or misunderstand responses. Bean and colleagues found that a non-expert user who is given a scenario and asked to identify the condition with LLM assistance might get it right only a third of the time. OpenAI says newer systems are better at asking for missing context, but it has also reported that GPT-5.4 is worse at seeking context than GPT-5.2.

Google recently published a human study of its AMIE medical chatbot, which is not yet public, finding that AMIE’s diagnoses were just as accurate as physicians’, and that none of the conversations raised major safety concerns for researchers. Even with those results, Google says more work is needed on equity, fairness, and safety before real-world use for diagnosis or treatment. Across the field, the central debate is shifting toward trusted third-party evaluation. Researchers say no single benchmark will settle the issue, but wider external testing may be the best way to determine whether these tools genuinely improve care or whether their risks still outweigh their benefits.
