Health chatbots spread faster than independent testing

March 31, 2026

Microsoft, Amazon, OpenAI, and Anthropic are expanding consumer health chatbots as demand rises. Researchers say the tools may help fill care gaps, but independent evaluation still lags behind public rollout.

Microsoft has launched Copilot Health, Amazon has widened access to Health AI, OpenAI released ChatGPT Health in January, and Anthropic’s Claude can access user health records with permission. Consumer health chatbots are becoming a clear industry trend, driven by both rapid advances in generative artificial intelligence and strong public demand for easier access to health advice. Researchers and clinicians broadly agree that such tools could be useful in a strained health-care system, particularly for people who face barriers to getting timely medical guidance.

Companies argue that newer models are better at answering health questions safely and helpfully. Microsoft says it receives 50 million health questions each day, and health is the most popular discussion topic on the Copilot mobile app. Supporters of these systems see a potential role in triage, where a chatbot might help users decide whether they need urgent care or whether a condition can be managed at home. That could reduce pressure on clinics and emergency rooms while helping some patients seek care sooner.

Researchers remain concerned that public deployment is moving ahead without enough independent scrutiny. A recent Mount Sinai study found that ChatGPT Health sometimes recommends too much care for mild conditions and fails to identify emergencies, raising questions about safety and external oversight. Although products such as ChatGPT Health, Copilot Health, and Amazon’s Health AI include warnings that they are not intended for diagnosis or treatment, experts say many users are likely to rely on them for exactly those purposes. That makes failures in triage, diagnosis, or treatment advice especially consequential.

OpenAI has introduced HealthBench to measure performance in realistic health-related conversations, and the company reported strong results for GPT-5, which powers both ChatGPT Health and Copilot Health. Even so, experts say benchmarks based on model-generated scenarios do not fully capture what happens when real people describe symptoms imperfectly or misunderstand responses. Bean and colleagues found that a non-expert user who is given a scenario and asked to identify the condition with LLM assistance might get it right only a third of the time. OpenAI says newer systems are better at asking for missing context, but it has also reported that GPT-5.4 is worse at seeking context than GPT-5.2.

Google recently published a human study of its AMIE medical chatbot, which is not yet public, finding that AMIE’s diagnoses were just as accurate as physicians’, and that none of the conversations raised major safety concerns for researchers. Even with those results, Google says more work is needed on equity, fairness, and safety before real-world use for diagnosis or treatment. Across the field, the central debate is shifting toward trusted third-party evaluation. Researchers say no single benchmark will settle the issue, but wider external testing may be the best way to determine whether these tools genuinely improve care or whether their risks still outweigh their benefits.


