Health chatbots spread faster than independent testing

Microsoft, Amazon, OpenAI, and Anthropic are expanding consumer health chatbots as demand rises. Researchers say the tools may help fill care gaps, but independent evaluation still lags behind public rollout.

Microsoft has launched Copilot Health, Amazon has widened access to Health AI, OpenAI released ChatGPT Health in January, and Anthropic's Claude can access users' health records with their permission. Consumer health chatbots have become a clear industry trend, driven both by rapid advances in generative artificial intelligence and by strong public demand for easier access to health advice. Researchers and clinicians broadly agree that such tools could be useful in a strained health-care system, particularly for people who face barriers to timely medical guidance.

Companies argue that newer models are better at answering health questions safely and helpfully. Microsoft says it receives 50 million health questions each day and that health is the most popular discussion topic on the Copilot mobile app. Supporters of these systems see a potential role in triage, where a chatbot might help users decide whether they need urgent care or whether a condition can be managed at home. That could reduce pressure on clinics and emergency rooms while helping some patients seek care sooner.

Researchers remain concerned that public deployment is outpacing independent scrutiny. A recent Mount Sinai study found that ChatGPT Health sometimes recommended too much care for mild conditions and failed to identify emergencies, raising questions about safety and external oversight. Although ChatGPT Health, Copilot Health, and Amazon's Health AI all include warnings that they are not intended for diagnosis or treatment, experts say many users are likely to rely on them for exactly those purposes. That makes errors in triage, diagnosis, or treatment advice especially consequential.

OpenAI has introduced HealthBench to measure performance in realistic health-related conversations, and the company reported strong results for GPT-5, which powers both ChatGPT Health and Copilot Health. Even so, experts say benchmarks built on model-generated scenarios do not fully capture what happens when real people describe symptoms imperfectly or misunderstand the responses they receive. In a separate study, Bean and colleagues found that non-expert users who were given a medical scenario and asked to identify the condition with the help of a large language model (LLM) succeeded only about a third of the time. OpenAI says newer systems are better at asking for missing context, yet it has also reported that GPT-5.4 is worse at seeking context than GPT-5.2.

Google recently published a human study of AMIE, its medical chatbot, which is not yet publicly available. The study found that AMIE's diagnoses were as accurate as physicians' and that researchers identified no major safety concerns in any of the conversations. Even with those results, Google says more work is needed on equity, fairness, and safety before AMIE is used in the real world for diagnosis or treatment. Across the field, the central debate is shifting toward trusted third-party evaluation. Researchers say no single benchmark will settle the issue, but broader external testing may be the best way to determine whether these tools genuinely improve care or whether their risks still outweigh their benefits.
