ChatGPT shows strong medical knowledge but still hallucinates key genetic data

New research finds that ChatGPT can accurately recognize many medical terms, drugs and genes, but struggles with informal symptom descriptions and fabricates genetic identifiers from key biomedical databases.

Growing numbers of people are asking generative Artificial Intelligence tools such as ChatGPT whether their symptoms indicate serious conditions like cancer or cardiac arrest, raising concerns about safety and accuracy. A new study published in the journal iScience evaluated how well ChatGPT and related large language models handle biomedical information, focusing on disease terms and three types of associations: drug names, genetics and symptoms. The work was led by Ahmed Abdeen Hamed, a research fellow at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science, in collaboration with researchers from AGH University of Krakow in Poland, Howard University and the University of Vermont.

Hamed previously developed a machine-learning algorithm called xFakeSci that can detect up to 94% of bogus scientific papers, and he framed the new study as a step toward verifying the biomedical capabilities of generative large language models. When tested, ChatGPT showed high accuracy in identifying disease terms (88-97%), drug names (90-91%) and genetic information (88-98%), far exceeding Hamed’s initial expectation that it would reach “at most 25% accuracy.” The system reliably labeled cancer and hypertension as diseases, fever as a symptom, Remdesivir as a drug and BRCA as a gene related to breast cancer, which the researchers described as an impressive outcome given the conversational design of the model.

Performance dropped significantly when the model was asked to identify symptoms, where it scored 49-61%. The researchers suggest this gap stems from how large language models are trained versus how medical knowledge is formally structured: biomedical experts rely on ontologies to define and organize terms and relationships, while everyday users describe their health concerns in informal, social language. Hamed noted that ChatGPT uses friendly phrasing to communicate with average people and appears to simplify or “minimize the formalities of medical language” for symptoms in response to heavy user traffic. A more serious flaw emerged in tests involving genetic data from the National Institutes of Health’s GenBank database, which assigns accession numbers like NM_007294.4 for the Breast Cancer 1 gene (BRCA1). When prompted for these identifiers, the model simply made them up, a hallucination Hamed regards as a major failure that must be addressed. He argues that integrating biomedical ontologies directly into large language models could greatly improve accuracy, eliminate hallucinations and turn such tools into far more reliable resources. His broader goal remains exposing flaws so data scientists can refine these systems and avoid building theories on suspect information.
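The fabricated accession numbers show why model output needs independent validation. As a minimal sketch (not part of the study), a format check can catch malformed RefSeq-style identifiers, though even a well-formed string can be fabricated, so only a lookup against NCBI itself confirms an identifier exists:

```python
import re

# RefSeq accession numbers follow a documented pattern: a two-letter
# prefix (e.g. NM_ for mRNA, NP_ for protein, NC_ for chromosome)
# followed by digits and an optional ".version" suffix, as in NM_007294.4.
REFSEQ_PATTERN = re.compile(r"^(?:NM|NR|NP|NG|NC|NT|NW|XM|XR|XP)_\d+(?:\.\d+)?$")

def looks_like_refseq(accession: str) -> bool:
    """Return True if the string matches the RefSeq accession format.

    A passing check only means the string is well-formed; verifying that
    it denotes a real record requires querying GenBank directly.
    """
    return bool(REFSEQ_PATTERN.match(accession.strip()))

print(looks_like_refseq("NM_007294.4"))  # True: the BRCA1 mRNA accession cited above
print(looks_like_refseq("NM_1234ABC"))   # False: malformed suffix
```

A check like this would flag only the most obvious hallucinations; the harder failure mode the study describes is a syntactically valid identifier that points to nothing, which is why Hamed argues for grounding models in curated biomedical databases rather than relying on pattern matching.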


Judge blocks Pentagon move against Anthropic

A federal judge temporarily blocked the Pentagon from labeling Anthropic a supply chain risk after finding major gaps between public threats, legal authority, and the government’s courtroom arguments. The dispute has become a test of how far the government can go in punishing an Artificial Intelligence company over political and contractual conflict.

Anumana wins FDA clearance for pulmonary hypertension ECG Artificial Intelligence tool

Anumana has received FDA 510(k) clearance for an Artificial Intelligence-enabled pulmonary hypertension algorithm designed for use with standard 12-lead electrocardiograms. The company says the software can help clinicians spot early signs of disease within existing workflows and without moving patient data outside the health system environment.

Anu Bradford on tech sovereignty and regulatory fragmentation

Anu Bradford argues that Europe is wavering in its role as the world’s digital rule-setter just as governments everywhere move toward more state control over technology. Global companies are being pushed to treat geopolitical risk, data sovereignty, and Artificial Intelligence governance as core strategic issues.
