ChatGPT shows strong medical knowledge but still hallucinates key genetic data

New research finds that ChatGPT can accurately recognize many medical terms, drugs and genes, but struggles with informal symptom descriptions and fabricates genetic identifiers from key biomedical databases.

Growing numbers of people are asking generative Artificial Intelligence tools such as ChatGPT whether their symptoms indicate serious conditions like cancer or cardiac arrest, raising concerns about safety and accuracy. A new study published in the journal iScience evaluated how well ChatGPT and related large language models handle biomedical information, focusing on disease terms and three types of associations: drug names, genetics and symptoms. The work was led by Ahmed Abdeen Hamed, a research fellow at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science, in collaboration with researchers from AGH University of Krakow in Poland, Howard University and the University of Vermont.

Hamed previously developed a machine-learning algorithm called xFakeSci that can detect up to 94% of bogus scientific papers, and he framed the new study as a step toward verifying the biomedical capabilities of generative large language models. When tested, ChatGPT showed high accuracy in identifying disease terms (88-97%), drug names (90-91%) and genetic information (88-98%), far exceeding Hamed’s initial expectation that it would reach “at most 25% accuracy.” The system reliably labeled cancer and hypertension as diseases, fever as a symptom, Remdesivir as a drug and BRCA as a gene related to breast cancer, which the researchers described as an impressive outcome given the conversational design of the model.

Performance dropped significantly when the model was asked to identify symptoms, where it scored 49-61%. The researchers suggest this gap stems from how large language models are trained versus how medical knowledge is formally structured: biomedical experts rely on ontologies to define and organize terms and relationships, while everyday users describe their health concerns in informal, social language. Hamed noted that ChatGPT uses friendly phrasing to communicate with average people and appears to simplify or “minimize the formalities of medical language” for symptoms in response to heavy user traffic. A more serious flaw emerged in tests involving genetic data from the National Institutes of Health’s GenBank database, which assigns accession numbers like NM_007294.4 for the Breast Cancer 1 gene (BRCA1). When prompted for these identifiers, the model simply made them up, a hallucination Hamed regards as a major failure that must be addressed. He argues that integrating biomedical ontologies directly into large language models could greatly improve accuracy, eliminate hallucinations and turn such tools into far more reliable resources. His broader goal remains exposing flaws so data scientists can refine these systems and avoid building theories on suspect information.
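The fabricated accession numbers show why model output needs independent validation. As a minimal sketch (not part of the study), a format check can catch malformed RefSeq-style identifiers, though even a well-formed string can be fabricated, so only a lookup against NCBI itself confirms an identifier exists:

```python
import re

# RefSeq accession numbers follow a documented pattern: a two-letter
# prefix (e.g. NM_ for mRNA, NP_ for protein, NC_ for chromosome)
# followed by digits and an optional ".version" suffix, as in NM_007294.4.
REFSEQ_PATTERN = re.compile(r"^(?:NM|NR|NP|NG|NC|NT|NW|XM|XR|XP)_\d+(?:\.\d+)?$")

def looks_like_refseq(accession: str) -> bool:
    """Return True if the string matches the RefSeq accession format.

    A passing check only means the string is well-formed; verifying that
    it denotes a real record requires querying GenBank directly.
    """
    return bool(REFSEQ_PATTERN.match(accession.strip()))

print(looks_like_refseq("NM_007294.4"))  # True: the BRCA1 mRNA accession cited above
print(looks_like_refseq("NM_1234ABC"))   # False: malformed suffix
```

A check like this would flag only the most obvious hallucinations; the harder failure mode the study describes is a syntactically valid identifier that points to nothing, which is why Hamed argues for grounding models in curated biomedical databases rather than relying on pattern matching.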


Judge blocks Pentagon move against Anthropic

A federal judge temporarily blocked the Pentagon from labeling Anthropic a supply chain risk after finding major gaps between public threats, legal authority, and the government’s courtroom arguments. The dispute has become a test of how far the government can go in punishing an Artificial Intelligence company over political and contractual conflict.

Anumana wins FDA clearance for pulmonary hypertension ECG Artificial Intelligence tool

Anumana has received FDA 510(k) clearance for an Artificial Intelligence-enabled pulmonary hypertension algorithm designed for use with standard 12-lead electrocardiograms. The company says the software can help clinicians spot early signs of disease within existing workflows and without moving patient data outside the health system environment.

Anu Bradford on tech sovereignty and regulatory fragmentation

Anu Bradford argues that Europe is wavering in its role as the world’s digital rule-setter just as governments everywhere move toward more state control over technology. Global companies are being pushed to treat geopolitical risk, data sovereignty, and Artificial Intelligence governance as core strategic issues.
