ChatGPT shows strong medical knowledge but still hallucinates key genetic data

New research finds that ChatGPT can accurately recognize many medical terms, drugs and genes, but struggles with informal symptom descriptions and fabricates genetic identifiers from key biomedical databases.

A growing number of people are asking generative AI tools such as ChatGPT whether their symptoms indicate serious conditions like cancer or cardiac arrest, raising concerns about safety and accuracy. A new study published in the journal iScience evaluated how well ChatGPT and related large language models handle biomedical information, focusing on disease terms and three types of associations: drug names, genetics and symptoms. The work was led by Ahmed Abdeen Hamed, a research fellow at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science, in collaboration with researchers from AGH University of Krakow in Poland, Howard University and the University of Vermont.

Hamed previously developed a machine-learning algorithm called xFakeSci that can detect up to 94% of bogus scientific papers, and he framed the new study as a step toward verifying the biomedical generative capabilities of large language models. When tested, ChatGPT showed high accuracy in identifying disease terms (88-97%), drug names (90-91%) and genetic information (88-98%), far exceeding Hamed’s initial expectation that it would reach “at most 25% accuracy.” The model reliably labeled cancer and hypertension as diseases, fever as a symptom, Remdesivir as a drug and BRCA as a gene related to breast cancer, which the researchers described as an impressive outcome given the model's conversational design.

Performance dropped significantly when the model was asked to identify symptoms, where it scored 49-61%. The researchers suggest this gap stems from the difference between how large language models are trained and how medical knowledge is formally structured. Biomedical experts rely on ontologies to define and organize terms and relationships, while everyday users describe their health concerns in informal, social language. Hamed noted that ChatGPT, which serves a large general audience, uses friendly phrasing to communicate with average people and appears to simplify or “minimize the formalities of medical language” for symptoms.

A more serious flaw emerged in tests involving genetic data from the National Institutes of Health’s GenBank database, which assigns accession numbers such as NM_007294.4 for the Breast Cancer 1 gene (BRCA1). When prompted for these identifiers, the model simply made them up, a hallucination Hamed regards as a major failure that must be addressed. He argues that integrating biomedical ontologies directly into large language models could greatly improve accuracy, eliminate hallucinations and turn such tools into far more reliable resources. His broader goal remains exposing flaws so that data scientists can refine these systems and avoid building theories on suspect information.
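To make the accession-number problem concrete, the following is a minimal Python sketch of how a model-supplied identifier might be checked before anyone relies on it. It is not the method used in the study. The function name `check_accession`, the simplified pattern and the tiny `TRUSTED_ACCESSIONS` table are assumptions for illustration; the only real identifier in it is the BRCA1 example quoted above. A real check would query the database itself rather than a hard-coded table.

```python
import re

# Simplified shape of an identifier like NM_007294.4: two capital letters,
# an underscore, digits, a dot, and a version number. This pattern is an
# assumption for illustration, not an official format specification.
ACCESSION_PATTERN = re.compile(r"[A-Z]{2}_\d+\.\d+")

# Hypothetical trusted reference table. The single entry is the BRCA1
# example quoted in the article; a real system would look up the database.
TRUSTED_ACCESSIONS = {"BRCA1": {"NM_007294.4"}}


def check_accession(gene, accession, trusted=TRUSTED_ACCESSIONS):
    """Classify a model-supplied accession number for a gene.

    Returns "malformed" if the string does not have the expected shape,
    "verified" if it appears in the trusted table for that gene, and
    "unverified" if it is well-formed but cannot be confirmed.
    """
    if not ACCESSION_PATTERN.fullmatch(accession):
        return "malformed"
    if accession in trusted.get(gene, set()):
        return "verified"
    return "unverified"


if __name__ == "__main__":
    for candidate in ("NM_007294.4", "NM_999999.1", "BRCA-123"):
        print(candidate, check_accession("BRCA1", candidate))
```

The sketch separates "well-formed" from "verified" because a fabricated identifier can look entirely plausible. A format check alone would not catch the kind of hallucination the study describes.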
