Study: artificial intelligence models struggle with basic human psychology

Researchers at Bielefeld University tested models including GPT-4 and CENTAUR and found that artificial intelligence systems fall short of humans at distinguishing subtle moral judgments.

Researchers at Bielefeld University led by Sarah Schröder ran experiments to evaluate whether modern AI models can perform basic tasks in human psychology. The team compared human responses with outputs from models including GPT-4 and CENTAUR. The study reports that, despite large investments in development, the AI systems could not match human sensitivity to nuanced moral distinctions in the scenarios tested.

According to the article, human participants discerned subtle moral differences between scenarios that share a moral framework but differ in intent or context. By contrast, the tested AI models tended to treat fundamentally different actions as equivalent within the same broad moral category. The piece notes this gap in moral reasoning capacity and emphasizes that the failures occurred even with widely used models such as GPT-4 and with CENTAUR.

The authors highlight broader implications for the use of AI in psychological research and other domains that depend on fine-grained judgment. The study raises concerns about the reliability of AI on tasks requiring empathy, contextual moral reasoning, or other forms of human insight. The article frames the findings as evidence that AI cannot currently replace human participants or experts in certain types of psychological studies, noting that the shortfall persists despite substantial investment in the field.

The article also notes that its accompanying image was generated by the author using GPT-5 Thinking. The report was published via Towards AI under the MKWriteshere byline, with a last-updated credit to the editorial team and a publication date of August 29, 2025. The authors and editors present the results as a call for caution when deploying AI in research settings that rely on subtle moral or psychological judgments.
