Study: artificial intelligence models struggle with basic human psychology

Researchers at Bielefeld University tested models including GPT-4 and CENTAUR and found that Artificial Intelligence systems fall short when distinguishing subtle human moral judgments.

A team led by Sarah Schröder ran experiments to evaluate whether modern Artificial Intelligence models can perform basic tasks in human psychology, comparing human responses with outputs from models including GPT-4 and CENTAUR. The study reports that, despite large investments in development, the Artificial Intelligence systems were unable to match human sensitivity to nuanced moral distinctions in the scenarios tested.

According to the article, humans in the experiments were able to discern subtle moral differences between scenarios that share a moral framework but differ in intent or context. By contrast, the tested Artificial Intelligence models tended to treat fundamentally different actions as equivalent within the same broad moral category. The piece emphasizes that this gap in moral reasoning appeared even with widely used models such as GPT-4 and with the CENTAUR model.

The authors highlight broader implications for the use of Artificial Intelligence in psychological research and other domains that depend on fine-grained judgment. The study raises concerns about the reliability of Artificial Intelligence when applied to tasks requiring empathy, contextual moral reasoning, or other forms of human insight. The article frames the findings as evidence that Artificial Intelligence cannot currently replace human participants or experts in certain types of psychological studies, noting that the shortfall persists despite substantial investment in the field.

The article also includes a note that an image accompanying the piece was generated by the author using GPT-5 Thinking. The report was published via Towards AI and lists MKWriteshere as the byline, with a last updated credit to the editorial team and a publication date of August 29, 2025. The authors and editors present the results as a call for caution when deploying Artificial Intelligence in research settings that rely on subtle moral or psychological judgments.

Impact Score: 65

Nvidia launches Nemotron 3 Nano Omni for enterprise agents

Nvidia has introduced Nemotron 3 Nano Omni, a multimodal open model designed to support enterprise agents that reason across vision, speech and language. The launch extends Nvidia’s push beyond hardware into models and services while targeting more efficient agentic workflows.

Intel 18A-P node improves performance and efficiency

Intel plans to present new results for its 18A-P process at the VLSI 2026 Symposium, highlighting gains in performance, power efficiency, and manufacturing predictability. The updated node is positioned as a stronger option for customers seeking 18A density with better operating characteristics.

EA CEO defends broader Artificial Intelligence use in game development

EA CEO Andrew Wilson defended the company’s internal use of Artificial Intelligence after employee claims that the tools were slowing work rather than helping. He framed the technology as an aid for repetitive quality assurance tasks, even as concerns persist over its broader impact on development.

Generative Artificial Intelligence is reshaping cybercrime less than feared

Research into criminal underground forums suggests generative Artificial Intelligence is being used mainly as a productivity tool rather than a transformative criminal breakthrough. The biggest near-term risks may come from automation, fraud support, and attackers adapting content to influence chatbot outputs.
