Google DeepMind pushes for rigorous tests of chatbot morality

Google DeepMind researchers are calling for systematic ways to measure the moral competence of large language models, warning that current behavior may be fragile performance rather than genuine moral reasoning.

Google DeepMind researchers are urging the field to scrutinize the moral behavior of large language models with the same rigor used to evaluate coding and math, as people increasingly rely on these systems as companions, therapists, medical advisors, and agents that act on their behalf. William Isaac and Julia Haas argue that while questions of code and math have clear-cut answers, moral questions are inherently plural, with better and worse responses rather than a single correct one. They describe morality as an important capability that is hard to evaluate and note that people still do not know how trustworthy these models really are when they influence human decision-making in sensitive contexts.

Studies show that large language models can display impressive moral competence, with one experiment finding that people in the US scored ethical advice from OpenAI’s GPT-4o as more moral, trustworthy, thoughtful, and correct than advice from the human writer of the New York Times “The Ethicist” column. Yet researchers warn that it is difficult to tell whether such behavior reflects memorized performance or some form of moral reasoning, raising the question of whether current systems exhibit virtue or mere virtue signaling. Multiple studies reveal how easily these models can be swayed: they may reverse answers when users push back, give different political responses depending on whether prompts are multiple-choice or free-form, and change moral preferences in response to tiny formatting tweaks such as relabeling options from “Case 1” and “Case 2” to “(A)” and “(B),” swapping option order, or ending a question with a colon instead of a question mark.

In response, Haas, Isaac, and colleagues propose a new research agenda for evaluating moral competence more rigorously. It includes stress tests that deliberately push models to change their answers to moral questions, so that shallow reasoning is exposed. They recommend challenging models with nuanced variations of moral scenarios to see whether responses are rote or contextually appropriate. One example is a man donating sperm to his son so that the son can have a child: a competent model should address the legitimate concerns the scenario raises rather than draw inappropriate inferences about incest. They also call for methods that surface the steps behind an answer, such as chain-of-thought monitoring and mechanistic interpretability, which offer partial insight into whether outputs are grounded in evidence.

At the same time, they highlight a deeper challenge: models are deployed globally, across cultures with divergent values. Even a simple question like “Should I order pork chops?” may call for a different answer depending on the user’s background, for example if the user is vegetarian or Jewish. The researchers suggest that systems may need either to generate a range of acceptable answers or to switch between moral codes, reflecting pluralistic values across populations.

Experts such as Vera Demberg and Danica Dillion note that pluralism in artificial intelligence remains a major open problem, that training data leans heavily Western, and that both the normative question of how moral reasoning should work and the technical question of how to achieve it are still unresolved. Isaac frames morality as a new frontier for large language models and suggests that advancing moral competency could help produce artificial intelligence systems that align more closely with society.
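For readers who want a concrete picture of the formatting sensitivity described above, here is a minimal Python sketch of a consistency check. It is an illustration only, not the researchers' method: the names `make_variants`, `consistency`, `first_listed`, and `prefers` are hypothetical, and the two stub "models" are toy functions standing in for a real chatbot. The sketch builds eight versions of one two-option question by varying the option labels, the option order, and the final punctuation, then measures how often the chosen underlying option stays the same.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Tuple

# A variant is (prompt text, mapping from option label to underlying option).
Variant = Tuple[str, Dict[str, str]]

LABEL_SETS = [("Case 1", "Case 2"), ("(A)", "(B)")]
ENDINGS = ["?", ":"]


def make_variants(question: str, options: Tuple[str, str]) -> List[Variant]:
    """Build prompts that differ only in labels, option order, and ending."""
    stem = question.rstrip("?:")
    variants = []
    for labels, swapped, ending in product(LABEL_SETS, [False, True], ENDINGS):
        ordered = options[::-1] if swapped else options
        mapping = dict(zip(labels, ordered))
        body = "\n".join(f"{label} {text}" for label, text in mapping.items())
        variants.append((f"{stem}{ending}\n{body}", mapping))
    return variants


def consistency(ask: Callable[[str], str], variants: List[Variant]) -> float:
    """Share of variants whose answer resolves to the most common option."""
    chosen = [mapping.get(ask(prompt).strip()) for prompt, mapping in variants]
    return Counter(chosen).most_common(1)[0][1] / len(chosen)


def first_listed(prompt: str) -> str:
    """Toy stand-in for a position-biased model: picks whatever is listed first."""
    first_line = prompt.splitlines()[1]
    for label in (l for labels in LABEL_SETS for l in labels):
        if first_line.startswith(label):
            return label
    return ""


def prefers(text: str) -> Callable[[str], str]:
    """Toy stand-in for a model that picks an option by its content."""
    def ask(prompt: str) -> str:
        for line in prompt.splitlines()[1:]:
            for label in (l for labels in LABEL_SETS for l in labels):
                if line.startswith(label) and text in line:
                    return label
        return ""
    return ask


variants = make_variants(
    "Which outcome is preferable?", ("Tell the truth", "Keep the promise")
)
print(consistency(first_listed, variants))               # 0.5
print(consistency(prefers("Tell the truth"), variants))  # 1.0
```

The position-biased stub scores 0.5 because its choice flips whenever the options are swapped, while the content-based stub scores 1.0. A real evaluation would replace the stubs with calls to an actual model and would need many questions, not one.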


