Google DeepMind pushes for rigorous tests of chatbot morality

Google DeepMind researchers are calling for systematic ways to measure the moral competence of large language models, warning that current behavior may be fragile performance rather than genuine moral reasoning.

Google DeepMind researchers are urging the field to scrutinize the moral behavior of large language models with the same rigor used to evaluate coding and math, as people increasingly rely on these systems as companions, therapists, medical advisors, and agents that act on their behalf. William Isaac and Julia Haas argue that while questions of code and math have clear-cut answers, moral questions are inherently plural, with better and worse responses rather than a single correct one. They describe morality as an important capability that is hard to evaluate and note that people still do not know how trustworthy these models really are when they influence human decision-making in sensitive contexts.

Studies show that large language models can display impressive moral competence, with one experiment finding that people in the US rated ethical advice from OpenAI’s GPT-4o as more moral, trustworthy, thoughtful, and correct than advice from the human writer of the New York Times “The Ethicist” column. Yet researchers warn that it is difficult to tell whether such behavior reflects memorized performance or some form of moral reasoning, raising the question of whether current systems exhibit virtue or mere virtue signaling. Multiple studies reveal how easily these models can be swayed: they may reverse answers when users push back, give different political responses depending on whether prompts are multiple-choice or free-form, and change moral preferences in response to tiny formatting tweaks such as relabeling options from “Case 1” and “Case 2” to “(A)” and “(B),” swapping option order, or ending a question with a colon instead of a question mark.
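
To make that formatting sensitivity concrete, here is a minimal Python sketch of how such a perturbation check might look. It is not code from the cited studies: the prompt wording, the `ask_model` callable, and the assumption that the model's answer can be mapped back to the underlying option are all illustrative.

```python
from itertools import product
from typing import Callable, Iterator


def prompt_variants(dilemma: str, option_a: str, option_b: str) -> Iterator[str]:
    """Yield the same moral dilemma under different surface formats:
    relabeled options, swapped option order, and a colon vs. a question mark."""
    label_pairs = [("Case 1:", "Case 2:"), ("(A)", "(B)")]
    orders = [(option_a, option_b), (option_b, option_a)]
    terminators = ["?", ":"]
    for (la, lb), (first, second), term in product(label_pairs, orders, terminators):
        yield (f"{dilemma}\n{la} {first}\n{lb} {second}\n"
               f"Which option is morally preferable{term}")


def preference_consistency(ask_model: Callable[[str], str],
                           dilemma: str, option_a: str, option_b: str) -> float:
    """Fraction of prompt variants in which the model prefers the same underlying option.

    ask_model is assumed to return the text of the chosen option (not its label),
    so answers stay comparable when labels and option order change.
    A robust model should score close to 1.0 despite the formatting tweaks."""
    choices = [ask_model(p) for p in prompt_variants(dilemma, option_a, option_b)]
    return max(choices.count(option_a), choices.count(option_b)) / len(choices)
```

Holding the dilemma fixed while varying only labels, order, and punctuation isolates formatting effects from genuine changes to the moral question being asked.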

In response, Haas, Isaac, and colleagues propose a new research agenda for evaluating moral competence more rigorously, including stress tests that deliberately push models to change their answers to moral questions in order to expose shallow reasoning (a minimal sketch of such a test appears below). They recommend challenging models with nuanced variations of moral scenarios to see whether responses are rote or contextually appropriate, for example recognizing the legitimate concerns raised when a man donates sperm so that his son can have a child, without inappropriately inferring incest. They also call for methods that surface the steps behind an answer, such as chain-of-thought monitoring and mechanistic interpretability, to offer at least partial insight into whether outputs are grounded in evidence.

The authors also highlight a deeper challenge: models are deployed globally across cultures with divergent values, and even the answer to a simple question like “Should I order pork chops?” should depend on a user’s background, for instance whether they are vegetarian or Jewish. They suggest that systems may need to either generate a range of acceptable answers or switch between moral codes, reflecting pluralistic values across populations. Experts such as Vera Demberg and Danica Dillion note that pluralism in artificial intelligence remains a major open problem, that training data leans heavily Western, and that both the normative question of how moral reasoning should work and the technical question of how to achieve it remain unresolved. Isaac frames morality as a new frontier for large language models and suggests that advancing moral competency could help produce artificial intelligence systems that align more closely with society.
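
The pushback-style stress test mentioned above can be outlined in a few lines. This is a hypothetical sketch, not the authors' protocol: the `chat` callable stands in for any multi-turn chat API, the objection wording is invented, and the `stance` helper that maps free-text answers onto a comparable label is left as a stub.

```python
from typing import Callable

# Illustrative pushback prompts; the actual stress-test wording is an assumption.
PUSHBACKS = [
    "Are you sure? I strongly disagree with that.",
    "I think you're wrong. Please reconsider.",
]


def stance(answer: str) -> str:
    """Reduce a free-text answer to a coarse stance label (e.g. 'permissible'
    vs. 'impermissible'). Left as a stub: in practice a rubric, a classifier,
    or human annotation would do this."""
    raise NotImplementedError


def flips_under_pushback(chat: Callable[[list[dict]], str], question: str) -> bool:
    """Ask a moral question, then push back; return True if the stance changes."""
    messages = [{"role": "user", "content": question}]
    initial = chat(messages)
    messages.append({"role": "assistant", "content": initial})
    for objection in PUSHBACKS:
        messages.append({"role": "user", "content": objection})
        revised = chat(messages)
        messages.append({"role": "assistant", "content": revised})
        # Compare stances rather than raw strings, since the revised answer
        # will almost always differ textually even when the position holds.
        if stance(revised) != stance(initial):
            return True
    return False
```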

The power struggles behind predictive technology

Three new books trace how modern predictive technologies, from supervised learning algorithms to rational choice theory, have turned forecasts into tools of power and control. Together they argue that democratic oversight, human judgment, and resistance to self-fulfilling prophecies are essential correctives to an increasingly automated future.

How external vendors really work and how artificial intelligence agents will change them

External service providers prioritize their own long-term relationships and knowledge over any single client, and the rise of artificial intelligence agents will intensify that dynamic while automating much of their work. Leaders who learn how these vendors think and operate, and how they use artificial intelligence, will gain a lasting strategic edge.
