Google DeepMind pushes for rigorous tests of chatbot morality

Google DeepMind researchers are calling for systematic ways to measure the moral competence of large language models, warning that current behavior may be fragile performance rather than genuine moral reasoning.

Google DeepMind researchers are urging the field to scrutinize the moral behavior of large language models with the same rigor used to evaluate coding and math, as people increasingly rely on these systems as companions, therapists, medical advisors, and agents that act on their behalf. William Isaac and Julia Haas argue that while questions of code and math have clear-cut answers, moral questions are inherently plural, with better and worse responses rather than a single correct one. They describe morality as an important capability that is hard to evaluate and note that people still do not know how trustworthy these models really are when they influence human decision-making in sensitive contexts.

Studies show that large language models can display impressive moral competence, with one experiment finding that people in the US scored ethical advice from OpenAI’s GPT-4o as more moral, trustworthy, thoughtful, and correct than advice from the human writer of the New York Times “The Ethicist” column. Yet researchers warn that it is difficult to tell whether such behavior reflects memorized performance or some form of moral reasoning, raising the question of whether current systems exhibit virtue or mere virtue signaling. Multiple studies reveal how easily these models can be swayed: they may reverse answers when users push back, give different political responses depending on whether prompts are multiple-choice or free-form, and change moral preferences in response to tiny formatting tweaks such as relabeling options from “Case 1” and “Case 2” to “(A)” and “(B),” swapping option order, or ending a question with a colon instead of a question mark.

In response, Haas, Isaac, and colleagues propose a new research agenda for more rigorous evaluation of moral competence, including stress tests that deliberately push models to change their answers to moral questions in order to expose shallow reasoning. They recommend challenging models with nuanced variations of moral scenarios to see whether responses are rote or contextually appropriate: for example, distinguishing legitimate concerns about a man donating sperm so that his son can have a child from inappropriate inferences about incest. They also call for methods that surface the steps behind an answer, such as chain-of-thought monitoring and mechanistic interpretability, to offer partial insight into whether outputs are grounded in evidence.

At the same time, they highlight a deeper challenge: models are deployed globally across cultures with divergent values, so even a simple question like “Should I order pork chops?” warrants different answers depending on the user’s background, such as whether they are vegetarian or Jewish. The researchers suggest that systems may need to either generate a range of acceptable answers or switch between moral codes, reflecting pluralistic values across populations.

Experts such as Vera Demberg and Danica Dillion note that pluralism in Artificial Intelligence remains a major open problem, that training data leans heavily Western, and that both the normative question of how moral reasoning should work and the technical question of how to achieve it remain unresolved. Isaac frames morality as a new frontier for large language models and suggests that advancing moral competence could help produce Artificial Intelligence systems that align more closely with society.

Impact Score: 56

Google Vids opens free video generation to all Google users

Google has made Google Vids available to anyone with a Google account, adding free access to video generation with its latest models. The move expands Google’s end-to-end video workflow and increases pressure on rivals that charge for similar tools.

Court warns against chatbot legal advice in Heppner case

A federal court found that chats with a publicly available generative Artificial Intelligence tool were not protected by attorney-client privilege or the work-product doctrine. The ruling highlights litigation risks when executives or employees use chatbots for legal guidance without lawyer supervision.

Newsom orders California to weigh Artificial Intelligence harms in contract rules

Gov. Gavin Newsom has signed an executive order directing California agencies to account for potential Artificial Intelligence harms in state contracting while expanding approved use of generative tools across government. The move follows a dispute involving Anthropic and reflects a broader split between California and the Trump administration on Artificial Intelligence oversight.
