Recent research has demonstrated that groups of large language models (LLMs), when made to interact, can spontaneously develop social norms akin to those observed in human communities. In a study published in Science Advances, researchers investigated how LLMs, such as Anthropic's Claude and Meta's Llama, behave when engaged in collective tasks, drawing parallels to classic studies of group dynamics among people.
In one experiment, the research team set up a 'naming game' using 24 copies of the Claude model. Each copy was paired with another and tasked with selecting a letter from a small pool, earning virtual rewards when the two choices matched and incurring penalties when they did not. Over repeated rounds and changing partner assignments, the models converged on the same choices, indicating the formation of a group-specific convention. This convergence reflects the emergence of a social norm, much as human societies adopt shared rules for language or behavior through repeated interactions and incentives.
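To give a feel for how such coordination can arise from pairwise rewards alone, the sketch below simulates a simplified naming game with rule-based agents standing in for the LLM copies. The letter pool, memory length, payoffs, and win-stay strategy are illustrative assumptions; the study itself prompted actual Claude instances with their interaction history rather than using hard-coded rules.

```python
import random
from collections import Counter

# Minimal sketch of the naming game dynamics described above.
# This is an illustrative simulation, not the study's setup: the paper's
# "agents" were LLM instances prompted with their past interactions,
# whereas these agents follow a simple win-stay heuristic.

POOL = list("ABCDEFGHIJ")   # candidate letters (pool size is an assumption)
N_AGENTS = 24               # population size, matching the experiment described
MEMORY = 5                  # how many past interactions each agent recalls (assumed)
ROUNDS = 3000

class Agent:
    def __init__(self):
        self.history = []   # list of (choice, reward) tuples

    def choose(self):
        # Replay a choice that earned a reward in recent memory;
        # otherwise pick at random, mimicking the initial randomness.
        wins = [c for c, r in self.history[-MEMORY:] if r > 0]
        return random.choice(wins) if wins else random.choice(POOL)

    def update(self, choice, reward):
        self.history.append((choice, reward))

agents = [Agent() for _ in range(N_AGENTS)]

for _ in range(ROUNDS):
    # Re-pair agents at random each round, as in the experiment.
    random.shuffle(agents)
    for a, b in zip(agents[0::2], agents[1::2]):
        ca, cb = a.choose(), b.choose()
        reward = 1 if ca == cb else -1   # reward matches, penalise mismatches
        a.update(ca, reward)
        b.update(cb, reward)

# After many rounds the population tends to settle on a shared letter,
# i.e. a group-specific convention.
print(Counter(a.choose() for a in agents).most_common(3))
```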
The experiment was expanded to include larger groups and a broader range of choices, with similar results, and was successfully replicated with multiple versions of Meta's Llama model. While individual models showed randomness in their initial selections, group dynamics led to the development of collective biases, with preferences emerging for certain options independent of any explicit instructions. This phenomenon is significant because it demonstrates that LLMs in group settings can create shared conventions and even collective biases, a trait previously undocumented in artificial intelligence systems. The study's authors caution that such emergent collective biases could lead to harmful outcomes when LLMs are deployed in collaborative applications, underscoring the need to test and address group-level behaviors in addition to individual model biases.