State media shapes large language model outputs

Research in Nature finds that government control of media can influence large language model behavior through training data. The effect appears especially visible across languages, with models producing more favorable answers about China when prompted in Chinese.

Millions of people around the world query large language models for information, and the study argues that the models are not shaped only by developers and product policies. Through six studies, the researchers find that government control of media already affects model outputs through the text used in pretraining. In a cross-national audit, large language models showed a stronger pro-government valence in the languages of countries with lower media freedom than in those with higher media freedom, suggesting that media environments leave detectable traces in model behavior.

To examine the mechanism more directly, the research focuses on China’s media system. The team reports that media scripted and curated by the Chinese state appears in large language model training datasets. They then used an open-weight model to test the effect of additional pretraining on Chinese state-coordinated media, finding that it generated more positive answers to prompts about Chinese political institutions and leaders. The paper presents this as evidence that state-controlled information can shape downstream responses rather than simply appearing alongside other training material.

The study also connects those findings to commercial systems through bilingual audits. Prompting models in Chinese generated more positive responses about China’s institutions and leaders than asking the same questions in English. A broader language-based test found that language-exclusive countries were rated more favorably in their own language when they had lower media freedom. The researchers argue that the combination of cross-language influence and the persuasive capacity of large language models points to a strategic risk: states and other powerful institutions may have stronger incentives to use media control in hopes of influencing future model outputs.

The paper positions the issue as a data governance problem as much as a model design problem. It emphasizes that political slant can enter systems through multilingual training corpora and persist into user-facing behavior, even when the models are not explicitly designed for political persuasion. The findings frame media freedom, dataset composition, and cross-lingual evaluation as central concerns for understanding how large language models represent governments, leaders, and public institutions.

78

Impact Score

Deepfake porn’s hidden victims

Nonconsensual sexual deepfakes are harming not only the people whose faces are inserted into explicit content, but also adult performers whose bodies and likenesses are repurposed without consent. As generative Artificial Intelligence tools spread, performers face growing psychological, legal, and financial risks with limited protection.

Deepfake porn and chatbot privacy breaches

Nonconsensual deepfake pornography is harming not only people whose faces are inserted into explicit media, but also adult creators whose bodies and likenesses are reused without permission. Generative Artificial Intelligence chatbots are also exposing private phone numbers, making personal information easier to retrieve and harder to control.

European Union Artificial Intelligence Act raises layered compliance demands for finance

Banks, insurers and financial intermediaries face a more complex compliance environment as the European Union Artificial Intelligence Act overlays existing financial regulation and the GDPR. Proposed changes in the Digital Omnibus Package may delay some obligations, but the core challenge remains managing overlapping rules, roles and regulators.

Contact Us

Got questions? Use the form to contact us.

Contact Form

Clicking next sends a verification code to your email. After verifying, you can enter your message.