Polish outperforms English and Chinese in long-context large language model tests

A new OneRuler benchmark presented at COLM 2025 finds Polish achieves the highest accuracy when large language models process very long documents. The study links the advantage to tokenization and script differences rather than dataset volume.

A multilingual benchmark called OneRuler, presented at COLM 2025, evaluated how large language models handle long documents and produced an unexpected ranking: Polish leads in accuracy at extended context lengths. The paper tested 26 languages across retrieval and aggregation tasks and reports that Polish achieves an average accuracy of 88% at long-context scales, defined as roughly 64,000 tokens and beyond. English falls to sixth place at that scale, while Chinese ranks among the bottom four.

The authors argue the disparity is tied less to training data volume than to tokenization efficiency and script characteristics. Languages written in Latin-based scripts, such as Polish, French, and Spanish, consistently outperformed languages that use logographic or abugida writing systems. The benchmark shows that many languages with logographic or abugida scripts, including Chinese, Korean, and Tamil, deliver only moderate accuracy even at shorter contexts and deteriorate further as sequence length increases. The measured gap between the strongest and weakest languages widens sharply as context expands, from an 11 percent difference at 8,000 tokens to a 34 percent difference at 128,000 tokens. The study also highlights sensitivity to instruction phrasing: permitting a model to answer "none" when a target string is absent reduced English accuracy by 32 percent at 128,000 tokens.
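The tokenization effect described above can be illustrated with a toy sketch. This is not the OneRuler methodology or any real model's tokenizer; it is a hypothetical worst-case fallback showing how a tokenizer tuned for space-delimited Latin text can emit far more tokens for unspaced scripts, so the same content consumes more of a fixed context window.

```python
def toy_tokenize(text: str) -> list[str]:
    """Toy tokenizer: split on whitespace, but fall back to one token
    per character for chunks outside the basic Latin range (a crude
    stand-in for how unspaced logographic text can fragment)."""
    tokens = []
    for chunk in text.split():
        if all(ord(ch) < 0x2E80 for ch in chunk):  # rough Latin/ASCII check
            tokens.append(chunk)    # one token per whitespace-delimited word
        else:
            tokens.extend(chunk)    # one token per character
    return tokens

latin = "models handle long documents"  # 4 words -> 4 tokens
logographic = "模型处理长文档"            # 7 characters -> 7 tokens

print(len(toy_tokenize(latin)))        # 4
print(len(toy_tokenize(logographic)))  # 7
```

Real subword tokenizers are far more nuanced, but the direction of the effect is the same: scripts that fragment into more tokens per unit of meaning hit a model's effective context limit sooner.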

The findings imply that long-context evaluation for Artificial Intelligence systems cannot rely solely on English benchmarks. While the OneRuler tests compared model families, the results suggest that generalizing performance across languages is misleading unless tokenization and script effects are accounted for. As context windows grow into the tens of thousands of tokens, structural language differences become more important than dataset dominance, and multilingual long-context benchmarks are necessary for representative evaluation.


Congress weighs Artificial Intelligence transparency rules

Bipartisan lawmakers are pushing a federal transparency standard for the largest Artificial Intelligence models as Congress works on a broader national framework. The proposal aims to increase public trust while avoiding stricter state-by-state requirements and heavier regulation.

Report finds California creative job losses are not driven by Artificial Intelligence

New research from Otis College of Art and Design finds California’s recent creative industry job losses stem from cost pressures and structural shifts, not direct worker displacement by generative Artificial Intelligence. The technology is changing workflows and expectations, but it is largely replacing tasks rather than entire jobs.

U.S. senators propose broader chip tool export ban for Chinese firms

A bipartisan proposal in the U.S. Senate would shift semiconductor equipment controls from specific fabs to targeted Chinese companies and their affiliates. The measure is aimed at cutting off access to advanced lithography and other wafer fabrication tools for firms such as Huawei, SMIC, YMTC, CXMT, and Hua Hong.

Trump executive order targets state Artificial Intelligence laws

Executive Order 14365 lays out a federal strategy to discourage, challenge, and potentially preempt state Artificial Intelligence laws viewed as burdensome. Employers are advised to keep complying with current state and local rules while preparing for regulatory uncertainty in 2026.

Who decides how America uses Artificial Intelligence in war

Stanford experts are divided over how the United States should govern Artificial Intelligence in defense, surveillance, and warfare. Their views converge on one point: decisions with such high stakes cannot be left to companies alone.
