SearchGEO tests how LLM search agents endorse manipulated web content

Researchers introduced SearchGEO to measure how manipulated web content can influence LLM-based search agents. The evaluation found wide differences across backends, from a 0.0% attack success rate on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash.

Researchers led by Yimeng Chen introduced SearchGEO, a controlled framework for testing endorsement corruption in LLM-based web-search agents. The work was submitted to arXiv on 15 Jun 2026 and focuses on whether manipulated web evidence can lead an agent to endorse claims that originate on attacker-published pages.

The evaluation covered 13 LLM backends on 308 cases each, with attack success rate varying sharply by model. Results ranged from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, showing that endorsement reliability can differ substantially across backends under adversarial search conditions.
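To make the metric concrete, here is a minimal sketch of how a per-backend attack success rate could be tallied, assuming it is the share of test cases in which the agent endorsed the attacker's claim. The function name, the record format, and the toy data are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {backend: percent of cases where the agent endorsed the attacker claim}.

    `records` is an iterable of (backend_name, endorsed) pairs, where
    `endorsed` is True when the attack succeeded on that case.
    This record format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for backend, endorsed in records:
        totals[backend] += 1
        if endorsed:
            successes[backend] += 1
    return {b: 100.0 * successes[b] / totals[b] for b in totals}


# Hypothetical toy records, NOT results from the SearchGEO evaluation.
records = [
    ("backend-a", False), ("backend-a", False),
    ("backend-a", False), ("backend-a", False),
    ("backend-b", True), ("backend-b", False),
    ("backend-b", False), ("backend-b", False),
]

rates = attack_success_rate(records)
for backend, rate in sorted(rates.items()):
    print(f"{backend}: {rate:.1f}%")
```

In the reported setup each backend would contribute 308 such records, so the rates are directly comparable across models.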

SearchGEO combines a web-evidence manipulation pipeline, a five-mode attack taxonomy, multiple output-level metrics, and an auxiliary agent-skill probe that frames endorsement as an install command. The probe found a split between Claude systems that tended to “over-reject” and GPT systems that tended to “over-trust,” highlighting failure modes that may not appear in isolated model tests.
