Cloudflare announced that it will now block artificial intelligence (AI) bots from crawling the websites it serves, making the restriction the default behavior across its vast client base. The move marks a significant shift in how content creators and publishers can control the exposure of their intellectual property to AI-driven web scrapers. Historically, content-harvesting bots were associated mainly with search indexing, but the rapid rise of AI has changed those motivations, often sidelining creators' interests in favor of data accumulation for training large models.
Cloudflare's new approach gives clients granular control: they can explicitly permit or deny individual AI crawlers and can set permissions by purpose, such as training or inference. The company is also rolling out a pay-per-crawl service that lets clients set a compensation rate for each crawl attempt by an AI bot. With this system, Cloudflare aims to support fairer compensation models for online publishers, moving beyond the older bargain in which content indexing rewarded creators only indirectly, through referral traffic.
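For readers who want a concrete picture of what such a policy could look like, the sketch below is a minimal, hypothetical illustration at the origin-server level, not Cloudflare's actual product or API. The crawler names are real user-agent strings commonly associated with AI and search bots, but the policy table and the X-Crawl-Price-USD header are illustrative assumptions; HTTP 402 (Payment Required) is used simply because its semantics match a per-crawl price.

```python
# Hypothetical sketch of per-crawler policy: allow, deny, or charge per crawl.
# This is NOT Cloudflare's implementation; names and headers are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed policy table: crawler name -> (action, price per crawl in USD)
POLICY = {
    "GPTBot": ("deny", None),       # block an AI training crawler outright
    "CCBot": ("pay", 0.01),         # allow if the crawler agrees to pay
    "Googlebot": ("allow", None),   # traditional search indexing stays open
}

class CrawlPolicyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        # Match the request's user agent against the policy table;
        # ordinary visitors fall through to "allow".
        action, price = next(
            (POLICY[name] for name in POLICY if name in ua),
            ("allow", None),
        )
        if action == "deny":
            self.send_response(403)  # Forbidden: AI crawling not permitted
            self.end_headers()
        elif action == "pay":
            # 402 Payment Required advertises a per-crawl price.
            # "X-Crawl-Price-USD" is a placeholder header, not a standard.
            self.send_response(402)
            self.send_header("X-Crawl-Price-USD", str(price))
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Page content for permitted clients.\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CrawlPolicyHandler).serve_forever()
```

In practice, this kind of enforcement would happen at Cloudflare's edge rather than at the publisher's server, and identifying crawlers would rely on verification stronger than a self-reported user-agent string; the sketch only conveys the shape of the allow/deny/pay decision.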
Support from major media and community platforms, including the Associated Press, Time, Quora, and Stack Overflow, underscores the demand for such controls and compensation. Stack Overflow's CEO emphasized that community-driven platforms need to reinvest in their contributors, given the pivotal role their content plays in training large language models. Cloudflare also noted that its existing measures against bad-actor crawlers, including fake-page traps and sophisticated bot-behavior detection, informed the new strategy. While the move has been widely welcomed, some experts warn that blocking AI crawlers by default could inadvertently hinder research and legitimate noncommercial uses, such as web archiving. Cloudflare contends that its verification and negotiation mechanisms will preserve openness and flexibility, letting website owners distinguish commercial AI models from other kinds of automated access while safeguarding content rights and the sustainability of online publishing.