Cloudflare has accused Perplexity AI of scraping websites that had already blocked the company’s bots from accessing their content. The infrastructure provider says it observed large-scale traffic from unidentified crawlers that eventually traced back to Perplexity.
These requests reportedly bypassed site-level restrictions that were meant to block data harvesting. Several website owners had added rules through standard methods, such as robots.txt files, that are commonly used to control how search engines or bots interact with site content. Despite those restrictions, Cloudflare claims that Perplexity’s crawler accessed the blocked material using methods designed to disguise its identity.

The issue came to light after site administrators reported continued scraping despite those measures. Cloudflare then set up controlled test domains. According to its findings, Perplexity’s crawler visited those pages even though the domains were not supposed to be indexed. The crawler reportedly used generic browser identifiers, rotated IP addresses, and impersonated legitimate browsers like Chrome.
Claims of deliberate evasion
The traffic came from multiple networks, some of which did not belong to Perplexity’s known infrastructure. The company’s name was not included in the browser headers used during many of those requests, which made it harder to trace. Cloudflare stated that the activity spanned millions of requests and targeted tens of thousands of domains each day.
Perplexity denied the accusations, saying that the logs referenced in Cloudflare’s report did not confirm content access. The company also claimed that some of the bots identified were not under its control.
Even so, Cloudflare removed Perplexity from its verified bot list. It also added new rules to help website operators block such activity more effectively. The company pointed out that while some AI platforms respect anti-scraping rules, Perplexity had repeatedly changed its methods to continue accessing restricted material.
Broader disputes over content access
This is not the first time Perplexity has drawn criticism over its data practices. Last year, the company was accused of using content from news publishers without permission. Around the same time, its CEO was unable to define how the company distinguished between fair use and plagiarism during a public interview.
Cloudflare has recently taken a stronger stance against AI data harvesting. It launched a tool that allows website owners to charge AI companies for data access and has also introduced free features aimed at blocking bots used for AI training. More than two million websites have now adopted similar measures.
In some cases, companies have taken legal action. Perplexity and other AI providers have been named in lawsuits from publishers who claim that their work was copied without a license. While firms like OpenAI have adjusted their access policies, Cloudflare says Perplexity continues to look for ways around these restrictions.
Cloudflare expects that the company may soon change its crawler once again to bypass the new blocks. For now, the dispute highlights how AI tools often rely on unrestricted access to content, even as more site owners try to limit or control how that data is used.
Read next: Apple’s AI Strategy Takes Shape Through New Search Tool, Infrastructure, and Hiring