Robots.txt has been the internet's voluntary content access agreement since 1994. For 30 years, it has worked because crawlers respected the rules. Now, Tollbit data shows that 13.26% of AI bot requests ignored robots.txt directives in Q2 2025, up from 3.3% in Q4 2024.
The Numbers
- 13.26% of AI bot requests ignored robots.txt in Q2 2025
- 3.3% ignored robots.txt in Q4 2024
- That is a 4x increase in non-compliance in just two quarters
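A site owner can estimate a similar figure from their own logs. The sketch below is a minimal illustration, not Tollbit's methodology: the robots.txt rules and the (user agent, path) pairs are hypothetical sample data, and `non_compliance_rate` is a helper name invented for this example. It uses Python's standard `urllib.robotparser` to count requests that a compliant crawler would not have made, then reproduces the quarter-over-quarter ratio from the numbers above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical (user_agent, path) pairs, as might be extracted from access logs.
REQUESTS = [
    ("GPTBot", "/blog/post-1"),
    ("GPTBot", "/private/report"),
    ("ClaudeBot", "/blog/post-1"),
    ("ClaudeBot", "/private/report"),
]


def non_compliance_rate(robots_txt, requests):
    """Return the share of requests that robots.txt disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = sum(
        1 for agent, path in requests if not parser.can_fetch(agent, path)
    )
    return violations / len(requests) if requests else 0.0


rate = non_compliance_rate(ROBOTS_TXT, REQUESTS)

# Growth factor between the two Tollbit figures quoted above.
growth = 13.26 / 3.3

print(f"Non-compliant share in sample: {rate:.0%}")
print(f"Q4 2024 to Q2 2025 growth: {growth:.2f}x")
```

This only catches bots that identify themselves honestly in the user-agent string. Requests that present as ordinary browsers would not be counted.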
Why This Is Happening
- New AI browsers and headless-browser tools: Perplexity Comet, Firecrawl, and Browserless are "indistinguishable from humans in site logs," according to Tollbit
- Training demand: The appetite for training data gives crawler operators an incentive to fetch content regardless of restrictions
- Bot fragmentation: New AI companies may not implement robots.txt compliance from the start
What Still Works
The major AI companies (OpenAI, Anthropic, Google) publicly commit to respecting robots.txt. For businesses that need stronger protection:
- Server-level blocking (.htaccess): Block specific user agents at the web server level
- Firewall/CDN blocking: Services such as Cloudflare, Sucuri, or Wordfence can block bots before requests reach your site's content
- Rate limiting: Throttle suspicious crawl patterns
- HTTP headers: Send X-Robots-Tag: noai as an additional signal; it is a non-standard directive, so crawlers are not obliged to honor it
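To show how these layers fit together, here is a minimal Python sketch of the decision logic, assuming it runs somewhere in front of your application. It is not an .htaccess or Cloudflare configuration. The `BotGate` class, the blocklist entries, and the thresholds are all hypothetical choices made for this example. It combines user-agent blocking, a per-IP sliding-window rate limit, and the X-Robots-Tag header.

```python
from collections import defaultdict, deque

# Hypothetical blocklist: lowercase substrings of user agents to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("gptbot", "ccbot")


class BotGate:
    """Illustrative gate: user-agent block, then per-IP rate limit."""

    def __init__(self, max_requests=5, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def check(self, user_agent, client_ip, now):
        """Return (status_code, headers) for one incoming request."""
        # Advisory signal attached to every response.
        headers = {"X-Robots-Tag": "noai"}

        # Layer 1: refuse self-identified crawlers on the blocklist.
        agent = user_agent.lower()
        if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
            return 403, headers

        # Layer 2: sliding-window rate limit per client IP.
        hits = self._hits[client_ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()  # drop timestamps that left the window
        if len(hits) >= self.max_requests:
            return 429, headers

        hits.append(now)
        return 200, headers


gate = BotGate(max_requests=2, window_seconds=10.0)
print(gate.check("Mozilla/5.0 (compatible; GPTBot/1.0)", "203.0.113.5", 0.0))
print(gate.check("Mozilla/5.0", "198.51.100.7", 0.0))
```

The rate limit matters because the user-agent check only stops bots that announce themselves. Crawlers that look like human visitors in the logs pass the first layer and can only be slowed by behavior-based limits.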
Legal Developments
The EU Copyright Directive lets rightsholders opt out of text and data mining through machine-readable means, and robots.txt is widely treated as one such mechanism. This legal framework is still evolving, but the trend is toward stronger enforcement.
AppWT Approach
We implement multi-layered AI bot management: strategic robots.txt, server-level .htaccess blocking, Imunify360 security, and regular monitoring. The goal is maximum AI visibility for business discovery while protecting against unwanted scraping. Learn about our cybersecurity services.
Tony Paris
Founder and Tech Wizard at AppWT Web & AI Solutions. With over 29 years of experience in web development, Tony helps businesses succeed online through custom websites, SEO, and AI integration.