SourceHut Slams AI Crawlers for Overwhelming Its Servers

2025-03-18
SourceHut Slams AI Crawlers for Overwhelming Its Servers

Open-source Git hosting service SourceHut is battling a wave of aggressive AI web crawlers that are overwhelming its servers. The company has deployed countermeasures, including a 'tar pit' called Nepenthes, and has blocked several cloud providers like Google Cloud and Azure due to excessive bot traffic. This isn't a new problem; SourceHut faced similar issues in 2022 with Google's Go Module Mirror, and other open-source projects have also been affected. While some AI companies have pledged to respect robots.txt, abuse persists, with sites like iFixit, Vercel, and Diaspora reporting issues. The situation is further complicated by sophisticated spoofing, with bots masquerading as legitimate crawlers like OpenAI's GPTBot. This makes log analysis difficult and highlights the growing challenge of managing AI crawler traffic. Ad metrics firm DoubleVerify reported an 86% increase in invalid traffic in the second half of 2024, with 16% attributed to AI scrapers.