Nepenthes: A Web Crawler Tarpit
2025-01-16
Nepenthes is a tool designed to trap web crawlers, particularly those scraping data for LLMs. It generates an endless sequence of pages, each with dozens of links that lead back into the tarpit. Pages are generated pseudo-randomly but deterministically, so each URL always returns the same content and looks like an unchanging static file. Intentional delays in serving each page waste the crawler's time and keep it from bogging down your server. An optional Markov babble module can fill the pages with generated text, giving crawlers data to scrape in the hope of accelerating model collapse.

Warning: This consumes significant CPU, especially with the Markov module enabled. Use with caution.
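To make the "random but deterministic" idea concrete, here is a minimal Python sketch. It is not Nepenthes's actual code. It assumes one plausible approach: seed a private random generator from a hash of the requested path, so the same URL always yields the same page. The `/maze` prefix, the word list, and the names `render` and `drip` are all hypothetical, chosen only for illustration.

```python
import hashlib
import random
import time

# Hypothetical values for illustration only; not taken from Nepenthes.
PREFIX = "/maze"
WORDS = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "haven"]


def rng_for(path: str) -> random.Random:
    """Return a private RNG seeded from a hash of the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def render(path: str, n_links: int = 12) -> str:
    """Build a page whose links all point back under PREFIX.

    The same path always produces the same HTML, so the page looks static.
    """
    rng = rng_for(path)
    links = []
    for _ in range(n_links):
        depth = rng.randint(1, 3)
        segments = [rng.choice(WORDS) for _ in range(depth)]
        links.append(PREFIX + "/" + "/".join(segments))
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in links)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


def drip(body: bytes, chunk_size: int = 16, delay: float = 0.5):
    """Yield the body in small chunks, sleeping between them.

    The sleep is the intentional delay: it limits how fast a single
    crawler connection can pull pages.
    """
    for start in range(0, len(body), chunk_size):
        if delay > 0:
            time.sleep(delay)
        yield body[start:start + chunk_size]
```

A web handler could call `render(request_path)` and stream `drip(page.encode())` as the response body. Because nothing is stored, the set of pages is effectively unbounded while memory use stays flat; the cost is CPU time spent regenerating each page on every request.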
Development
anti-crawler