The Vanishing Web and the Promise of LLMs

2025-03-16

The internet is slowly forgetting: every year, a significant portion of web pages disappears forever. The Internet Archive (IA) is a crucial guardian of this digital memory, yet its survival faces growing challenges. The author argues that preserving everything is economically infeasible, but that Large Language Models (LLMs) compress information so effectively that a lossy, sometimes inaccurate record is still better than complete loss. Models like DeepSeek V3 already offer a compressed view of the internet. We should support institutions like IA, ensure that publicly released LLM weights are not lost, and make sure IA's archived content is included in LLM pre-training datasets.