The LLM Data Scraping Wars: A Copyright Battle and the Fightback

2025-09-14

The way large language models (LLMs) acquire training data has sparked intense copyright battles. Early on, data was scraped with little regard for ethical or legal considerations. As apps like ChatGPT were commercialized, copyright issues became increasingly prominent, and authors and publishers sued AI companies. Companies like OpenAI began making deals with publishers for data access, yet scraping continued unabated and grew even more brazen. In response, Cloudflare and others introduced anti-scraping tools, and the RSL standard emerged, allowing websites to set prices for data access. Together these mark a proactive fightback by website owners: AI companies may eventually be forced to pay for data, which would change the data acquisition ecosystem.

(nymag.com)

Tech