Meta and OpenAI Accused of Using Pirated Database to Train AI Models

2025-03-22

Meta and OpenAI are embroiled in a copyright controversy over their use of the pirated book database Library Genesis (LibGen) to train AI models. To speed up training of its Llama 3 model, Meta bypassed expensive licensing processes and downloaded millions of books and papers directly from LibGen. The download prompted a lawsuit from authors, and court documents show that Meta employees acknowledged the legal risks and attempted to cover their tracks. OpenAI has also admitted to using LibGen in the past but says its latest models no longer rely on the dataset. The incident highlights the ethical and legal challenges of sourcing training data for AI models while protecting intellectual property.