Meta's Llama 3.1 Model Found to Memorize Significant Portions of Copyrighted Books

2025-06-15

New research finds that Meta's Llama 3.1 70B large language model has memorized a surprisingly large share of some copyrighted books, including 42% of Harry Potter and the Sorcerer's Stone. That is far more than its predecessor, Llama 1 65B, memorized, and it raises serious copyright concerns. The researchers did not prompt the model to generate large volumes of text and then search the output for copied passages. Instead, they measured memorization efficiently by calculating the probability that the model would produce a specific passage, a value that can be computed directly from the probabilities the model assigns to each successive token. The finding could affect copyright lawsuits against Meta and may prompt courts to revisit the boundaries of fair use in AI model training. The model memorized less of obscure books, but its extensive memorization of popular titles highlights the copyright challenges facing large language models.
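To make the probability-based measurement concrete, here is a minimal Python sketch under stated assumptions. It is not the researchers' code: the toy bigram "model", the token lists, and the 0.5 threshold are hypothetical stand-ins. It shows the core idea: the probability that a model reproduces a passage is the product of the probabilities it assigns to each next token, so scoring the passage once replaces sampling many generations and checking them for copies.

```python
import math
from typing import Callable, Sequence

# A "model" is any function mapping (context tokens, next token) -> probability.
# With a real LLM this would be the softmax output at each position.
NextTokenProb = Callable[[Sequence[str], str], float]

# Hypothetical toy bigram table standing in for an LLM.
TOY_BIGRAMS = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.8, "hill": 0.2},
}


def toy_model(context: Sequence[str], token: str) -> float:
    # Looks only at the last context token (a bigram model).
    return TOY_BIGRAMS.get(context[-1], {}).get(token, 0.0)


def sequence_log_prob(model: NextTokenProb, prefix: Sequence[str], suffix: Sequence[str]) -> float:
    """Log-probability that `model` generates `suffix` immediately after `prefix`.

    Chain rule: P(suffix | prefix) = prod_i P(s_i | prefix, s_1..s_{i-1}).
    Summing logs avoids floating-point underflow on long passages.
    """
    context = list(prefix)
    total = 0.0
    for token in suffix:
        p = model(context, token)
        if p <= 0.0:
            return float("-inf")  # the passage can never be produced verbatim
        total += math.log(p)
        context.append(token)
    return total


def is_memorized(model: NextTokenProb, prefix: Sequence[str], suffix: Sequence[str],
                 threshold: float = 0.5) -> bool:
    # Hypothetical criterion: treat a passage as memorized if the model would
    # reproduce it verbatim with probability above `threshold`.
    return math.exp(sequence_log_prob(model, prefix, suffix)) > threshold


if __name__ == "__main__":
    p = math.exp(sequence_log_prob(toy_model, ["once"], ["upon", "a", "time"]))
    print(round(p, 3))  # 0.684 (= 0.9 * 0.95 * 0.8)
    print(is_memorized(toy_model, ["once"], ["upon", "a", "time"]))  # True
    print(is_memorized(toy_model, ["once"], ["upon", "a", "hill"]))  # False (0.171)
```

With a real model, `toy_model` would be replaced by the model's per-position token probabilities, which a single forward pass over the passage provides; working in log space keeps long passages from underflowing to zero.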
